Re: incremental updates mysql to HBase
Add an update_time column to the source table and run the incremental load
in lastmodified mode against that update_time column.
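
To make the idea concrete, here is a minimal sketch of what an incremental load keyed on update_time does. It is not Sqoop's implementation: an in-memory SQLite table stands in for MySQL, a plain dict keyed by id stands in for the HBase table, update_time is an integer rather than a real timestamp, and the function name incremental_load is made up for illustration.

```python
import sqlite3


def incremental_load(conn, target, last_value):
    """Copy rows changed after last_value into target.

    Returns the new high-water mark to use for the next run.
    """
    rows = conn.execute(
        "SELECT id, name, sex, update_time FROM student "
        "WHERE update_time > ? ORDER BY update_time",
        (last_value,),
    ).fetchall()
    for row_id, name, sex, update_time in rows:
        # Writing by key acts as an upsert: new ids are added,
        # existing ids are overwritten with the latest values.
        target[row_id] = {"name": name, "sex": sex}
        last_value = max(last_value, update_time)
    return last_value


# Stand-in for the MySQL source table (assumption: integer update_time).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT, sex INTEGER, "
    "update_time INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Alice", 0, 100), (2, "Bob", 1, 100), (3, "Charles", 1, 100)],
)

hbase = {}  # stand-in for the HBase table, keyed by row key
mark = incremental_load(conn, hbase, 0)  # first run copies everything

# Goal A: a new row. Goal B: an update to an existing row.
conn.execute("INSERT INTO student VALUES (4, 'David', 1, 200)")
conn.execute("UPDATE student SET name = 'Bobby', update_time = 210 WHERE id = 2")

mark = incremental_load(conn, hbase, mark)  # second run copies only rows 4 and 2
```

In Sqoop terms this corresponds to an import with --incremental lastmodified, --check-column update_time and --last-value set to the mark from the previous run. The source has to bump update_time on every write for this to work; in MySQL a TIMESTAMP column declared with ON UPDATE CURRENT_TIMESTAMP is one way to get that. Rows deleted from the source are not picked up by this approach.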
On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <[EMAIL PROTECTED]> wrote:

> Hi guys,
>
> To simplify my question, let's say I have a MySQL table called 'student' that
> looks like this:
>
> +----+---------+-----+
> | id | name    | sex |
> +----+---------+-----+
> |  1 | Alice   |   0 |
> |  2 | Bob     |   1 |
> |  3 | Charles |   1 |
> +----+---------+-----+
>
> I want to import this table into HBase periodically, which means I will run
> this Sqoop job on a schedule. There are two goals:
>
> A. Every time a new record is inserted into the MySQL table, e.g. (4,
> David, 1), I hope my next Sqoop import will catch it and put it in HBase.
> B. If any updates have been made to MySQL rows 1, 2, 3, I want
> to have those updates in HBase too after the next Sqoop import.
>
> I checked the two types of incremental import Sqoop has: append mode seems to
> satisfy only goal A, while lastmodified mode requires my MySQL table to have a
> timestamp column for each row (which I don't have in real life). I know that
> if I don't use the incremental options at all, I can get away with running a
> fresh import every time, but if my MySQL table is really huge, a fresh import
> might be a performance killer.
>
> Is there any way I can do just incremental updates instead of having to
> re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
>
>
> Shengjie
--
-- JChan