Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop, mail # user - incremental updates mysql to HBase


Copy link to this message
-
Re: incremental updates mysql to HBase
Joanne Chan 2013-08-07, 16:15
1. No
2. Then, you are out of luck.

Sqoop isn't magic, imagine how would sqoop know what data was updated
without any indicators?
On Wed, Aug 7, 2013 at 12:12 PM, shengjie min <[EMAIL PROTECTED]> wrote:

>
>
> - Add an update_time column to the source table and do the incremental
> load by that update_time column.
>
> two questions to that then:
>
> 1. will sqoop job updates that field "update_time" automatically when it
> runs OR my application needs to write that field?
> 2. What if I don't wanna or I can't  touch the schema on the source table?
>
> THanks,
> Shengjie
>
> On 8 Aug 2013, at 00:06, Joanne Chan <[EMAIL PROTECTED]> wrote:
>
> Add an update_time column to the source table and do the incremental load
> by that update_time column.
>
>
> On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <[EMAIL PROTECTED]>wrote:
>
>> Hi guys,
>>
>> TO simplify my question, Let's say, I have a mysql table called
>> 'student', looks like this:
>>
>> +----+----------+-----+
>> | id | name     | sex |
>> +----+----------+-----+
>> |  1 | Alice       |   0  |
>> |  2 | Bob         |   1  |
>> |  3 | Charles  |   1  |
>> +----+----------+-----+
>>
>> I want to import this table to HBase periodically which means I will run
>> this sqoop job periodically. There are two goals:
>>
>> A.  every time there is a new record inserted to mysql table, e.g. (4,
>> David, 1), I hope my next sqoop import will catch it and put it in HBase.
>> B. if  there is any updates have been made to mysql rows 1, 2, 3, I want
>> to have the updates in HBase too after next round sqoop import.
>>
>> I checked two types incremental updates sqoop has:  Append mode seems
>> only satisfied goal A while Last-modified mode will require my mysql table
>> has a timestamp column for each row(which I don't in real life). I know if
>> I don't use incremental updates options at all, I can just get way with it
>> by running a fresh import every time, but if my mysql table is really huge
>> and fresh import might be a performance killer.
>>
>> Is there anyway I can just do incremental updates instead of having to
>> re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
>>
>>
>> Shengjie
>
>
>
>
> --
> -- JChan
>
>
>
--
-- JChan