Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop >> mail # user >> incremental updates mysql to HBase


Copy link to this message
-
Re: incremental updates mysql to HBase
1. No
2. Then, you are out of luck.

Sqoop isn't magic, imagine how would sqoop know what data was updated
without any indicators?
On Wed, Aug 7, 2013 at 12:12 PM, shengjie min <[EMAIL PROTECTED]> wrote:

>
>
> - Add an update_time column to the source table and do the incremental
> load by that update_time column.
>
> two questions to that then:
>
> 1. will sqoop job updates that field "update_time" automatically when it
> runs OR my application needs to write that field?
> 2. What if I don't wanna or I can't  touch the schema on the source table?
>
> THanks,
> Shengjie
>
> On 8 Aug 2013, at 00:06, Joanne Chan <[EMAIL PROTECTED]> wrote:
>
> Add an update_time column to the source table and do the incremental load
> by that update_time column.
>
>
> On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <[EMAIL PROTECTED]>wrote:
>
>> Hi guys,
>>
>> TO simplify my question, Let's say, I have a mysql table called
>> 'student', looks like this:
>>
>> +----+----------+-----+
>> | id | name     | sex |
>> +----+----------+-----+
>> |  1 | Alice       |   0  |
>> |  2 | Bob         |   1  |
>> |  3 | Charles  |   1  |
>> +----+----------+-----+
>>
>> I want to import this table to HBase periodically which means I will run
>> this sqoop job periodically. There are two goals:
>>
>> A.  every time there is a new record inserted to mysql table, e.g. (4,
>> David, 1), I hope my next sqoop import will catch it and put it in HBase.
>> B. if  there is any updates have been made to mysql rows 1, 2, 3, I want
>> to have the updates in HBase too after next round sqoop import.
>>
>> I checked two types incremental updates sqoop has:  Append mode seems
>> only satisfied goal A while Last-modified mode will require my mysql table
>> has a timestamp column for each row(which I don't in real life). I know if
>> I don't use incremental updates options at all, I can just get way with it
>> by running a fresh import every time, but if my mysql table is really huge
>> and fresh import might be a performance killer.
>>
>> Is there anyway I can just do incremental updates instead of having to
>> re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
>>
>>
>> Shengjie
>
>
>
>
> --
> -- JChan
>
>
>
--
-- JChan