Sqoop, mail # user - Data Loss with Sqoop Incremental Import


Re: Data Loss with Sqoop Incremental Import
Abraham Elmahrek 2014-01-13, 17:43
Oh, it looks like you have a separate thread going for this... will post
there.
On Mon, Jan 13, 2014 at 9:40 AM, Abraham Elmahrek <[EMAIL PROTECTED]> wrote:

> Yogesh,
>
> Is unique_value in this case SAL? I'm a bit confused about your query.
>
> Do you have the option of running this query on a separate database
> somewhere to find the issue? I think it would be interesting to see the
> initial state and then the state after running an incremental import. That
> would tell us how many results are being imported after Sqoop has run and
> we can validate each step. Also, please use the --verbose flag to get the
> most out of the logs.
>
> -Abe
>
>
> On Mon, Jan 13, 2014 at 4:38 AM, yogesh kumar <[EMAIL PROTECTED]> wrote:
>
>> Hello All,
>>
>> I am trying to do an incremental import on a daily basis, and after
>> importing I am finding huge data loss.
>>
>> I have used this script for the incremental import from RDBMS to HDFS:
>>
>> sqoop import -libjars
>>  --driver com.sybase.jdbc3.jdbc.SybDriver \
>>  --query "select * from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204" \
>>  --check-column Unique_value \
>>  --incremental append \
>>  --last-value 201401200 \
>>  --split-by DEPT \
>>  --fields-terminated-by ',' \
>>  --target-dir ${TARGET_DIR}/${INC} \
>>  --username ${SYBASE_USERNAME} \
>>  --password ${SYBASE_PASSWORD} \
>>
>>
>> Now I have imported the newly inserted data from the RDBMS to HDFS,
>>
>> but when I run
>>
>> select count(*), unique_value from EMP group by unique_value
>>
>> (both in the RDBMS and in Hive), I find huge data loss.
>>
>> 1) in RDBMS
>>
>>   Count(*)    Unique_value
>>   1000        201401201
>>   5000        201401202
>>   10000       201401203
>>
>>
>> 2) in HIVE
>>
>>   Count(*)    Unique_value
>>   189         201401201
>>   421         201401202
>>   50          201401203
>>
>>
>> If I do
>>
>> select Unique_value from emp;
>>
>> Result :
>> 201401201
>> 201401201
>> 201401201
>> 201401201
>> 201401201
>> .
>> .
>> 201401202
>> .
>> .
>> and so on...
>>
>>
>> Please help and suggest why this is so.
>>
>>
>> *Many thanks in advance*
>>
>> *Yogesh kumar*
>>
>>
>>
>>
>
>
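
For the step-by-step validation suggested above, a small script can make the gap per key explicit. The following is only a sketch: it hard-codes the counts reported in this thread, and the function name count_gaps is made up for illustration. In practice the two dictionaries would be filled from the group-by query run against the RDBMS and against Hive.

```python
# Per-key row counts as reported in the thread (Unique_value -> count).
# In a real check these would come from the same group-by query run on each side.
rdbms_counts = {201401201: 1000, 201401202: 5000, 201401203: 10000}
hive_counts = {201401201: 189, 201401202: 421, 201401203: 50}


def count_gaps(source, target):
    """Return {key: (source_count, target_count, missing)} for keys on either side.

    count_gaps is a hypothetical helper name, not part of Sqoop or Hive.
    """
    gaps = {}
    for key in sorted(set(source) | set(target)):
        src = source.get(key, 0)
        tgt = target.get(key, 0)
        gaps[key] = (src, tgt, src - tgt)
    return gaps


if __name__ == "__main__":
    for key, (src, tgt, missing) in count_gaps(rdbms_counts, hive_counts).items():
        print(f"{key}: rdbms={src} hive={tgt} missing={missing}")
```

Running the same comparison before and after each incremental import shows which keys lose rows and by how much, which can be read alongside the --verbose logs.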