Sqoop >> mail # user >> Data Loss with Sqoop Incremental Import


Re: Data Loss with Sqoop Incremental Import
Oh, it looks like you have a separate thread going for this... will post
there.
On Mon, Jan 13, 2014 at 9:40 AM, Abraham Elmahrek <[EMAIL PROTECTED]> wrote:

> Yogesh,
>
> Is unique_value in this case SAL? I'm a bit confused about your query.
>
> Do you have the option of running this query on a separate database
> somewhere to find the issue? I think it would be interesting to see the
> initial state and then the state after running an incremental import. That
> would tell us how many results are being imported after Sqoop has run and
> we can validate each step. Also, please use the --verbose flag to get the
> most out of the logs.
>
> -Abe
>
>
> On Mon, Jan 13, 2014 at 4:38 AM, yogesh kumar <[EMAIL PROTECTED]> wrote:
>
>> Hello All,
>>
>> I am trying to do an incremental import on a daily basis, and after
>> importing I am finding huge data loss.
>>
>> I have used this script for the incremental import from the RDBMS to HDFS:
>>
>> sqoop import -libjars
>>  --driver com.sybase.jdbc3.jdbc.SybDriver \
>>  --query "select * from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204" \
>> --check-column Unique_value \
>>  --incremental append \
>>  --last-value 201401200 \
>>  --split-by DEPT \
>>  --fields-terminated-by ',' \
>>  --target-dir ${TARGET_DIR}/${INC} \
>>  --username ${SYBASE_USERNAME} \
>>  --password ${SYBASE_PASSWORD}
>>
>>
>> Now I have imported the data newly inserted in the RDBMS into HDFS,
>>
>> but when I do
>>
>> *select count(*), unique_value from EMP group by unique_value* (both in
>> the RDBMS and in Hive)
>>
>> I find huge data loss.
>>
>> 1) in RDBMS
>>
>>   Count(*)    Unique_value
>>   1000        201401201
>>   5000        201401202
>>   10000       201401203
>>
>>
>> 2) in Hive
>>
>>   Count(*)    Unique_value
>>   189         201401201
>>   421         201401202
>>   50          201401203
>>
>>
>> If I run
>>
>> select Unique_value from emp;
>>
>> Result :
>> 201401201
>> 201401201
>> 201401201
>> 201401201
>> 201401201
>> .
>> .
>> 201401202
>> .
>> .
>> and so on...
>>
>>
>> *Please help and suggest why this is so*
>>
>>
>> *Many thanks in advance*
>>
>> *Yogesh kumar*
>>
>>
>>
>>
>
>
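One way to act on the suggestion above to validate each step is to put the two count(*) listings side by side and work out the gap per Unique_value. The following is only a rough sketch: the numbers are copied from the two tables quoted in the original message, while the names rdbms_counts, hive_counts, missing_rows and report are made up for illustration and are not part of Sqoop or Hive.

```python
# Per-value row counts copied from the two tables quoted in the message.
rdbms_counts = {201401201: 1000, 201401202: 5000, 201401203: 10000}
hive_counts = {201401201: 189, 201401202: 421, 201401203: 50}


def missing_rows(source, target):
    """Return {value: rows counted in source but not found in target}."""
    return {value: count - target.get(value, 0) for value, count in source.items()}


def report(source, target):
    """Build one human-readable line per value, sorted by value."""
    lines = []
    for value, gap in sorted(missing_rows(source, target).items()):
        total = source[value]
        pct = 100.0 * gap / total if total else 0.0
        lines.append(f"{value}: {gap} of {total} rows missing ({pct:.1f}%)")
    return lines


if __name__ == "__main__":
    for line in report(rdbms_counts, hive_counts):
        print(line)
```

Running the same comparison before and after each incremental import (with fresh counts from both sides) should show which run loses rows; it does not by itself explain why they are lost.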