Sqoop >> mail # user >> Data Loss with Sqoop Incremental Import


Data Loss with Sqoop Incremental Import
Hello All,

I am running an incremental import on a daily basis, and after each import
I find that a large amount of data is missing.

I used this script for the incremental import from the RDBMS to HDFS:

sqoop import -libjars
 --driver com.sybase.jdbc3.jdbc.SybDriver \
 --query "select * from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204" \
--check-column Unique_value \
 --incremental append \
 --last-value 201401200 \
 --split-by DEPT \
 --fields-terminated-by ',' \
 --target-dir ${TARGET_DIR}/${INC} \
 --username ${SYBASE_USERNAME} \
 --password ${SYBASE_PASSWORD} \
Now I have imported the rows newly inserted into the RDBMS to HDFS,

but when I run

select count(*), Unique_value from EMP group by Unique_value;

in both the RDBMS and Hive, I find that most of the rows are missing in Hive.

1) In the RDBMS

  Count(*)    Unique_value
  1000        201401201
  5000        201401202
  10000       201401203

2) In Hive

  Count(*)    Unique_value
  189         201401201
  421         201401202
  50          201401203

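To put numbers on the gap, here is a minimal sketch that subtracts the Hive counts from the RDBMS counts for each Unique_value. The counts are hard-coded from the two result sets above; the variable names `counts` and `report` are only illustrative and are not part of the import script.

```shell
#!/bin/sh
# Counts per Unique_value as reported above: key, RDBMS count, Hive count.
counts="201401201 1000 189
201401202 5000 421
201401203 10000 50"

# For each key, compute rows missing in Hive; then print the overall total.
report=$(printf '%s\n' "$counts" | awk '
  { missing = $2 - $3; src += $2; lost += missing
    printf "%s missing=%d\n", $1, missing }
  END { printf "total missing=%d of %d\n", lost, src }')

printf '%s\n' "$report"
```

With these figures, 811, 4579 and 9950 rows are missing for the three values, which is 15340 of the 16000 source rows.
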
If I run

select Unique_value from EMP;

Result:
201401201
201401201
201401201
201401201
201401201
.
.
201401202
.
.
and so on...
Please help and suggest why this is happening.
Many thanks in advance.

Yogesh kumar