Fwd: Sqoop export .lzo to mysql duplicates

I'm running into a strange issue.


Hive writes its output to an external table with LZO compression enabled,
so my HDFS folder contains large_file.lzo

When I use Sqoop to export this file to a MySQL table, the number of
rows is doubled.

If I decompress the file first with
lzop -d large_file.lzo

and export the resulting uncompressed "large_file", this doesn't happen.
The row count is as expected.

Both small_file and small_file.lzo, on the other hand, load with the correct number of rows.

Sqoop : v 1.30
Num of mappers : 1

Observation: any compressed file (gzip or LZO) larger than about 60 MB
(the threshold might be 64 MB) ends up with double the row count when
exported to the DB, probably as exact duplicates.
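
To check whether the threshold lines up with the HDFS block size, here is a rough sketch of the arithmetic. It is only a guess at the cause: it assumes a 64 MB block size (not confirmed for this cluster) and assumes, hypothetically, that each block-sized split ends up reading the whole compressed file. The function names are made up for illustration and are not part of Sqoop.

```python
import math

# Assumption: 64 MB HDFS block size; check the actual setting on the cluster.
BLOCK_SIZE_BYTES = 64 * 1024 * 1024


def block_count(file_size_bytes, block_size_bytes=BLOCK_SIZE_BYTES):
    """Number of HDFS blocks a file of the given size occupies."""
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / block_size_bytes)


def rows_if_each_block_rereads_file(actual_rows, file_size_bytes):
    """Hypothetical row count if every block-sized split read the whole file."""
    return actual_rows * block_count(file_size_bytes)


if __name__ == "__main__":
    mb = 1024 * 1024
    for size_mb in (10, 60, 65, 100):
        print(size_mb, "MB ->", block_count(size_mb * mb), "block(s)")
```

Under those assumptions, a file just over 64 MB spans two blocks and would show exactly double the rows, while anything under one block would load correctly, which matches what I see.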
Can anyone please help?