I'm running into a strange issue.
Hive writes its output to an external table with LZO compression enabled,
so my HDFS folder contains large_file.lzo.
When I use Sqoop to export this file to a MySQL table, the number of
rows is doubled.
This doesn't happen if I decompress the file first,
lzop -d large_file.lzo
and then export the uncompressed "large_file".
In that case the row count is as expected.
Both small_file and small_file.lzo, on the other hand, are exported with the correct row count.
Sqoop : v 1.30
Num of mappers : 1
Observation : Any compressed file (gzip or LZO) larger than about 60 MB
(the threshold might be 64 MB) ends up with double the row count when
exported to the DB, and the extra rows are probably exact duplicates.
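
To check whether the extra rows really are exact duplicates, something like the sketch below could be run on a local text copy of the data. The helper name row_stats and the file name table_dump.txt are made up for illustration. It assumes one row per line and that the source data itself contains no legitimate duplicate rows.

```shell
# Hypothetical helper: print the total number of rows, the number of distinct
# rows, and how many rows are surplus exact duplicates in a text file.
# Assumes one row per line.
row_stats() {
  file="$1"
  total=$(wc -l < "$file" | tr -d ' ')
  distinct=$(sort -u "$file" | wc -l | tr -d ' ')
  echo "total=$total distinct=$distinct duplicates=$((total - distinct))"
}

# Example usage (file names are placeholders):
#   row_stats large_file        # the decompressed source file
#   row_stats table_dump.txt    # a text dump of the MySQL table after export
```

If the export doubled every row, the dump should show twice the total of the source file while the distinct count stays the same.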
Can anyone please help?