Re: Sqoop export .lzo to mysql duplicates
Jarek Jarcec Cecho 2012-11-23, 06:47
Hi Bhargav,
I believe you may be hitting a known Sqoop bug, SQOOP-721 [1]. I was able to reproduce the behaviour in my test environment today, and I intend to continue debugging tomorrow.

As a workaround, you can decompress the files manually prior to the Sqoop export for now.

Jarcec

Links:
1: https://issues.apache.org/jira/browse/SQOOP-721

On Nov 22, 2012, at 9:07 PM, Bhargav Nallapu <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm seeing a strange issue.
>
> Context:
>
> Hive writes its output to an external table with LZO compression in place, so my HDFS folder has large_file.lzo.
>
> When I use Sqoop to export this file to the MySQL table, the number of rows is doubled.
>
> Then I do:
> lzop -d large_file.lzo
>
> This doesn't happen if I load the same file after uncompressing it ("large_file"); the rows are as expected.
>
> Whereas both small_file and small_file.lzo are loaded with the correct number of rows.
>
> Sqoop: v 1.30
> Num of mappers: 1
>
> Observation: Any compressed file (gzipped or LZO) larger than 60 MB (might be 64 MB), when exported to the DB, produces double the row count, probably exact duplicates.
> Can anyone please help?
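The decompress-first workaround can be sketched as a small Python helper that only assembles the commands to run, so they can be reviewed before anything touches the cluster. This is a rough sketch, not a tested procedure: the function name, all paths, the JDBC URL, the table and the user name are hypothetical placeholders, and it assumes `hadoop`, `lzop` and `sqoop` are on the PATH and that the target HDFS directory already exists.

```python
import shlex


def build_workaround_commands(hdfs_lzo_path, local_dir, hdfs_plain_dir,
                              jdbc_url, table, username):
    """Return the commands for the decompress-then-export workaround.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    lzo_name = hdfs_lzo_path.rsplit("/", 1)[-1]
    if not lzo_name.endswith(".lzo"):
        raise ValueError("expected a .lzo file, got %r" % lzo_name)

    plain_name = lzo_name[:-len(".lzo")]
    local_lzo = local_dir.rstrip("/") + "/" + lzo_name
    local_plain = local_dir.rstrip("/") + "/" + plain_name

    return [
        # Copy the compressed file out of HDFS.
        ["hadoop", "fs", "-get", hdfs_lzo_path, local_lzo],
        # Decompress it; lzop writes the output next to the input file.
        ["lzop", "-d", local_lzo],
        # Upload the plain file to a separate (assumed existing) directory.
        ["hadoop", "fs", "-put", local_plain, hdfs_plain_dir],
        # Export the uncompressed data; -P prompts for the DB password.
        ["sqoop", "export", "--connect", jdbc_url, "--username", username,
         "-P", "--table", table, "--export-dir", hdfs_plain_dir, "-m", "1"],
    ]


if __name__ == "__main__":
    # All values below are made-up placeholders.
    commands = build_workaround_commands(
        "/user/hive/out/large_file.lzo", "/tmp/work", "/user/hive/out_plain",
        "jdbc:mysql://dbhost/mydb", "my_table", "me")
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere. If the Hive output uses a non-default field delimiter, the `sqoop export` line would also need the matching input-delimiter option.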
Bhargav Nallapu 2012-11-23, 09:04
Jarek Jarcec Cecho 2012-11-23, 15:18