Sqoop, mail # user - Re: Sqoop export .lzo to mysql duplicates

Re: Sqoop export .lzo to mysql duplicates
Jarek Jarcec Cecho 2012-11-23, 06:47
Hi Bhargav,
I believe that you might be hitting a known Sqoop bug, SQOOP-721 [1]. I was able to replicate the behaviour in my testing environment today and I intend to continue debugging tomorrow.

As a workaround, you can decompress the files manually prior to the Sqoop export for now.
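
Something along these lines, for example (the HDFS paths, host, database, and table names below are only illustrative, not from your setup):

  # pull the compressed file out of HDFS and decompress it locally
  hadoop fs -get /user/hive/warehouse/mytable/large_file.lzo .
  lzop -d large_file.lzo

  # put the plain file into a staging directory and export that instead
  hadoop fs -put large_file /user/staging/mytable_plain/
  sqoop export \
      --connect jdbc:mysql://dbhost/mydb \
      --username dbuser -P \
      --table mytable \
      --export-dir /user/staging/mytable_plain \
      --num-mappers 1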

Jarcec

Links:
1: https://issues.apache.org/jira/browse/SQOOP-721

On Nov 22, 2012, at 9:07 PM, Bhargav Nallapu <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm seeing a strange issue.
>
> Context:
>
> Hive writes its output to an external table with LZO compression enabled, so my HDFS folder contains large_file.lzo.
>
> When I use Sqoop to export this file to the MySQL table, the number of rows is doubled.
>
> If I first run
>
> lzop -d large_file.lzo
>
> and export the uncompressed large_file instead, this doesn't happen; the row count is as expected.
>
> Likewise, both small_file and small_file.lzo are loaded with the correct row counts.
>
> Sqoop: v1.3.0
> Number of mappers: 1
>
> Observation: any compressed file (gzip or LZO) larger than about 60 MB (possibly 64 MB, i.e. one default HDFS block) ends up with double the row count when exported to the DB, probably as exact duplicates.
> Can anyone please help?
>
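One quick way to confirm the duplication is to compare the row count in HDFS against MySQL. A rough sketch, assuming the LZO codec is on the Hadoop classpath so that `hadoop fs -text` can decompress the file, and with illustrative database/table names:

  # count rows in the compressed source file
  hadoop fs -text /user/hive/warehouse/mytable/large_file.lzo | wc -l

  # count rows that landed in MySQL
  mysql -u dbuser -p mydb -e 'SELECT COUNT(*) FROM mytable;'

If the second number is twice the first, you are seeing the SQOOP-721 behaviour.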
Bhargav Nallapu 2012-11-23, 09:04
Jarek Jarcec Cecho 2012-11-23, 15:18