Sqoop, mail # user - Re: Sqoop export .lzo to mysql duplicates


Jarek Jarcec Cecho 2012-11-23, 06:47
Re: Sqoop export .lzo to mysql duplicates
Bhargav Nallapu 2012-11-23, 09:04
Hi Jarcec,

Thanks for the quick reply.

In fact, I checked this ticket as soon as you pointed me to it.

I was just skeptical, since it was filed only yesterday.

Since exporting a gzipped file with Sqoop is a pretty common thing to do,
I was wondering whether this is already a known issue, or perhaps fixed in one
of the recent versions. If not, I will keep track of the ticket and try
debugging it myself, or wait to hear your findings.
Thanks.
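A minimal sketch of the interim workaround suggested in the reply (decompress the files before exporting), for the gzip case only. It uses Python's standard `gzip` module on a local copy of the file; LZO has no standard-library codec, so `lzop -d` remains the tool for .lzo files. The helper name `decompress_gzip` and the file names are hypothetical, and copying the file out of HDFS, putting the decompressed copy back and re-running `sqoop export` against it are not shown.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_gzip(src: Path, dst: Path) -> int:
    """Decompress the local gzip file `src` into `dst`; return its line count.

    Hypothetical helper: `src` and `dst` are local paths, not HDFS paths.
    """
    # Stream-decompress so a large file is never held in memory all at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # One line per record, so this is the row count the MySQL table should
    # end up with after the export (assuming no newlines inside fields).
    with open(dst, "rb") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    # Round trip on a throwaway file; \x01 is Hive's default field delimiter.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.gz"
        with gzip.open(src, "wb") as f:
            f.write(b"1\x01a\n2\x01b\n")
        print(decompress_gzip(src, Path(tmp) / "sample"))  # prints 2
```

Comparing the returned count with `SELECT COUNT(*)` on the target table after the export is a quick way to see whether rows were duplicated.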
On Fri, Nov 23, 2012 at 12:17 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Bhargav,
> I believe you might be hitting the known Sqoop bug SQOOP-721 [1]. I was
> able to replicate the behaviour in my testing environment today, and I
> intend to continue debugging tomorrow.
>
> As a workaround for now, you can decompress the files manually prior to
> the Sqoop export.
>
> Jarcec
>
> Links:
> 1: https://issues.apache.org/jira/browse/SQOOP-721
>
> On Nov 22, 2012, at 10:00 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>
> wrote:
>
> > On Nov 22, 2012, at 9:07 PM, Bhargav Nallapu <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> Hi,
> >>
> >> I'm seeing a strange issue.
> >>
> >> Context:
> >>
> >> Hive writes its output to an external table with LZO compression
> >> enabled, so my HDFS folder contains large_file.lzo.
> >>
> >> When I use Sqoop to export this file to the MySQL table, the number of
> >> rows is doubled.
> >>
> >> Then I run:
> >> lzop -d large_file.lzo
> >>
> >> If I export the decompressed "large_file" instead, this doesn't happen;
> >> the row count is as expected.
> >>
> >> By contrast, both small_file and small_file.lzo are loaded with the
> >> correct number of rows.
> >>
> >> Sqoop: v1.30
> >> Number of mappers: 1
> >>
> >> Observation: any compressed file (gzip or LZO) larger than about 60 MB
> >> (possibly 64 MB) is exported to the DB with double the row count,
> >> probably as exact duplicates.
> >> Can anyone please help?
> >>
> >
>
>
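One way to read the ~64 MB threshold, offered as a hypothesis and not as the confirmed cause of SQOOP-721: 64 MB is the default HDFS block size in Hadoop 1.x, and gzip or un-indexed LZO files cannot be decoded starting from the middle. If the export still cuts such a file into block-sized splits and each split decodes the whole stream from the beginning, every row is emitted once per split. The toy model below (all names hypothetical, 64 MB block size assumed) reproduces the reported pattern: uncompressed files and small compressed files come out right, while a compressed file between 64 and 128 MB comes out doubled. It counts splits rather than mappers, so it assumes a single map task can still be handed several chunks of the same file.

```python
import math

# Assumption: the HDFS block size in effect (64 MB is the Hadoop 1.x default).
BLOCK_SIZE_MB = 64


def num_splits(file_size_mb: float, block_size_mb: float = BLOCK_SIZE_MB) -> int:
    """Number of block-sized splits a file of this size would be cut into."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


def exported_rows(true_rows: int, file_size_mb: float, splittable: bool) -> int:
    """Rows written to the DB under the hypothesised bug (toy model only).

    A splittable (uncompressed) file: each split reads only its own byte
    range, so every row is written exactly once in total.
    A non-splittable compressed file: if each split decodes the whole stream
    from the start, every split re-emits every row.
    """
    if splittable:
        return true_rows
    return true_rows * num_splits(file_size_mb)


if __name__ == "__main__":
    true_rows = 1000
    # (size in MB, splittable?) for: small .lzo, large .lzo, large plain file.
    for size_mb, splittable in ((30, False), (100, False), (100, True)):
        print(size_mb, splittable, exported_rows(true_rows, size_mb, splittable))
    # Prints 1000, 2000 and 1000 rows respectively.
```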
Jarek Jarcec Cecho 2012-11-23, 15:18