Sqoop, mail # user - Re: Sqoop export .lzo to mysql duplicates


Re: Sqoop export .lzo to mysql duplicates
Jarek Jarcec Cecho 2012-11-23, 15:18
Hi Bhargav,
you're right that this ticket was filed yesterday; however, I've noticed this behaviour with multiple users and I was able to replicate it in my testing environment. I'm going to continue investigating it today. You're more than welcome to do your own debugging and share what you find. Contributions are always welcome!

Jarcec

On Fri, Nov 23, 2012 at 02:34:37PM +0530, Bhargav Nallapu wrote:
> Hi Jarcec,
>
> Thanks for the quick reply.
>
> In fact, I checked this ticket as soon as you pointed me to it.
>
> I was just skeptical, since it was filed as recently as yesterday.
>
> Since exporting a gzipped file using Sqoop is a pretty common thing to do,
> I was wondering whether it is already a known issue, or has perhaps been
> fixed in a recent version. If not, I will keep track of the ticket and
> either try debugging it myself or wait for your findings.
>
>
> Thanks.
>
>
> On Fri, Nov 23, 2012 at 12:17 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
>
> > Hi Bhargav,
> > I believe that you might be hitting the known Sqoop bug SQOOP-721 [1]. I was
> > able to replicate the behaviour in my testing environment today and my
> > intention is to continue debugging tomorrow.
> >
> > As a workaround, you can decompress the files manually prior to the Sqoop
> > export for now.
> >
> > Jarcec
> >
> > Links:
> > 1: https://issues.apache.org/jira/browse/SQOOP-721
> >
> > > On Nov 22, 2012, at 9:07 PM, Bhargav Nallapu <[EMAIL PROTECTED]> wrote:
> > >
> > >>
> > >> Hi,
> > >>
> > >> I'm seeing a strange issue.
> > >>
> > >> Context:
> > >>
> > >> Hive writes its output to an external table with LZO compression in
> > >> place, so my HDFS folder contains large_file.lzo.
> > >>
> > >> When I use Sqoop to export this file to the MySQL table, the number
> > >> of rows is doubled.
> > >>
> > >> Then I run:
> > >> lzop -d large_file.lzo
> > >>
> > >> The doubling doesn't happen if I load the same file after decompressing
> > >> it: with "large_file", the row count is as expected.
> > >>
> > >> Whereas both small_file and small_file.lzo are loaded with the correct
> > >> number of rows.
> > >>
> > >> Sqoop : v 1.30
> > >> Num of mappers : 1
> > >>
> > >> Observation: any compressed file (gzipped or LZO) larger than about
> > >> 60 MB (the threshold might be 64 MB) ends up with double the row count
> > >> when exported to the DB, probably as exact duplicates.
> > >> Can anyone please help?
> > >>
> > >
> >
> >
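
The observation in the original report (row counts double only for compressed files above roughly 64 MB) fits one possible explanation that the thread itself does not confirm: the file is cut into chunks at an assumed 64 MB block boundary, and because a gzip or plain LZO stream cannot be read from the middle, each chunk ends up reading the whole file. The sketch below is only a toy model of that hypothesis. The block size, the file sizes, the row count, and the function names are all illustrative assumptions, not Sqoop internals.

```python
import math

# Assumed block size; the thread only guesses that the threshold "might be 64 MB".
BLOCK_SIZE_MB = 64


def planned_chunks(file_size_mb: float, block_size_mb: float = BLOCK_SIZE_MB) -> int:
    """Number of chunks if the file is cut at block boundaries (at least one)."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


def exported_rows(rows_in_file: int, file_size_mb: float, splittable: bool) -> int:
    """Rows written to the database under the hypothesised behaviour.

    A splittable (uncompressed) file is divided among the chunks, so every row
    is read exactly once. For a non-splittable compressed file, this model
    assumes each chunk decompresses the stream from the start and emits every
    row, so the total is multiplied by the number of chunks.
    """
    chunks = planned_chunks(file_size_mb)
    return rows_in_file if splittable else rows_in_file * chunks


if __name__ == "__main__":
    # Hypothetical sizes chosen to sit on either side of the assumed 64 MB boundary.
    cases = [
        ("small_file.lzo", 40, False),
        ("large_file.lzo", 100, False),
        ("large_file", 100, True),
    ]
    for name, size_mb, splittable in cases:
        print(name, exported_rows(1000, size_mb, splittable))
```

With these made-up sizes, only the 100 MB compressed file reports twice its real row count, which matches the pattern in the report. It also suggests why decompressing before the export avoids the problem in this model. Whether this is the actual cause is for SQOOP-721 to determine.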