Sqoop >> mail # user >> Getting bogus rows from sqoop import...?
Re: Getting bogus rows from sqoop import...?
Thanks for your response Jarek :)

I've started a new import run with --hive-drop-import-delims added and
--direct removed (since the two are mutually exclusive); we'll see how it
goes.
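
For reference, the adjusted invocation looks roughly like this (a sketch
assembled from the flags discussed in this thread; credentials elided as in
the original):

```shell
# New run: --direct dropped, --hive-drop-import-delims added, and the
# --query/--target-dir form used instead of --table.
sqoop import \
        --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
        --username xxxxx \
        --password xxxxx \
        --hive-import \
        --hive-overwrite \
        --hive-drop-import-delims \
        -m 23 \
        --split-by id \
        --hive-table profile_felix_test17 \
        --target-dir /tests/sqoop/general/profile_felix_test \
        --query "select * from Profile WHERE \$CONDITIONS"
```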

Going to sleep now. I'll report back tomorrow :)

--
Felix
On Thu, Mar 21, 2013 at 12:42 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Felix,
> we've seen similar behaviour in the past when the data itself contains
> Hive special characters like new line characters. Would you mind trying
> your import with --hive-drop-import-delims to see if it helps?
>
> Jarcec
>
> On Wed, Mar 20, 2013 at 11:27:58PM -0400, Felix GV wrote:
> > Hello,
> >
> > I'm trying to import a full table from MySQL to Hadoop/Hive. It works
> with
> > certain parameters, but when I try to do an ETL that's somewhat more
> > complex, I start getting bogus rows in my resulting table.
> >
> > This works:
> >
> > sqoop import \
> >         --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
> >         --username xxxxx \
> >         --password xxxxx \
> >         --hive-import \
> >         --hive-overwrite \
> >         -m 23 \
> >         --direct \
> >         --hive-table profile_felix_test17 \
> >         --split-by id \
> >         --table Profile
> >
> > But if I use a --query instead of a --table, then I start getting bogus
> > records (and by that, I mean rows that have a nonsensically high primary
> > key that doesn't exist in my source database and null for the rest of the
> > cells).
> >
> > The output I get with the above query is not exactly the way I want it.
> > Using --query, I can get the data in the format I want (by transforming
> > some stuff inside MySQL), but then I also get the bogus rows, which pretty
> > much makes the Hive table unusable.
> >
> > I tried various combinations of parameters and it's hard to pinpoint
> > exactly what causes the problem, so it could be more intricate than my
> > simplistic description above. That being said, removing --table and adding
> > the following params definitely breaks it:
> >
> >         --target-dir /tests/sqoop/general/profile_felix_test \
> >         --query "select * from Profile WHERE \$CONDITIONS"
> >
> > (Ultimately, I want to use a query that's more complex than this, but even
> > a simple query like this breaks...)
> >
> > Any ideas why this would happen and how to solve it?
> >
> > Is this the kind of problem that Sqoop2's cleaner architecture intends to
> > solve?
> >
> > I use CDH 4.2, BTW.
> >
> > Thanks :) !
> >
> > --
> > Felix
>
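
The failure mode Jarcec describes can be sketched as follows (illustrative
data only, not the actual Profile table). Hive's default text format uses
\001 as the field delimiter and \n as the row delimiter, so a newline
embedded in a column value is read back as an extra, mostly-null row:

```python
# Sketch of why embedded newlines produce phantom rows in a Hive text table.
# The record and column values here are made up for illustration.
FIELD_DELIM = "\001"  # Hive's default field separator
ROW_DELIM = "\n"      # Hive's default row separator

# One logical record whose third column contains an embedded newline.
record = ["42", "felix", "line one\nline two"]
serialized = FIELD_DELIM.join(record) + ROW_DELIM

# How a Hive text-format reader splits it back:
rows = [r.split(FIELD_DELIM) for r in serialized.split(ROW_DELIM) if r]
print(len(rows))  # 2 rows instead of 1: the second is a bogus, mostly-null row
```

Dropping (--hive-drop-import-delims) or escaping the delimiters at import
time prevents the spurious split.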