Re: Exporting hive table data into oracle give date format error
Jarek Jarcec Cecho 2013-03-22, 02:54
Yup, agreed, the documentation with regard to the escape strings and various edge conditions can definitely be improved. Would you mind creating a JIRA?

Jarcec

On Thu, Mar 21, 2013 at 11:30:22AM -0700, Venkat wrote:
> Hi Ajit/Jarcec
>
> I think the whole null string handling needs a special section in
> the documentation (witness the discussions on Netezza null handling for
> direct loads that we had).
>
> We may have to come up with specific recommendations on consistently
> handling NULL in all cases (string and non-string). From the database
> perspective, different DBs differ in what they support for null strings
> (and it differs by version as well).
>
> And the Hive handling is another potential issue, as you have
> explained.
>
> Venkat
>
>
> On Thu, Mar 21, 2013 at 8:33 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
>
> > Hi Ajit,
> > let me try to explain what I think is happening in your use case. There
> > are multiple moving pieces, so let me first summarize a couple of behaviour
> > characteristics of the components:
> >
> > 1) Sqoop by default will use the string "null" (lower case) to encode NULL
> > values from the database. This can be changed via the --(input-)null-(non-)string
> > arguments (see the sketch below this list).
> >
> > 2) Hive by default uses \N for encoding NULL values.
> >
> > 3) When parsing an input file, Hive will use NULL for any value that it
> > fails to read, rather than throwing an exception and killing your query.
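> >
> > For illustration only, here is a minimal sketch of overriding those
> > substitution strings on import; the JDBC URL, credentials and table name
> > are placeholders, not taken from this thread:
> >
> >   sqoop import \
> >     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
> >     --username scott --password tiger \
> >     --table EXAMPLE \
> >     --null-string '\\N' \
> >     --null-non-string '\\N' \
> >     --target-dir /user/hive/warehouse/example
> >
> > With these two arguments the import writes \N instead of the default
> > "null", matching Hive's own NULL encoding from point 2.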
> >
> > Now let's specifically focus on your workflow. To make the explanation a
> > bit simpler, let's consider the table "create table example(i int, t
> > varchar(50));" with a single row where both columns are NULL.
> >
> > a) Sqooping in this table without custom --null-(non-)string arguments will
> > lead to an HDFS file with exactly one line (one input row) where both columns
> > will be encoded as 'null' (the default substitution string for NULL values).
> > Result:
> >
> >   null,null
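> >
> > You can verify that directly on HDFS; the path and file name below are
> > just an example, assuming the table was imported without the null
> > overrides into /user/hive/warehouse/example:
> >
> >   hadoop fs -cat /user/hive/warehouse/example/part-m-00000
> >   null,null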
> >
> > b) Executing a simple "select * from example" in Hive will lead to the
> > following output row:
> >
> >   NULL null
> >
> > Let me explain what is happening here a bit more. Hive will read the input
> > file and split it into columns. The first column is of type "integer" and
> > contains the value "null". As the string constant "null" is not a valid
> > number for an integer column, this value is converted into NULL. The second
> > column is of type string; the constant "null" is a fully valid string and
> > thus this string is returned - there is no conversion to a NULL value!
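> >
> > A quick way to see this distinction from the shell (the query is only an
> > illustration):
> >
> >   hive -e "SELECT i IS NULL, t IS NULL, t = 'null' FROM example"
> >   true    false   true
> >
> > The first column really is NULL, while the second one holds the
> > four-character string 'null'.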
> >
> > c) Exporting table "example" will work correctly, as the file on HDFS still
> > contains the expected "null,null".
> >
> > d) Now let's explore what will happen during the creation of a second table
> > with the query "CREATE TABLE example2 AS SELECT * FROM example". As a part of
> > the query, Hive will read all input rows and parse their values as described
> > in b). The output will be serialized into the output table example2. The
> > first column was parsed as NULL, so it will be written out as \N (the default
> > NULL substitution string for Hive). The second column was, however, parsed as
> > a valid string value and thus it will be serialized "as is", resulting in a
> > file with one single line "\N,null". Please notice that this select statement
> > has changed the on-disk data!
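> >
> > Again verifiable on HDFS; the warehouse path and file name here are
> > assumptions based on default Hive settings:
> >
> >   hadoop fs -cat /user/hive/warehouse/example2/000000_0
> >   \N,null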
> >
> > e) Exporting table "example2" obviously can't lead to a consistent state, as
> > the input file has been changed.
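> >
> > If the data had been consistently encoded with \N in the first place (for
> > example by importing with the --null-string '\\N' and --null-non-string
> > '\\N' arguments as above), the export job could be told which substitution
> > strings to expect. A sketch only, with placeholder connection details:
> >
> >   sqoop export \
> >     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
> >     --username scott --password tiger \
> >     --table EXAMPLE \
> >     --export-dir /user/hive/warehouse/example \
> >     --input-null-string '\\N' \
> >     --input-null-non-string '\\N'
> >
> > With those arguments every \N in the files is turned back into a database
> > NULL on export.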
> >
> > Please do not hesitate to contact me if you still have any open questions!
> >
> > Jarcec
> >
> > On Thu, Mar 21, 2013 at 06:34:35AM +0000, Ajit Kumar Shreevastava wrote:
> > > Hi Jarek,
> > >
> > >
> > >
> > > Thanks a lot. It's working fine in both the cases from the chain mail.
> > >
> > >
> > >
> > > I also want to know the reason for Sqoop's behavior for the data it
> > > imported and exported from Oracle without the arguments --null-string
> > > '\\N' and --null-non-string '\\N' on the import job.
> > >
> > >
> > >
> > > When I import the data without the arguments --null-string '\\N',