Sqoop, mail # user - Exporting hive table data into oracle give date format error


Re: Exporting hive table data into oracle give date format error
Venkat 2013-03-21, 18:30
Hi Ajit/Jarcec

I think the whole null-string handling needs a special section in
the documentation (witness the discussions on Netezza null handling
for direct loads that we had).

We may have to come up with specific recommendations on consistently
handling NULL in all cases (string and non-string columns). From the
database perspective, different databases differ in what they do and
do not support for null strings (and it can differ by version, too).

And the Hive handling is another potential issue, as you have
explained.

Venkat
On Thu, Mar 21, 2013 at 8:33 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Ajit,
> let me try to explain what I think is happening in your use case. There
> are multiple moving pieces, so let me first summarize a couple of
> behavioural characteristics of the components:
>
> 1) Sqoop by default will use the string "null" (lower case) to encode NULL
> values from the database. This can be changed via the
> --(input-)null-(non-)string arguments.
>
> 2) Hive by default uses \N for encoding NULL values.
>
> 3) When parsing an input file, Hive will use NULL for any value that it
> fails to read, rather than throwing an exception and killing your query.
>
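> For illustration, an import that overrides both defaults might look like
> this (the connection string, credentials and table name are placeholders,
> not your actual job):
>
>   sqoop import \
>     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>     --username scott -P \
>     --table EXAMPLE \
>     --hive-import \
>     --null-string '\\N' \
>     --null-non-string '\\N'
>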
> Now let's specifically focus on your workflow. To make the explanation a
> bit simpler, let's consider a table created as "create table example(i int,
> t varchar(50));" with one single row where each column is NULL.
>
> a) Sqooping in this table without custom --null-(non-)string arguments
> will lead to an HDFS file with exactly one line (one input row) where both
> columns are encoded as 'null' (the default substitution string for NULL
> values). Result:
>
>   null,null
>
> b) Executing a simple "select * from example" in Hive will lead to the
> following output row:
>
>   NULL null
>
> Let me explain what is happening here a bit more. Hive will read the input
> file and split it into columns. The first column is of type "integer" and
> contains the value "null". As the string constant "null" is not a valid
> number for an integer column, this value is converted into NULL. The second
> column is of type string; the constant "null" is a fully valid string and
> thus this string is returned as-is - there is no conversion to a NULL value!
>
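> A quick, illustrative way to verify this asymmetry from the shell:
>
>   hive -e 'SELECT COUNT(*) FROM example WHERE i IS NULL'   # returns 1
>   hive -e 'SELECT COUNT(*) FROM example WHERE t IS NULL'   # returns 0
>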
> c) Exporting table "example" will work correctly, as the file on HDFS
> still contains the expected "null,null".
>
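> That is because, if I recall correctly, Sqoop's export side also treats
> the string "null" as NULL by default, so a plain export along these lines
> succeeds (connection details are placeholders):
>
>   sqoop export \
>     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>     --username scott -P \
>     --table EXAMPLE \
>     --export-dir /user/hive/warehouse/example
>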
> d) Now let's explore what will happen during the creation of a second
> table with the query "CREATE TABLE example2 AS SELECT * FROM example". As
> part of the query, Hive will read all input rows and parse their values as
> described in b). The output will be serialized into the output table
> example2. The first column was parsed as NULL, so it will be written out
> as \N (the default NULL substitution string for Hive). The second column
> was however parsed as a valid string value, and thus it will be serialized
> "as is". The result is a file with one single line: "\N,null". Please
> notice that this select statement has changed the on-disk data!
>
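> You can observe the rewrite directly on HDFS (assuming the default
> warehouse location and the delimiters shown above; your paths may differ):
>
>   hadoop fs -cat /user/hive/warehouse/example/*    # null,null
>   hadoop fs -cat /user/hive/warehouse/example2/*   # \N,null
>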
> e) Exporting table "example2" obviously can't lead to a consistent state,
> as the input file has been changed.
>
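> One way out is to tell the export job about Hive's encoding explicitly,
> e.g. something along these lines (again just a sketch with placeholder
> connection details):
>
>   sqoop export \
>     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>     --username scott -P \
>     --table EXAMPLE2 \
>     --export-dir /user/hive/warehouse/example2 \
>     --input-null-string '\\N' \
>     --input-null-non-string '\\N'
>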
> Please do not hesitate to contact me if you still have any open questions!
>
> Jarcec
>
> On Thu, Mar 21, 2013 at 06:34:35AM +0000, Ajit Kumar Shreevastava wrote:
> > Hi Jarek,
> >
> >
> >
> > Thanks a lot. It's working fine.
> >
> > In both of the cases in the mail chain.
> >
> >
> >
> > I also want to know the reason for Sqoop's behaviour with the data it
> > imports from and exports to Oracle without the arguments --null-string
> > '\\N' and --null-non-string '\\N' on the import job.
> >
> >
> >
> > When I import the data to a Hive table without the arguments
> > --null-string '\\N' and --null-non-string '\\N' on the import job, then
> > I am able to export it successfully back to the Oracle table without any
> > error or data mismatch. But after copying the same table structure and
> > data to another Hive table, I am not able to do so. Is there any reason
> > for that? Does Sqoop store its data definition or data formatting for
> > the same? If you explained the

Regards

Venkat