Sqoop, mail # user - Exporting hive table data into oracle give date format error

Thread:
  Ajit Kumar Shreevastava 2013-03-13, 13:43
  Jarek Jarcec Cecho 2013-03-16, 22:58
  Ajit Kumar Shreevastava 2013-03-19, 08:13
  Jarek Jarcec Cecho 2013-03-19, 21:25
  Ajit Kumar Shreevastava 2013-03-20, 07:30
  Jarek Jarcec Cecho 2013-03-21, 00:48
  Ajit Kumar Shreevastava 2013-03-21, 06:34
  Jarek Jarcec Cecho 2013-03-21, 15:33
  Ajit Kumar Shreevastava 2013-03-22, 07:37
  Venkat 2013-03-21, 18:30
  Jarek Jarcec Cecho 2013-03-22, 02:54
  Venkat Ranganathan 2013-03-22, 03:24
Re: Exporting hive table data into oracle give date format error
Jarek Jarcec Cecho 2013-03-22, 03:46
Thank you!

Jarcec

On Thu, Mar 21, 2013 at 08:24:48PM -0700, Venkat Ranganathan wrote:
> I created SQOOP-963 earlier for this
>
> Venkat
>
> On Thu, Mar 21, 2013 at 7:54 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> > Yup agreed, the documentation with regards to the escape strings and various edge conditions can be definitely improved. Would you mind creating a JIRA?
> >
> > Jarcec
> >
> > On Thu, Mar 21, 2013 at 11:30:22AM -0700, Venkat wrote:
> >> Hi Ajit/Jarcec
> >>
> >> I think the whole null-string handling needs a special section in
> >> the documentation (witness the discussions on Netezza null handling for
> >> direct loads that we had).
> >>
> >> We may have to come up with specific recommendations on consistently
> >> handling NULL in all cases (string and non-string).   From the database
> >> perspective, different DBs differ in what they support for null strings
> >> (and this can also differ by version).
> >>
> >> And the hive handling is also another potential issue that you have
> >> explained.
> >>
> >> Venkat
> >>
> >>
> >> On Thu, Mar 21, 2013 at 8:33 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>wrote:
> >>
> >> > Hi Ajit,
> >> > let me try to explain what I think is happening in your use case. There
> >> > are multiple moving pieces, so let me first summarize a couple of
> >> > behavioural characteristics of the components:
> >> >
> >> > 1) Sqoop by default will use the string "null" (lower case) to encode
> >> > NULL values from the database. This can be changed via the
> >> > --(input-)null-(non-)string arguments.
> >> >
> >> > 2) Hive by default uses \N for encoding NULL value.
> >> >
> >> > 3) When parsing an input file, Hive will substitute NULL for any value
> >> > it fails to parse, rather than throwing an exception and killing your
> >> > query.
> >> >
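[Editor's note] Point 1 above can be sketched as a small simulation. This is a minimal model of Sqoop's default text serialization, not Sqoop's actual code; the function name and delimiter are illustrative assumptions, while the `--null-string` / `--null-non-string` option names are the real Sqoop arguments being described.

```python
# Sketch: Sqoop's default NULL substitution when writing imported rows as
# delimited text. A NULL database value becomes the literal string "null"
# unless overridden with --null-string / --null-non-string.

def encode_row(row, null_string="null", delimiter=","):
    """Serialize one database row the way Sqoop's text import would."""
    return delimiter.join(null_string if v is None else str(v) for v in row)

# One row where both columns are NULL in the database:
print(encode_row([None, None]))  # -> null,null
```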
> >> > Now let's focus specifically on your workflow. To make the explanation a
> >> > bit simpler, let's consider the table "create table example(i int, t
> >> > varchar(50));" with a single row where each column is NULL.
> >> >
> >> > a) Sqooping in this table without custom --null-(non-)string arguments
> >> > will lead to an HDFS file with exactly one line (the one input row) where
> >> > both columns are encoded as 'null' (the default substitution string for
> >> > NULL values). Result:
> >> >
> >> >   null,null
> >> >
> >> > b) Executing a simple "select * from example" in Hive will lead to the
> >> > following output row:
> >> >
> >> >   NULL null
> >> >
> >> > Let me explain what is happening here a bit more. Hive will read the
> >> > input file and split it into columns. The first column is of type
> >> > "integer" and contains the value "null". As the string constant "null" is
> >> > not a valid number for an integer column, this value is converted into
> >> > NULL. The second column is of type string; the constant "null" is a fully
> >> > valid string and thus this string is returned - there is no conversion to
> >> > a NULL value!
> >> >
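[Editor's note] The per-column behavior described in b) can be modeled with a short sketch. This is a deliberately simplified stand-in for Hive's deserialization, with hypothetical function and type names; the point it demonstrates is real: a non-numeric token in an int column becomes NULL, while a string column accepts the literal text "null" unchanged.

```python
# Simplified model of Hive deserializing one delimited line: a value that
# fails to parse for its column type becomes NULL (None here), while a
# string column keeps the literal text "null" as-is.

def hive_parse_line(line, column_types, delimiter=","):
    row = []
    for text, ctype in zip(line.split(delimiter), column_types):
        if ctype == "int":
            try:
                row.append(int(text))
            except ValueError:   # "null" is not a number -> NULL
                row.append(None)
        else:                    # string column: no conversion to NULL!
            row.append(text)
    return row

print(hive_parse_line("null,null", ["int", "string"]))  # -> [None, 'null']
```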
> >> > c) Exporting table "example" will work correctly, as the file on HDFS
> >> > still contains the expected "null,null".
> >> >
> >> > d) Now let's explore what happens during creation of a second table with
> >> > the query "CREATE TABLE example2 AS SELECT * FROM example". As part of
> >> > the query, Hive will read all input rows and parse their values as
> >> > described in b). The output will be serialized into the output table
> >> > example2. The first column was parsed as NULL, so it will be written out
> >> > as \N (the default NULL substitution string for Hive). The second column,
> >> > however, was parsed as a valid string value and thus will be serialized
> >> > "as is", resulting in a file with the single line "\N,null". Please
> >> > notice that this select statement has changed the on-disk data!
> >> >
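[Editor's note] The re-serialization in step d) can be sketched as follows. This is an illustrative model, not Hive internals; the parsed row is hardcoded to the result of reading "null,null" as (int, string) columns, and \N is Hive's real default NULL encoding.

```python
# Sketch of step d): CREATE TABLE ... AS SELECT re-serializes each parsed
# row with Hive's default NULL encoding, \N. The int column (parsed as
# NULL) comes out as \N; the string column keeps the literal text "null".

def hive_serialize(row, delimiter=","):
    return delimiter.join(r"\N" if v is None else str(v) for v in row)

parsed = [None, "null"]        # result of parsing "null,null" as (int, string)
print(hive_serialize(parsed))  # -> \N,null
```

Note the asymmetry: the same original token "null" is now written two different ways, which is exactly why the on-disk data changed.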
> >> > e) Exporting table "example2" obviously can't lead to a consistent state,
> >> > as the input file has been changed.
> >> >
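[Editor's note] A sketch of the export side makes the inconsistency in e) concrete. The decoding function is hypothetical shorthand for Sqoop's behavior; `--input-null-string` / `--input-null-non-string` are the real Sqoop export arguments that set which token is turned back into NULL.

```python
# Sketch: on export, Sqoop converts the configured substitution token back
# into NULL. With the default token "null", the \N written by Hive in
# example2 survives as literal text while the genuine string "null" is
# wrongly turned into NULL; passing --input-null-string '\N' (and
# --input-null-non-string '\N') restores the intended values.

def sqoop_decode(line, input_null_string="null", delimiter=","):
    return [None if t == input_null_string else t
            for t in line.split(delimiter)]

print(sqoop_decode(r"\N,null"))                           # -> ['\\N', None]
print(sqoop_decode(r"\N,null", input_null_string=r"\N"))  # -> [None, 'null']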
> >> > Please do not hesitate to contact me if you still have any open questions!
> >> >
> >> > Jarcec
> >> >
> >> > On Thu, Mar 21, 2013 at 06:34:35AM +0000, Ajit Kumar Shreevastava wrote: