Sqoop, mail # user - Exporting hive table data into oracle give date format error


Ajit Kumar Shreevastava 2013-03-13, 13:43
Jarek Jarcec Cecho 2013-03-16, 22:58
Ajit Kumar Shreevastava 2013-03-19, 08:13
Jarek Jarcec Cecho 2013-03-19, 21:25
Ajit Kumar Shreevastava 2013-03-20, 07:30
Jarek Jarcec Cecho 2013-03-21, 00:48
Ajit Kumar Shreevastava 2013-03-21, 06:34
Jarek Jarcec Cecho 2013-03-21, 15:33
Ajit Kumar Shreevastava 2013-03-22, 07:37
Venkat 2013-03-21, 18:30
Jarek Jarcec Cecho 2013-03-22, 02:54
Re: Exporting hive table data into oracle give date format error
Venkat Ranganathan 2013-03-22, 03:24
I created SQOOP-963 earlier for this

Venkat

On Thu, Mar 21, 2013 at 7:54 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Yup agreed, the documentation with regards to the escape strings and various edge conditions can be definitely improved. Would you mind creating a JIRA?
>
> Jarcec
>
> On Thu, Mar 21, 2013 at 11:30:22AM -0700, Venkat wrote:
>> Hi Ajit/Jarcec
>>
>> I think the whole null-string handling needs a special section in
>> the documentation (witness the discussions on Netezza null handling for
>> direct loads that we had).
>>
>> We may have to come up with specific recommendations on consistently
>> handling NULL in all cases (string and non-string).   From the database
>> perspective, different DBs have different behavior on what they do and do
>> not support for null strings (it differs by version as well).
>>
>> And the Hive handling is another potential issue, as you have
>> explained.
>>
>> Venkat
>>
>>
>> On Thu, Mar 21, 2013 at 8:33 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>wrote:
>>
>> > Hi Ajit,
>> > let me try to explain what I think is happening in your use case. There
>> > are multiple moving pieces, so let me first summarize a couple of behaviour
>> > characteristics of the components:
>> >
>> > 1) Sqoop by default will use the string "null" (lower case) to encode NULL
>> > values from the database. This can be changed via the
>> > --(input-)null-(non-)string arguments.
>> >
>> > 2) Hive by default uses \N for encoding NULL value.
>> >
>> > 3) When parsing an input file, Hive will substitute NULL when it fails to
>> > read some value, rather than throwing an exception and killing your query.
>> >
>> > Now let's specifically focus on your work flow. To make the explanation a
>> > bit simpler, let's consider the table "create table example(i int, t
>> > varchar(50));" with a single row where each column is NULL.
>> >
>> > a) Sqooping in this table without custom --null-(non-)string arguments will
>> > lead to an HDFS file with exactly one line (one input row) where both columns
>> > will be encoded as 'null' (the default substitution string for NULL values).
>> > Result:
>> >
>> >   null,null
>> >
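The substitution in a) can be sketched in Python (the helper name and the simulation are hypothetical; Sqoop itself does this in generated Java code):

```python
# Sketch of Sqoop's default NULL encoding on import: every NULL column
# is written out as the literal string "null" unless overridden with
# --null-string / --null-non-string. Hypothetical helper, not Sqoop code.

def sqoop_encode_row(row, null_string="null"):
    """Render one database row the way Sqoop would write it to HDFS."""
    return ",".join(null_string if value is None else str(value)
                    for value in row)

# One row of table example(i int, t varchar(50)) where both columns are NULL:
print(sqoop_encode_row([None, None]))  # -> null,null
```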
>> > b) Executing a simple "select * from example" in Hive will lead to the
>> > following output row:
>> >
>> >   NULL null
>> >
>> > Let me explain what is happening here a bit more. Hive will read the input
>> > file and split it into columns. The first column is of type "integer" and
>> > contains the value "null". As the string constant "null" is not a valid
>> > number for an integer column, this value is converted into NULL. The second
>> > column is of type string; the constant "null" is a fully valid string and
>> > thus this string is returned - there is no conversion to a NULL value!
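The lenient parsing in b) can be sketched as follows (hypothetical helper; real Hive does this in its SerDe layer):

```python
# Sketch of Hive's lenient parsing: a value that cannot be read as the
# column's type silently becomes NULL; a string column accepts anything,
# so the literal "null" survives as a plain string. Hypothetical helper.

def hive_parse(value, col_type):
    if col_type == "int":
        try:
            return int(value)
        except ValueError:
            return None  # unparseable -> NULL, no exception is thrown
    return value         # string column: "null" is a perfectly valid string

line = "null,null"  # the file Sqoop wrote in a)
row = [hive_parse(v, t) for v, t in zip(line.split(","), ["int", "string"])]
print(row)  # -> [None, 'null'], which Hive displays as: NULL null
```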
>> >
>> > c) Exporting table "example" will work correctly, as the file on HDFS still
>> > contains the expected "null,null".
>> >
>> > d) Now let's explore what will happen during creation of a second table with
>> > the query "CREATE TABLE example2 AS SELECT * FROM example". As a part of the
>> > query, Hive will read all input rows and parse their values as described
>> > in b). The output will be serialized into the output table example2. The
>> > first column was parsed as NULL, so it will be written out as \N (the
>> > default NULL substitution string for Hive). The second column was however
>> > parsed as a valid string value and thus will be serialized "as is",
>> > resulting in a file with the single line "\N,null". Please notice that this
>> > select statement has changed the on-disk data!
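Steps b) and d) together can be sketched like this (hypothetical helpers mirroring the description above, not actual Hive internals):

```python
# Sketch of the CREATE TABLE ... AS SELECT round trip: Hive parses the
# line as in b), then serializes the parsed row back using its own NULL
# marker (\N by default), silently rewriting the on-disk representation.

def hive_parse(value, col_type):
    if col_type == "int":
        try:
            return int(value)
        except ValueError:
            return None  # unparseable -> NULL
    return value

def hive_serialize(row, null_marker="\\N"):
    return ",".join(null_marker if v is None else str(v) for v in row)

line = "null,null"  # what Sqoop originally wrote for table "example"
parsed = [hive_parse(v, t) for v, t in zip(line.split(","), ["int", "string"])]
print(hive_serialize(parsed))  # -> \N,null  (no longer "null,null")
```

A subsequent Sqoop export of example2 then sees "\N" where it expects its own NULL substitution string, which is why the round trip is inconsistent.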
>> >
>> > e) Exporting table "example2" obviously can't lead to a consistent state, as
>> > the input file has been changed.
>> >
>> > Please do not hesitate to contact me if you still have any open questions!
>> >
>> > Jarcec
>> >
>> > On Thu, Mar 21, 2013 at 06:34:35AM +0000, Ajit Kumar Shreevastava wrote:
>> > > Hi Jarek,
>> > >
>> > >
>> > >
>> > > Thanks a lot. It's working fine.
>> > >
>> > > In both the cases in the chain mail.
>> > >
>> > >
>> > >
>> > > I also want to know the reason for SQOOP behavior for the data its
Jarek Jarcec Cecho 2013-03-22, 03:46