Re: Exporting Hive table data into Oracle gives date format error

I created SQOOP-963 earlier for this.

Venkat

On Thu, Mar 21, 2013 at 7:54 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Yup agreed, the documentation with regard to the escape strings and various edge conditions can definitely be improved. Would you mind creating a JIRA?
>
> Jarcec
>
> On Thu, Mar 21, 2013 at 11:30:22AM -0700, Venkat wrote:
>> Hi Ajit/Jarcec
>>
>> I think the whole null string handling seems to need a special section in
>> the documentation (witness the discussions on Netezza null handling for
>> direct loads that we had).
>>
>> We may have to come up with specific recommendations on consistently
>> handling NULL in all cases (string and non-string).  From the database
>> perspective, different DBs differ in what they do and do not support
>> for null strings (it also differs by version).
>>
>> And the Hive handling is another potential issue, as you have
>> explained.
>>
>> Venkat
>>
>>
>> On Thu, Mar 21, 2013 at 8:33 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
>>
>> > Hi Ajit,
>> > let me try to explain what I think is happening in your use case. There
>> > are multiple moving pieces, so let me first summarize a couple of behaviour
>> > characteristics of the components:
>> >
>> > 1) Sqoop by default will use the string "null" (lower case) to encode NULL
>> > values from the database. This can be changed via the --(input-)null-(non-)string
>> > arguments.
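>> >
>> > For illustration, an import along these lines (a sketch - the connection
>> > string and credentials are placeholders) would make Sqoop emit \N instead
>> > of "null", matching Hive's default NULL encoding:
>> >
>> >   sqoop import \
>> >     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>> >     --username scott -P \
>> >     --table EXAMPLE \
>> >     --null-string '\\N' \
>> >     --null-non-string '\\N'
>> >
>> > (The doubled backslash is needed so that the \N survives Sqoop's own
>> > escape processing.)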
>> >
>> > 2) Hive by default uses \N for encoding NULL values.
>> >
>> > 3) When parsing an input file, Hive will use NULL when it fails to
>> > read a value, rather than throwing an exception and killing your query.
>> >
>> > Now let's focus specifically on your workflow. To make the explanation a
>> > bit simpler, let's consider the table "create table example(i int, t
>> > varchar(50));" with a single row where each column is NULL.
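>> >
>> > In other words, on the database side (a sketch):
>> >
>> >   create table example (i int, t varchar(50));
>> >   insert into example (i, t) values (NULL, NULL);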
>> >
>> > a) Sqooping in this table without a custom --null-(non-)string argument will
>> > lead to an HDFS file with exactly one line (one input row) where both columns
>> > will be encoded as 'null' (the default substitution string for NULL values). Result:
>> >
>> >   null,null
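>> >
>> > You can confirm this on HDFS (a sketch - assuming Sqoop's default
>> > target directory and output file naming):
>> >
>> >   hadoop fs -cat example/part-m-00000
>> >   null,null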
>> >
>> > b) Executing a simple "select * from example" in Hive will lead to the
>> > following output row:
>> >
>> >   NULL null
>> >
>> > Let me explain what is happening here in a bit more detail. Hive will read the
>> > input file and split it into columns. The first column is of type "integer" and
>> > contains the value "null". As the string constant "null" is not a valid number for
>> > an integer column, this value is converted into NULL. The second column is of type
>> > string; the constant "null" is a fully valid string and thus this string is
>> > returned - there is no conversion to a NULL value!
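>> >
>> > A quick way to see the difference from the Hive shell (a hypothetical
>> > session against the table above):
>> >
>> >   select count(*) from example where i is null;   -- 1: "null" did not parse as int
>> >   select count(*) from example where t is null;   -- 0: no NULL here
>> >   select count(*) from example where t = 'null';  -- 1: the literal string "null"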
>> >
>> > c) Exporting table "example" will work correctly as the file on HDFS still
>> > contains the expected "null,null".
>> >
>> > d) Now let's explore what will happen during the creation of a second table with
>> > the query "CREATE TABLE example2 AS SELECT * FROM example". As part of the
>> > query, Hive will read all input rows and parse their values as described
>> > in b). The output will be serialized into the output table example2. The first
>> > column was parsed as NULL, so it will be written out as \N (the default NULL
>> > substitution string for Hive). The second column, however, was parsed as a
>> > valid string value and thus will be serialized "as is", resulting in a
>> > file with a single line "\N,null". Please notice that this select
>> > statement has changed the on-disk data!
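>> >
>> > You can verify the rewrite directly on HDFS (a sketch - assuming the
>> > default Hive warehouse location and the ',' delimiter used in this
>> > discussion):
>> >
>> >   hadoop fs -cat /user/hive/warehouse/example2/*
>> >   \N,null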
>> >
>> > e) Exporting table "example2" obviously can't lead to a consistent state, as
>> > the input file has been changed.
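>> >
>> > For completeness, an export of example2 would need the export-side
>> > flags (a sketch - connection details are placeholders):
>> >
>> >   sqoop export \
>> >     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>> >     --username scott -P \
>> >     --table EXAMPLE2 \
>> >     --export-dir /user/hive/warehouse/example2 \
>> >     --input-null-string '\\N' \
>> >     --input-null-non-string '\\N'
>> >
>> > Even then, the second column would arrive as the literal string "null"
>> > rather than NULL, because the CTAS already rewrote it on disk.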
>> >
>> > Please do not hesitate to contact me if you still have any open questions!
>> >
>> > Jarcec
>> >
>> > On Thu, Mar 21, 2013 at 06:34:35AM +0000, Ajit Kumar Shreevastava wrote:
>> > > Hi Jarek,
>> > >
>> > > Thanks a lot. It's working fine in both the cases in the chain mail.
>> > >
>> > > I also want to know the reason for SQOOP behavior for the data its