RE: Exporting hive table data into oracle give date format error
Thanks, Jarek, for your explanations.

With Regards,
Ajit

-----Original Message-----
From: Jarek Jarcec Cecho [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2013 9:03 PM
To: [EMAIL PROTECTED]
Subject: Re: Exporting hive table data into oracle give date format error

Hi Ajit,
let me try to explain what I think is happening in your use case. There are multiple moving pieces, so let me first summarize a couple of behavioural characteristics of the components:

1) Sqoop by default uses the string "null" (lower case) to encode NULL values from the database. This can be changed via the --(input-)null-(non-)string arguments.

2) Hive by default uses \N to encode NULL values.

3) When parsing an input file, Hive will substitute NULL for any value it fails to read, rather than throwing an exception and killing your query.
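The first of these defaults can be sketched in a few lines. This is a Python simulation for illustration only, not Sqoop code; the helper name and values are hypothetical:

```python
# Simulate how Sqoop writes NULL database values into a delimited text file.
def sqoop_encode(value, null_string="null"):
    """Sqoop substitutes `null_string` for SQL NULL (overridable via --null-string)."""
    return null_string if value is None else str(value)

row = (None, None)  # a row where both columns are SQL NULL

# Default behaviour: NULLs become the literal string "null".
line = ",".join(sqoop_encode(v) for v in row)
print(line)  # -> null,null

# With --null-string '\N' / --null-non-string '\N', the output matches
# Hive's own NULL encoding, so Hive will read the values back as NULL.
line_hive = ",".join(sqoop_encode(v, null_string=r"\N") for v in row)
print(line_hive)  # -> \N,\N
```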

Now let's focus specifically on your workflow. To make the explanation a bit simpler, let's consider the table "create table example(i int, t varchar(50));" with a single row where each column is NULL.

a) Sqooping in this table without custom --null-(non-)string arguments will produce an HDFS file with exactly one line (one input row) where both columns are encoded as 'null' (the default substitution string for NULL values). Result:

  null,null

b) Executing a simple "select * from example" in Hive will produce the following output row:

  NULL null

Let me explain what is happening here in a bit more detail. Hive reads the input file and splits it into columns. The first column is of type "integer" and contains the value "null". As the string constant "null" is not a valid number for an integer column, this value is converted into NULL. The second column is of type string; the constant "null" is a fully valid string, and thus this string is returned - there is no conversion to a NULL value!
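This per-column, type-dependent behaviour can be sketched as follows. Again a Python illustration with hypothetical helper names, not Hive's actual deserializer:

```python
# Sketch of Hive's lenient deserialization: a value that can't be parsed for
# the column's type becomes NULL instead of failing the query.
def hive_parse(value, col_type, null_marker=r"\N"):
    if value == null_marker:
        return None              # explicit NULL encoding
    if col_type == "int":
        try:
            return int(value)
        except ValueError:
            return None          # unparseable for this type -> NULL
    return value                 # strings pass through unchanged

line = "null,null"               # the file written by Sqoop with default settings
i, t = line.split(",")
print(hive_parse(i, "int"), hive_parse(t, "string"))  # -> None null
```

Note that the same input token "null" yields NULL for the int column but a plain three-letter string for the string column.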

c) Exporting table "example" will work correctly, as the file on HDFS still contains the expected "null,null".

d) Now let's explore what happens during the creation of a second table with the query "CREATE TABLE example2 AS SELECT * FROM example". As part of the query, Hive will read all input rows and parse their values as described in b). The output will be serialized into the table example2. The first column was parsed as NULL, so it will be written out as \N (Hive's default NULL substitution string). The second column, however, was parsed as a valid string value and thus will be serialized as-is. The result is a file with one single line "\N,null". Please notice that this select statement has changed the on-disk data!
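The rewrite in step d) can be sketched end to end (a self-contained Python illustration; helper names are hypothetical):

```python
# Sketch of step d): Hive parses "null,null" per column type, then CTAS
# re-serializes the parsed values with Hive's own NULL marker, silently
# rewriting the on-disk representation.
def hive_serialize(value, null_marker=r"\N"):
    return null_marker if value is None else str(value)

parsed = (None, "null")   # first column: "null" is not a valid int -> NULL;
                          # second column: "null" is a valid string, kept as-is
new_line = ",".join(hive_serialize(v) for v in parsed)
print(new_line)           # -> \N,null
```

The original file said "null,null", but example2's file says "\N,null" - which is why exporting the two tables behaves differently.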

e) Exporting table "example2" obviously can't lead to a consistent state, as the input file has been changed.

Please do not hesitate to contact me if you still have any open questions!

Jarcec

On Thu, Mar 21, 2013 at 06:34:35AM +0000, Ajit Kumar Shreevastava wrote:
> Hi Jarek,
>
>
>
> Thanks a lot. It's working fine.
>
> In both cases in the chained mail.
>
>
>
> I also want to know the reason for Sqoop's behaviour when importing and exporting data from Oracle without the arguments --null-string '\\N', --null-non-string '\\N' on the import job.
>
>
>
> When I import the data without the arguments --null-string '\\N', --null-non-string '\\N' on the import job into a Hive table, I am able to export it successfully back to the Oracle table without any error or data mismatch. But after copying the same table structure and data to another Hive table, I am not able to do so. Is there a reason for that? Does Sqoop store its own data definition or data formatting for the table? It would help if you could explain the internal behaviour of Sqoop, to clarify my understanding of importing and exporting tables between a relational database and Hive and vice versa. I have explained the scenario in my chained mail for your reference and also highlighted the facts below.
>
>
>
> I am looking for your valuable comments on the below highlighted scenario.
>
>
>
> Thanking You,
>
> Regards,
>
> Ajit
>
>
>
>
>
> -----Original Message-----
> From: Jarek Jarcec Cecho [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, March 21, 2013 6:19 AM