Sqoop >> mail # user >> RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an error


+ Ajit Kumar Shreevastava 2013-03-06, 08:02
+ abhijeet gaikwad 2013-03-06, 08:15
+ Ajit Kumar Shreevastava 2013-03-06, 08:35
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error
If you can see the Hadoop job counters in your logs, the numbers are correct up to that
point. So the plain import into HDFS is working fine. The only reasonable
explanation is that there is an issue inserting the data into Hive.

Have a look at the file on HDFS: it should contain the correct number of
records with the correct data. But if you run "select * ..." from the Hive
console, it will give you NULL values for invalid data, i.e. data that cannot
be converted to the data type of the corresponding column of the Hive table.
This is because Hive imposes its data type constraints on the data in the
files on HDFS only when reading them. The extra records are a mystery to me
too; I don't think the actual file on HDFS has extra records.
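To illustrate the schema-on-read behaviour described above (table, column, and file contents here are hypothetical, not from Ajit's setup): a field that cannot be parsed as the declared column type reads back as NULL, while the underlying HDFS file keeps the original bytes.

```sql
-- Hypothetical illustration of Hive's schema-on-read NULLs.
-- Suppose the file backing this table contains the line:  abc,xyz
CREATE TABLE t_demo (id DOUBLE, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- 'abc' cannot be parsed as DOUBLE, so Hive returns NULL for id,
-- even though the raw file on HDFS still contains 'abc'.
SELECT id, name FROM t_demo;
```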

Sqoop may have created the Hive table with incorrect data types, which
**may** have caused this issue. Create the table manually with the correct
data types if equivalents are available, or declare the columns as string,
and try loading the data. If this works fine, we know where the issue is!
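A sketch of that manual workaround (the column list is trimmed to the three columns named in the thread's query; the real BTTN table presumably has more, and the HDFS path and delimiter are assumptions):

```sql
-- Declare the numeric columns as STRING so no DOUBLE conversion
-- happens when Hive reads the files Sqoop already wrote.
CREATE TABLE bttn_str (
  bttn_id      STRING,
  data_inst_id STRING,
  scr_id       STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Point Hive at the data Sqoop imported (path is hypothetical):
LOAD DATA INPATH '/user/hadoop/bttn' INTO TABLE bttn_str;

-- If this count now matches Oracle, the HDFS data is fine and the
-- mismatch comes from the Hive type mapping.
SELECT COUNT(1) FROM bttn_str;
```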

Thanks,
Abhijeet
On Wed, Mar 6, 2013 at 2:05 PM, Ajit Kumar Shreevastava <
[EMAIL PROTECTED]> wrote:
>  Hi Abhijeet,
>
> Thanks for your response.
>
> If values that do not fit in a double are inserted as NULL, then the
> counts should still match in both cases.
>
> Here the NULL values inserted are extra rows, over and above the values
> already present in both the Oracle table and the Hive table.
>
> Correct me if I am wrong in my interpretation.
>
> Thanks and Regards,
>
> Ajit Kumar Shreevastava
>
> *From:* abhijeet gaikwad [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, March 06, 2013 1:46 PM
> *To:* [EMAIL PROTECTED]
> *Cc:* [EMAIL PROTECTED]
> *Subject:* Re: Data mismatch when importing data from Oracle to Hive
> through Sqoop without an error
>
> Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the
> values that don't fit in double must be getting inserted as NULL.
> You can see this warning in your logs.
>
> Thanks,
> Abhijeet
>
> On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava <
> [EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I have noticed one interesting thing in the result sets below.
>
> I have fired the same query in both the Oracle and Hive shells and got the
> following results:
>
> SQL> select count(1) from bttn
>   2  where bttn_id is null or data_inst_id is null or scr_id is null;
>
>   COUNT(1)
> ----------
>          0
>
> hive> select count(1) from bttn
>     > where bttn_id is null or data_inst_id is null or scr_id is null;
>
> Total MapReduce jobs = 1
>
> Launching Job 1 out of 1
>
> Number of reduce tasks determined at compile time: 1
>
> In order to change the average load for a reducer (in bytes):
>
>   set hive.exec.reducers.bytes.per.reducer=<number>
>
> In order to limit the maximum number of reducers:
>
>   set hive.exec.reducers.max=<number>
>
> In order to set a constant number of reducers:
>
>   set mapred.reduce.tasks=<number>
>
> Starting Job = job_201303051835_0020, Tracking URL =
> http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020
>
> Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job  -kill
> job_201303051835_0020
>
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 1
>
> 2013-03-06 13:22:56,908 Stage-1 map = 0%,  reduce = 0%
>
> 2013-03-06 13:23:05,928 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:06,931 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:07,934 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:08,938 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:09,941 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:10,944 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
+ Jarek Jarcec Cecho 2013-03-06, 17:35
+ Chalcy 2013-03-08, 17:45
+ Ajit Kumar Shreevastava 2013-03-13, 08:44
+ Chalcy 2013-03-17, 12:55
+ Jarek Jarcec Cecho 2013-03-16, 22:56
+ Venkat Ranganathan 2013-03-06, 16:06
+ Ajit Kumar Shreevastava 2013-03-08, 13:38
+ Ajit Kumar Shreevastava 2013-03-06, 07:09