Sqoop user mailing list: Data mismatch when importing data from Oracle to Hive through Sqoop without an error


Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error
Sqoop maps RDBMS NUMERIC and DECIMAL types to Hive's DOUBLE. I think the
values that do not fit in a double are being inserted as NULL. You should
see a warning to that effect in your logs.

Thanks,
Abhijeet
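
A quick way to check this diagnosis is to look at the schema Sqoop
generated and count non-NULL values per key column (a minimal sketch;
it assumes the table is named bttn as in the thread, and relies on
count(col) in HiveQL skipping NULLs):

    hive> describe bttn;
    hive> select count(1), count(bttn_id), count(data_inst_id), count(scr_id)
        > from bttn;

If the key columns are listed as double and the per-column counts fall
short of count(1), the NULLs were introduced by the import rather than
present in the source.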

On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I have noticed one interesting thing in the below result set.
>
> I have fired one query in both the Oracle and Hive shells and found the
> following result sets:
>
> SQL> select count(1) from bttn
>   2  where bttn_id is null or data_inst_id is null or scr_id is null;
>
>   COUNT(1)
> ----------
>          0
>
> hive> select count(1) from bttn
>     > where bttn_id is null or data_inst_id is null or scr_id is null;
>
> Total MapReduce jobs = 1
>
> Launching Job 1 out of 1
>
> Number of reduce tasks determined at compile time: 1
>
> In order to change the average load for a reducer (in bytes):
>
>   set hive.exec.reducers.bytes.per.reducer=<number>
>
> In order to limit the maximum number of reducers:
>
>   set hive.exec.reducers.max=<number>
>
> In order to set a constant number of reducers:
>
>   set mapred.reduce.tasks=<number>
>
> Starting Job = job_201303051835_0020, Tracking URL = http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020
>
> Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job  -kill job_201303051835_0020
>
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
>
> 2013-03-06 13:22:56,908 Stage-1 map = 0%,  reduce = 0%
>
> 2013-03-06 13:23:05,928 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:06,931 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:07,934 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:08,938 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:09,941 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:10,944 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:11,947 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:12,956 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:13,959 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:14,962 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:15,965 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:16,969 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
>
> 2013-03-06 13:23:17,974 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
>
> 2013-03-06 13:23:18,977 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
>
> 2013-03-06 13:23:19,981 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
>
> 2013-03-06 13:23:20,985 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
>
> 2013-03-06 13:23:21,988 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
>
> 2013-03-06 13:23:22,995 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
>
> 2013-03-06 13:23:23,998 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
>
> MapReduce Total cumulative CPU time: 6 seconds 950 msec
>
> Ended Job = job_201303051835_0020
>
> MapReduce Jobs Launched:
>
> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 6.95 sec   HDFS Read: 184270926 HDFS Write: 4 SUCCESS
>
> Total MapReduce CPU Time Spent: 6 seconds 950 msec
>
> OK
>
> 986
>
> Time taken: 35.983 seconds
>
> hive>
>
> and 739169 - 738183 = 986.
>
> Can anyone tell me why this happened, as BTTN_ID, DATA_INST_ID and
> SCR_ID all have NOT NULL constraints in the BTTN table and also form its
> composite primary key?
>
> Also, can you tell me how I can prevent this unnecessary data generation
> in Hive?
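
On the prevention side, one option (a sketch, not tested against this
schema; the connection string, credentials and host below are
placeholders) is to override Sqoop's default NUMBER-to-DOUBLE mapping so
the key columns keep their exact values:

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username scott --password tiger \
      --table BTTN \
      --hive-import \
      --map-column-hive BTTN_ID=STRING,DATA_INST_ID=STRING,SCR_ID=STRING

Mapping to STRING avoids any floating-point rounding; on Hive versions
with decimal support, DECIMAL would be the more natural target.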