Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop >> mail # user >> Data mismatch when importing data from Oracle to Hive through Sqoop without an error


Copy link to this message
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error
Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the
values that don't fit in double must be getting inserted as NULL.
You can see this warning in your logs.

Thanks,
Abhijeet

On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava <
[EMAIL PROTECTED]> wrote:

>  Hi all,****
>
> I have notice one interesting thing in the below result-set.****
>
> I have fired one query in both Oracle and Hive shell and found the
> following result set:à****
>
> ** **
>
> *SQL> select count(1) from bttn*
>
> *  2  where bttn_id is null or data_inst_id is null or scr_id is null;*
>
> * *
>
> *  COUNT(1)*
>
> *----------*
>
> *         0*
>
> *hive> select count(1) from bttn*
>
> *    > where bttn_id is null or data_inst_id is null or scr_id is null;*
>
> Total MapReduce jobs = 1****
>
> Launching Job 1 out of 1****
>
> Number of reduce tasks determined at compile time: 1****
>
> In order to change the average load for a reducer (in bytes):****
>
>   set hive.exec.reducers.bytes.per.reducer=<number>****
>
> In order to limit the maximum number of reducers:****
>
>   set hive.exec.reducers.max=<number>****
>
> In order to set a constant number of reducers:****
>
>   set mapred.reduce.tasks=<number>****
>
> Starting Job = job_201303051835_0020, Tracking URL > http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020****
>
> Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job  -kill
> job_201303051835_0020****
>
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 1****
>
> 2013-03-06 13:22:56,908 Stage-1 map = 0%,  reduce = 0%****
>
> 2013-03-06 13:23:05,928 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:06,931 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:07,934 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:08,938 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:09,941 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:10,944 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:11,947 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:12,956 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:13,959 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:14,962 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:15,965 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:16,969 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU
> 5.2 sec****
>
> 2013-03-06 13:23:17,974 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 6.95 sec****
>
> 2013-03-06 13:23:18,977 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 6.95 sec****
>
> 2013-03-06 13:23:19,981 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 6.95 sec****
>
> 2013-03-06 13:23:20,985 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 6.95 sec****
>
> 2013-03-06 13:23:21,988 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 6.95 sec****
>
> 2013-03-06 13:23:22,995 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 6.95 sec****
>
> 2013-03-06 13:23:23,998 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 6.95 sec****
>
> MapReduce Total cumulative CPU time: 6 seconds 950 msec****
>
> Ended Job = job_201303051835_0020****
>
> MapReduce Jobs Launched:****
>
> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 6.95 sec   HDFS Read: 184270926
> HDFS Write: 4 SUCCESS****
>
> Total MapReduce CPU Time Spent: 6 seconds 950 msec****
>
> OK****
>
> *986*
>
> Time taken: 35.983 seconds****
>
> *hive>*
>
> ** **
>
> and  *739169 – **738183=986*
>
> * *
>
> *can anyone tell me why this happened as BTTN_ID  ,  DATA_INST_ID,
> SCR_ID   are not null constrains of BTTN table and also composite Primary
> Key of the table?*
>
> *Also tell me how can I prevent this unnecessary data generation in HIVE