Sqoop, mail # user - Data mismatch when importing data from Oracle to Hive through Sqoop without an error


RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an error
Ajit Kumar Shreevastava 2013-03-06, 08:35
Hi Abhijeet,

Thanks for your response.
If values that don't fit in a double were simply being inserted as NULL, the row counts would still match on both sides. Here, the rows with NULL values are extra rows, over and above the rows that are present in both the Oracle table and the Hive table.

Correct me if my interpretation is wrong.

Thanks and Regards,
Ajit Kumar Shreevastava

From: abhijeet gaikwad [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, March 06, 2013 1:46 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

Sqoop maps numeric and decimal RDBMS types to double in Hive. I think the values that don't fit in a double are being inserted as NULL.
You can see this warning in your logs.

Thanks,
Abhijeet
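Abhijeet's explanation can be checked directly: an IEEE 754 double carries a 53-bit mantissa, roughly 15-16 significant decimal digits, while an Oracle NUMBER can hold up to 38 digits. A minimal sketch of the precision loss (the value below is illustrative, not taken from the BTTN table):

```python
# A 20-digit integer fits comfortably in an Oracle NUMBER(38) column
# but exceeds the 53-bit mantissa of an IEEE 754 double, so the
# NUMBER -> double mapping cannot round-trip it exactly.
n = 12345678901234567890        # 20 significant digits
d = float(n)                    # what a Hive double column would store
print(d == n)                   # False: the low-order digits were lost
```

This is why a sufficiently wide NUMBER value cannot survive the default mapping unchanged.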
On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava <[EMAIL PROTECTED]> wrote:
Hi all,
I have noticed something interesting in the result sets below.
I fired the same query in both the Oracle and Hive shells and got the following results:

SQL> select count(1) from bttn
  2  where bttn_id is null or data_inst_id is null or scr_id is null;

  COUNT(1)
----------
         0
hive> select count(1) from bttn
    > where bttn_id is null or data_inst_id is null or scr_id is null;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201303051835_0020, Tracking URL = http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020
Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job  -kill job_201303051835_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-03-06 13:22:56,908 Stage-1 map = 0%,  reduce = 0%
2013-03-06 13:23:05,928 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:06,931 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:07,934 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:08,938 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:09,941 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:10,944 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:11,947 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:12,956 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:13,959 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:14,962 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:15,965 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:16,969 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:17,974 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:18,977 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:19,981 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:20,985 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:21,988 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:22,995 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:23,998 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
MapReduce Total cumulative CPU time: 6 seconds 950 msec
Ended Job = job_201303051835_0020
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 6.95 sec   HDFS Read: 184270926 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 950 msec
OK
986
Time taken: 35.983 seconds
hive>

and 739169 - 738183 = 986, so the 986 NULL rows account exactly for the difference in row counts.

Can anyone tell me why this happened, given that BTTN_ID, DATA_INST_ID, and SCR_ID all have NOT NULL constraints on the BTTN table and together form its composite primary key?
Also, how can I prevent this unwanted extra data in the Hive table?
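One way to avoid the lossy NUMBER-to-double mapping is Sqoop's --map-column-hive option, which overrides the Hive type on a per-column basis. The sketch below assumes the three key columns hold integral values that fit in 64 bits; the BIGINT choices are an assumption about the schema, not taken from it:

```shell
# Sketch only: force the Hive types for the key columns instead of
# accepting Sqoop's default NUMBER -> double mapping. BIGINT assumes
# the columns hold integers that fit in 64 bits; adjust to the schema.
sqoop import \
  --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb \
  --username HDFSUSER -P \
  --table BTTN \
  --hive-import --create-hive-table --hive-table bttn \
  --map-column-hive BTTN_ID=BIGINT,DATA_INST_ID=BIGINT,SCR_ID=BIGINT
```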

Regards
Ajit Kumar Shreevastava

From: Ajit Kumar Shreevastava
Sent: Wednesday, March 06, 2013 12:40 PM
To: '[EMAIL PROTECTED]'
Cc: [EMAIL PROTECTED]
Subject: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

Hi,

I have a table BTTN in an Oracle database with 738183 records:

SQL> select count(1) from bttn;

  COUNT(1)
----------
    738183

Now I want to import this table into Hive, and I have fired the following command at the command prompt:

[hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop import --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER  --table BTTN --verbose -P --hive-table bttn --create-hive-table --hive-import --hive-home /home/hadoop/user/hive/warehouse
The output is as follows:
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
13/03/06 12:16:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
Enter password:
13/03/06 12:16:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/03/06 12:16:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/06 12:16:32 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11
13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
13/03/06 12:16:32 INFO manager.SqlM