Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop >> mail # user >> Data mismatch when importing data from Oracle to Hive through Sqoop without an error


Copy link to this message
-
RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an error
Hi all,
I have notice one interesting thing in the below result-set.
I have fired one query in both Oracle and Hive shell and found the following result set:-->

SQL> select count(1) from bttn
  2  where bttn_id is null or data_inst_id is null or scr_id is null;

  COUNT(1)
----------
         0
hive> select count(1) from bttn
    > where bttn_id is null or data_inst_id is null or scr_id is null;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201303051835_0020, Tracking URL = http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020
Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job  -kill job_201303051835_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-03-06 13:22:56,908 Stage-1 map = 0%,  reduce = 0%
2013-03-06 13:23:05,928 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:06,931 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:07,934 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:08,938 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:09,941 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:10,944 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:11,947 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:12,956 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:13,959 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:14,962 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:15,965 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:16,969 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:17,974 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:18,977 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:19,981 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:20,985 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:21,988 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:22,995 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:23,998 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
MapReduce Total cumulative CPU time: 6 seconds 950 msec
Ended Job = job_201303051835_0020
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 6.95 sec   HDFS Read: 184270926 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 950 msec
OK
986
Time taken: 35.983 seconds
hive>

and  739169 - 738183=986

can anyone tell me why this happened as BTTN_ID  ,  DATA_INST_ID, SCR_ID   are not null constrains of BTTN table and also composite Primary Key of the table?
Also tell me how can I prevent this unnecessary data generation in HIVE table.

Regards
Ajit Kumar Shreevastava

From: Ajit Kumar Shreevastava
Sent: Wednesday, March 06, 2013 12:40 PM
To: '[EMAIL PROTECTED]'
Cc: [EMAIL PROTECTED]
Subject: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

HI,

I have a table BTTN in Oracle database having 738183 records:-->

SQL> select count(1) from bttn;

  COUNT(1)
----------
    738183

Now I want to import this table to HIVE  and I have fired the following command at command prompt:-->

[hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop import --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER  --table BTTN --verbose -P --hive-table bttn --create-hive-table --hive-import --hive-home /home/hadoop/user/hive/warehouse
The output is as follows:-->
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
13/03/06 12:16:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
Enter password:
13/03/06 12:16:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/03/06 12:16:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/06 12:16:32 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11
13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
13/03/06 12:16:32 INFO manager.SqlManager: Using default fetchSize of 1000
13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@2393385d<mailto:org.apache.sqoop.manager.OracleManager@2393385d>
13/03/06 12:16:32 INFO tool.CodeGenTool: Beginning code generation
13/03/06 12:16:32 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM BTTN t WHERE 1=0
13/03/06 12:16:32 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb, using username: HDFSUSER
13/03/06 12:16:32 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection.
13/03/06 12:16:32 INFO manager.OracleManager: Time zone has been set to GMT
13/03/06 12:16:32 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
13/03/06 12:16:32 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM BTTN t WHERE 1=0
13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb/HDFSUSER
13/03/06 12:16:32 DEBUG orm.ClassWriter: selected colu