|
Ajit Kumar Shreevastava
2013-03-06, 07:09
Ajit Kumar Shreevastava
2013-03-06, 08:02
abhijeet gaikwad
2013-03-06, 08:15
Ajit Kumar Shreevastava
2013-03-06, 08:35
Venkat Ranganathan
2013-03-06, 16:06
abhijeet gaikwad
2013-03-06, 17:20
Jarek Jarcec Cecho
2013-03-06, 17:35
Ajit Kumar Shreevastava
2013-03-08, 13:38
Chalcy
2013-03-08, 17:45
Ajit Kumar Shreevastava
2013-03-13, 08:44
Jarek Jarcec Cecho
2013-03-16, 22:56
Chalcy
2013-03-17, 12:55
|
-
Data mismatch when importing data from Oracle to Hive through Sqoop without an errorAjit Kumar Shreevastava 2013-03-06, 07:09
HI,
I have a table BTTN in Oracle database having 738183 records:--> SQL> select count(1) from bttn; COUNT(1) ---------- 738183 Now I want to import this table to HIVE and I have fired the following command at command prompt:--> [hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop import --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table BTTN --verbose -P --hive-table bttn --create-hive-table --hive-import --hive-home /home/hadoop/user/hive/warehouse The output is as follows:--> Warning: /usr/lib/hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation. 13/03/06 12:16:25 DEBUG tool.BaseSqoopTool: Enabled debug logging. Enter password: 13/03/06 12:16:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 13/03/06 12:16:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/06 12:16:32 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11 13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache. 13/03/06 12:16:32 INFO manager.SqlManager: Using default fetchSize of 1000 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@2393385d 13/03/06 12:16:32 INFO tool.CodeGenTool: Beginning code generation 13/03/06 12:16:32 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM BTTN t WHERE 1=0 13/03/06 12:16:32 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb, using username: HDFSUSER 13/03/06 12:16:32 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection. 13/03/06 12:16:32 INFO manager.OracleManager: Time zone has been set to GMT 13/03/06 12:16:32 DEBUG manager.SqlManager: Using fetchSize for next query: 1000 13/03/06 12:16:32 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM BTTN t WHERE 1=0 13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb/HDFSUSER 13/03/06 12:16:32 DEBUG orm.ClassWriter: selected columns: 13/03/06 12:16:32 DEBUG orm.ClassWriter: BTTN_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: DATA_INST_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: SCR_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: BTTN_NU 13/03/06 12:16:32 DEBUG orm.ClassWriter: CAT 13/03/06 12:16:32 DEBUG orm.ClassWriter: WDTH 13/03/06 12:16:32 DEBUG orm.ClassWriter: HGHT 13/03/06 12:16:32 DEBUG orm.ClassWriter: KEY_SCAN 13/03/06 12:16:32 DEBUG orm.ClassWriter: KEY_SHFT 13/03/06 12:16:32 DEBUG orm.ClassWriter: FRGND_CPTN_COLR 13/03/06 12:16:32 DEBUG orm.ClassWriter: FRGND_CPTN_COLR_PRSD 13/03/06 12:16:32 DEBUG orm.ClassWriter: BKGD_CPTN_COLR 13/03/06 12:16:32 DEBUG orm.ClassWriter: BKGD_CPTN_COLR_PRSD 13/03/06 12:16:32 DEBUG orm.ClassWriter: BLM_FL 13/03/06 12:16:32 DEBUG orm.ClassWriter: LCLZ_FL 13/03/06 12:16:32 DEBUG orm.ClassWriter: MENU_ITEM_NU 13/03/06 12:16:32 DEBUG orm.ClassWriter: BTTN_ASGN_LVL_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: ON_ATVT 13/03/06 12:16:32 DEBUG orm.ClassWriter: ON_CLIK 13/03/06 12:16:32 DEBUG orm.ClassWriter: ENBL_FL 13/03/06 12:16:32 DEBUG orm.ClassWriter: BLM_SET_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: BTTN_ASGN_LVL_NAME 13/03/06 12:16:32 DEBUG orm.ClassWriter: MKT_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: CRTE_TS 13/03/06 12:16:32 DEBUG orm.ClassWriter: CRTE_USER_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: UPDT_TS 13/03/06 12:16:32 DEBUG orm.ClassWriter: UPDT_USER_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: DEL_TS 13/03/06 12:16:32 DEBUG orm.ClassWriter: DEL_USER_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: DLTD_FL 13/03/06 12:16:32 DEBUG orm.ClassWriter: MENU_ITEM_NA 13/03/06 12:16:32 DEBUG orm.ClassWriter: PRD_CD 13/03/06 12:16:32 DEBUG orm.ClassWriter: BLM_SET_NA 13/03/06 12:16:32 DEBUG orm.ClassWriter: SOUND_FILE_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: IS_DYNMC_BTTN 13/03/06 12:16:32 DEBUG orm.ClassWriter: FRGND_CPTN_COLR_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: FRGND_CPTN_COLR_PRSD_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: BKGD_CPTN_COLR_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: BKGD_CPTN_COLR_PRSD_ID 13/03/06 12:16:32 DEBUG orm.ClassWriter: Writing source file: /tmp/sqoop-hadoop/compile/6aa39c498a3c94ffa01a5352f368712e/BTTN.java 13/03/06 12:16:32 DEBUG orm.ClassWriter: Table name: BTTN 13/03/06 12:16:32 DEBUG orm.ClassWriter: Columns: BTTN_ID:2, DATA_INST_ID:2, SCR_ID:2, BTTN_NU:2, CAT:2, WDTH:2, HGHT:2, KEY_SCAN:2, KEY_SHFT:2, FRGND_CPTN_COLR:12, FRGND_CPTN_COLR_PRSD:12, BKGD_CPTN_COLR:12, BKGD_CPTN_COLR_PRSD:12, BLM_FL:2, LCLZ_FL:2, MENU_ITEM_NU:2, BTTN_ASGN_LVL_ID:2, ON_ATVT:2, ON_CLIK:2, ENBL_FL:2, BLM_SET_ID:2, BTTN_ASGN_LVL_NAME:12, MKT_ID:2, CRTE_TS:93, CRTE_USER_ID:12, UPDT_TS:93, UPDT_USER_ID:12, DEL_TS:93, DEL_USER_ID:12, DLTD_FL:2, MENU_ITEM_NA:12, PRD_CD:2, BLM_SET_NA:12, SOUND_FILE_ID:2, IS_DYNMC_BTTN:2, FRGND_CPTN_COLR_ID:2, FRGND_CPTN_COLR_PRSD_ID:2, BKGD_CPTN_COLR_ID:2, BKGD_CPTN_COLR_PRSD_ID:2, 13/03/06 12:16:32 DEBUG orm.ClassWriter: sourceFilename is BTTN.java 13/03/06 12:16:32 DEBUG orm.CompilationManager: Found existing /tmp/sqoop-hadoop/compile/6aa39c498a3c94ffa01a5352f368712e/ 13/03/06 12:16:32 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-1.0.3/libexec/.. 13/03/06 12:16:32 DEBUG orm.CompilationManager: Adding source file: /tmp/sqoop-hadoop/compile/6aa39c498a3c94ffa01a5352f368712e/BTTN.java 13/03/06 12:16:32 DEBUG orm.CompilationManager: Invoking javac with args: 13/03/06 12:16:32 DEBUG orm.CompilationManager: -sourcepath 13/03/06 12:16:32 DEBUG orm.Compilat
-
RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorAjit Kumar Shreevastava 2013-03-06, 08:02
Hi all,
I have notice one interesting thing in the below result-set. I have fired one query in both Oracle and Hive shell and found the following result set:--> SQL> select count(1) from bttn 2 where bttn_id is null or data_inst_id is null or scr_id is null; COUNT(1) ---------- 0 hive> select count(1) from bttn > where bttn_id is null or data_inst_id is null or scr_id is null; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201303051835_0020, Tracking URL = http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020 Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job -kill job_201303051835_0020 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2013-03-06 13:22:56,908 Stage-1 map = 0%, reduce = 0% 2013-03-06 13:23:05,928 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:06,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:07,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:08,938 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:09,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:10,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:11,947 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:12,956 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:13,959 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:14,962 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec 2013-03-06 13:23:15,965 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec 2013-03-06 13:23:16,969 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec 2013-03-06 13:23:17,974 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:18,977 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:19,981 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:20,985 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:21,988 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:22,995 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:23,998 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec MapReduce Total cumulative CPU time: 6 seconds 950 msec Ended Job = job_201303051835_0020 MapReduce Jobs Launched: Job 0: Map: 1 Reduce: 1 Cumulative CPU: 6.95 sec HDFS Read: 184270926 HDFS Write: 4 SUCCESS Total MapReduce CPU Time Spent: 6 seconds 950 msec OK 986 Time taken: 35.983 seconds hive> and 739169 - 738183=986 can anyone tell me why this happened as BTTN_ID , DATA_INST_ID, SCR_ID are not null constrains of BTTN table and also composite Primary Key of the table? Also tell me how can I prevent this unnecessary data generation in HIVE table. Regards Ajit Kumar Shreevastava From: Ajit Kumar Shreevastava Sent: Wednesday, March 06, 2013 12:40 PM To: '[EMAIL PROTECTED]' Cc: [EMAIL PROTECTED] Subject: Data mismatch when importing data from Oracle to Hive through Sqoop without an error HI, I have a table BTTN in Oracle database having 738183 records:--> SQL> select count(1) from bttn; COUNT(1) ---------- 738183 Now I want to import this table to HIVE and I have fired the following command at command prompt:--> [hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop import --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table BTTN --verbose -P --hive-table bttn --create-hive-table --hive-import --hive-home /home/hadoop/user/hive/warehouse The output is as follows:--> Warning: /usr/lib/hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation. 13/03/06 12:16:25 DEBUG tool.BaseSqoopTool: Enabled debug logging. Enter password: 13/03/06 12:16:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 13/03/06 12:16:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/06 12:16:32 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11 13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache. 13/03/06 12:16:32 INFO manager.SqlManager: Using default fetchSize of 1000 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@2393385d<mailto:org.apache.sqoop.manager.OracleManager@2393385d> 13/03/06 12:16:32 INFO tool.CodeGenTool: Beginning code generation 13/03/06 12:16:32 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM BTTN t WHERE 1=0 13/03/06 12:16:32 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb, using username: HDFSUSER 13/03/06 12:16:32 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection. 13/03/06 12:16:32 INFO manager.OracleManager: Time zone has been set to GMT 13/03/06 12:16:32 DEBUG manager.SqlManager: Using fetchSize for next query: 1000 13/03/06 12:16:32 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM BTTN t WHERE 1=0 13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb/HDFSUSER 13/03/06 12:16:32 DEBUG orm.ClassWriter: selected colu
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorabhijeet gaikwad 2013-03-06, 08:15
Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the
values that don't fit in double must be getting inserted as NULL. You can see this warning in your logs. Thanks, Abhijeet On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava < [EMAIL PROTECTED]> wrote: > Hi all,**** > > I have notice one interesting thing in the below result-set.**** > > I have fired one query in both Oracle and Hive shell and found the > following result set:à**** > > ** ** > > *SQL> select count(1) from bttn* > > * 2 where bttn_id is null or data_inst_id is null or scr_id is null;* > > * * > > * COUNT(1)* > > *----------* > > * 0* > > *hive> select count(1) from bttn* > > * > where bttn_id is null or data_inst_id is null or scr_id is null;* > > Total MapReduce jobs = 1**** > > Launching Job 1 out of 1**** > > Number of reduce tasks determined at compile time: 1**** > > In order to change the average load for a reducer (in bytes):**** > > set hive.exec.reducers.bytes.per.reducer=<number>**** > > In order to limit the maximum number of reducers:**** > > set hive.exec.reducers.max=<number>**** > > In order to set a constant number of reducers:**** > > set mapred.reduce.tasks=<number>**** > > Starting Job = job_201303051835_0020, Tracking URL > http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020**** > > Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job -kill > job_201303051835_0020**** > > Hadoop job information for Stage-1: number of mappers: 1; number of > reducers: 1**** > > 2013-03-06 13:22:56,908 Stage-1 map = 0%, reduce = 0%**** > > 2013-03-06 13:23:05,928 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:06,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:07,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:08,938 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:09,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:10,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:11,947 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:12,956 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:13,959 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:14,962 Stage-1 map = 100%, reduce = 33%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:15,965 Stage-1 map = 100%, reduce = 33%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:16,969 Stage-1 map = 100%, reduce = 33%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:17,974 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec**** > > 2013-03-06 13:23:18,977 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec**** > > 2013-03-06 13:23:19,981 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec**** > > 2013-03-06 13:23:20,985 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec**** > > 2013-03-06 13:23:21,988 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec**** > > 2013-03-06 13:23:22,995 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec**** > > 2013-03-06 13:23:23,998 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec**** > > MapReduce Total cumulative CPU time: 6 seconds 950 msec**** > > Ended Job = job_201303051835_0020**** > > MapReduce Jobs Launched:**** > > Job 0: Map: 1 Reduce: 1 Cumulative CPU: 6.95 sec HDFS Read: 184270926 > HDFS Write: 4 SUCCESS**** > > Total MapReduce CPU Time Spent: 6 seconds 950 msec**** > > OK**** > > *986* > > Time taken: 35.983 seconds**** > > *hive>* > > ** ** > > and *739169 – **738183=986* > > * * > > *can anyone tell me why this happened as BTTN_ID , DATA_INST_ID, > SCR_ID are not null constrains of BTTN table and also composite Primary > Key of the table?* > > *Also tell me how can I prevent this unnecessary data generation in HIVE
-
RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorAjit Kumar Shreevastava 2013-03-06, 08:35
Hi Abhijeet,
Thanks for your response. If values that don't fit in double must be getting inserted as Null is the case then count should not be mis-match in both the case. Here the null value inserted are extra value apart from the other value which is already present in both Oracle Table and Hive table. Correct me if I am wrong in interpretation. Thanks and Regards, Ajit Kumar Shreevastava From: abhijeet gaikwad [mailto:[EMAIL PROTECTED]] Sent: Wednesday, March 06, 2013 1:46 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the values that don't fit in double must be getting inserted as NULL. You can see this warning in your logs. Thanks, Abhijeet On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi all, I have notice one interesting thing in the below result-set. I have fired one query in both Oracle and Hive shell and found the following result set:--> SQL> select count(1) from bttn 2 where bttn_id is null or data_inst_id is null or scr_id is null; COUNT(1) ---------- 0 hive> select count(1) from bttn > where bttn_id is null or data_inst_id is null or scr_id is null; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201303051835_0020, Tracking URL = http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020 Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job -kill job_201303051835_0020 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2013-03-06 13:22:56,908 Stage-1 map = 0%, reduce = 0% 2013-03-06 13:23:05,928 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:06,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:07,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:08,938 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:09,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:10,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:11,947 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:12,956 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:13,959 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec 2013-03-06 13:23:14,962 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec 2013-03-06 13:23:15,965 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec 2013-03-06 13:23:16,969 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec 2013-03-06 13:23:17,974 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:18,977 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:19,981 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:20,985 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:21,988 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:22,995 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec 2013-03-06 13:23:23,998 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec MapReduce Total cumulative CPU time: 6 seconds 950 msec Ended Job = job_201303051835_0020 MapReduce Jobs Launched: Job 0: Map: 1 Reduce: 1 Cumulative CPU: 6.95 sec HDFS Read: 184270926 HDFS Write: 4 SUCCESS Total MapReduce CPU Time Spent: 6 seconds 950 msec OK 986 Time taken: 35.983 seconds hive> and 739169 - 738183=986 can anyone tell me why this happened as BTTN_ID , DATA_INST_ID, SCR_ID are not null constrains of BTTN table and also composite Primary Key of the table? Also tell me how can I prevent this unnecessary data generation in HIVE table. Regards Ajit Kumar Shreevastava From: Ajit Kumar Shreevastava Sent: Wednesday, March 06, 2013 12:40 PM To: '[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>' Cc: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: Data mismatch when importing data from Oracle to Hive through Sqoop without an error HI, I have a table BTTN in Oracle database having 738183 records:--> SQL> select count(1) from bttn; COUNT(1) 738183 Now I want to import this table to HIVE and I have fired the following command at command prompt:--> [hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop import --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb<http://jdbc:oracle:thin:@10.99.42.11:1521/clouddb> --username HDFSUSER --table BTTN --verbose -P --hive-table bttn --create-hive-table --hive-import --hive-home /home/hadoop/user/hive/warehouse The output is as follows:--> Warning: /usr/lib/hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation. 13/03/06 12:16:25 DEBUG tool.BaseSqoopTool: Enabled debug logging. Enter password: 13/03/06 12:16:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 13/03/06 12:16:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/06 12:16:32 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/06 12:16:32 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11<mailto:jdbc%3Aoracle%3Athin%[EMAIL PROTECTED]> 13/03/06 12:16:32 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache. 13/03/06 12:16:32 INFO manager.SqlM
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorVenkat Ranganathan 2013-03-06, 16:06
Hi Ajit
Do you know if rest of the columns also are null when the three non null columns are null Venkat On Wed, Mar 6, 2013 at 12:35 AM, Ajit Kumar Shreevastava <[EMAIL PROTECTED]> wrote: > Hi Abhijeet, > > > > Thanks for your response. > > If values that don’t fit in double must be getting inserted as Null is the > case then count should not be mis-match in both the case. > > Here the null value inserted are extra value apart from the other value > which is already present in both Oracle Table and Hive table. > > > > Correct me if I am wrong in interpretation. > > > > Thanks and Regards, > > Ajit Kumar Shreevastava > > > > From: abhijeet gaikwad [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, March 06, 2013 1:46 PM > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Re: Data mismatch when importing data from Oracle to Hive through > Sqoop without an error > > > > Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the > values that don't fit in double must be getting inserted as NULL. > You can see this warning in your logs. > > Thanks, > Abhijeet > > On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava > <[EMAIL PROTECTED]> wrote: > > Hi all, > > I have notice one interesting thing in the below result-set. > > I have fired one query in both Oracle and Hive shell and found the following > result set:à > > > > SQL> select count(1) from bttn > > 2 where bttn_id is null or data_inst_id is null or scr_id is null; > > > > COUNT(1) > > ---------- > > 0 > > hive> select count(1) from bttn > > > where bttn_id is null or data_inst_id is null or scr_id is null; > > Total MapReduce jobs = 1 > > Launching Job 1 out of 1 > > Number of reduce tasks determined at compile time: 1 > > In order to change the average load for a reducer (in bytes): > > set hive.exec.reducers.bytes.per.reducer=<number> > > In order to limit the maximum number of reducers: > > set hive.exec.reducers.max=<number> > > In order to set a constant number of reducers: > > set mapred.reduce.tasks=<number> > > Starting Job = job_201303051835_0020, Tracking URL > http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020 > > Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job -kill > job_201303051835_0020 > > Hadoop job information for Stage-1: number of mappers: 1; number of > reducers: 1 > > 2013-03-06 13:22:56,908 Stage-1 map = 0%, reduce = 0% > > 2013-03-06 13:23:05,928 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:06,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:07,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:08,938 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:09,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:10,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:11,947 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:12,956 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:13,959 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 > sec > > 2013-03-06 13:23:14,962 Stage-1 map = 100%, reduce = 33%, Cumulative CPU > 5.2 sec > > 2013-03-06 13:23:15,965 Stage-1 map = 100%, reduce = 33%, Cumulative CPU > 5.2 sec > > 2013-03-06 13:23:16,969 Stage-1 map = 100%, reduce = 33%, Cumulative CPU > 5.2 sec > > 2013-03-06 13:23:17,974 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec > > 2013-03-06 13:23:18,977 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec > > 2013-03-06 13:23:19,981 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec > > 2013-03-06 13:23:20,985 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec > > 2013-03-06 13:23:21,988 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 6.95 sec > > 2013-03-06 13:23:22,995 Stage-1 map = 100%, reduce = 100%, Cumulative CPU
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorabhijeet gaikwad 2013-03-06, 17:20
If you see Hadoop job counters in your logs - numbers are correct till that
point. So normal import in HDFS is working fine. Only reasonable explanation is that there is an issue inserting data in Hive. Have a look at the file on HDFS it should contain the correct number of records with correct data. But if you do "select * ..." from Hive console it will give you NULL values for invalid data, as in the data that cannot be converted to the respective data type of the column of Hive table. This is because Hive imposes data type constraints on the data in files on HDFS. Those extra records is a mystry for me too. I don't think actual file on HDFS has extra records. Sqoop is creating Hive table incorrectly which **may** have caused this issue. Create table manually with correct data types if equivalent available or as string and try loading data. If this works fine we know where the issue is! Thanks, Abhijeet On Wed, Mar 6, 2013 at 2:05 PM, Ajit Kumar Shreevastava < [EMAIL PROTECTED]> wrote: > Hi Abhijeet,**** > > ** ** > > Thanks for your response.**** > > If values that don’t fit in double must be getting inserted as Null is the > case then count should not be mis-match in both the case.**** > > Here the null value inserted are extra value apart from the other value > which is already present in both Oracle Table and Hive table.**** > > ** ** > > Correct me if I am wrong in interpretation.**** > > ** ** > > Thanks and Regards,**** > > Ajit Kumar Shreevastava**** > > ** ** > > *From:* abhijeet gaikwad [mailto:[EMAIL PROTECTED]] > *Sent:* Wednesday, March 06, 2013 1:46 PM > *To:* [EMAIL PROTECTED] > *Cc:* [EMAIL PROTECTED] > *Subject:* Re: Data mismatch when importing data from Oracle to Hive > through Sqoop without an error**** > > ** ** > > Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the > values that don't fit in double must be getting inserted as NULL. > You can see this warning in your logs. > > Thanks, > Abhijeet**** > > On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava < > [EMAIL PROTECTED]> wrote:**** > > Hi all,**** > > I have notice one interesting thing in the below result-set.**** > > I have fired one query in both Oracle and Hive shell and found the > following result set:à**** > > **** > > *SQL> select count(1) from bttn***** > > * 2 where bttn_id is null or data_inst_id is null or scr_id is null;**** > * > > * ***** > > * COUNT(1)***** > > *----------***** > > * 0***** > > *hive> select count(1) from bttn***** > > * > where bttn_id is null or data_inst_id is null or scr_id is null;*** > ** > > Total MapReduce jobs = 1**** > > Launching Job 1 out of 1**** > > Number of reduce tasks determined at compile time: 1**** > > In order to change the average load for a reducer (in bytes):**** > > set hive.exec.reducers.bytes.per.reducer=<number>**** > > In order to limit the maximum number of reducers:**** > > set hive.exec.reducers.max=<number>**** > > In order to set a constant number of reducers:**** > > set mapred.reduce.tasks=<number>**** > > Starting Job = job_201303051835_0020, Tracking URL > http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020<http://nhclt-pc44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020> > **** > > Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job -kill > job_201303051835_0020**** > > Hadoop job information for Stage-1: number of mappers: 1; number of > reducers: 1**** > > 2013-03-06 13:22:56,908 Stage-1 map = 0%, reduce = 0%**** > > 2013-03-06 13:23:05,928 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:06,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:07,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:08,938 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:09,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 5.2 sec**** > > 2013-03-06 13:23:10,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorJarek Jarcec Cecho 2013-03-06, 17:35
Hi Ajit,
I've seen similar issue many times. Does your table have textual data? If so, can it happen that your textual data contains hive delimiters like new line characters? Because if so then Sqoop might create two lines in for one single row in the table that will be consequently seen as two rows in Hive. As Hive will implicitly convert any invalid values into NULL, it would also explain your NULL values. Sqoop offers arguments --hive-drop-import-delims and --hive-delims-replacement to deal with this problem. More information can be found in Sqoop user guide [1]. Jarcec Links: 1: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive On Wed, Mar 06, 2013 at 10:50:26PM +0530, abhijeet gaikwad wrote: > If you see Hadoop job counters in your logs - numbers are correct till that > point. So normal import in HDFS is working fine. Only reasonable > explanation is that there is an issue inserting data in Hive. > > Have a look at the file on HDFS it should contain the correct number of > records with correct data. But if you do "select * ..." from Hive console > it will give you NULL values for invalid data, as in the data that cannot > be converted to the respective data type of the column of Hive table. This > is because Hive imposes data type constraints on the data in files on HDFS. > Those extra records is a mystry for me too. I don't think actual file on > HDFS has extra records. > > Sqoop is creating Hive table incorrectly which **may** have caused this > issue. Create table manually with correct data types if equivalent > available or as string and try loading data. If this works fine we know > where the issue is! > > Thanks, > Abhijeet > On Wed, Mar 6, 2013 at 2:05 PM, Ajit Kumar Shreevastava < > [EMAIL PROTECTED]> wrote: > > > > Hi Abhijeet,**** > > > > ** ** > > > > Thanks for your response.**** > > > > If values that don’t fit in double must be getting inserted as Null is the > > case then count should not be mis-match in both the case.**** > > > > Here the null value inserted are extra value apart from the other value > > which is already present in both Oracle Table and Hive table.**** > > > > ** ** > > > > Correct me if I am wrong in interpretation.**** > > > > ** ** > > > > Thanks and Regards,**** > > > > Ajit Kumar Shreevastava**** > > > > ** ** > > > > *From:* abhijeet gaikwad [mailto:[EMAIL PROTECTED]] > > *Sent:* Wednesday, March 06, 2013 1:46 PM > > *To:* [EMAIL PROTECTED] > > *Cc:* [EMAIL PROTECTED] > > *Subject:* Re: Data mismatch when importing data from Oracle to Hive > > through Sqoop without an error**** > > > > ** ** > > > > Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the > > values that don't fit in double must be getting inserted as NULL. > > You can see this warning in your logs. > > > > Thanks, > > Abhijeet**** > > > > On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava < > > [EMAIL PROTECTED]> wrote:**** > > > > Hi all,**** > > > > I have notice one interesting thing in the below result-set.**** > > > > I have fired one query in both Oracle and Hive shell and found the > > following result set:à**** > > > > **** > > > > *SQL> select count(1) from bttn***** > > > > * 2 where bttn_id is null or data_inst_id is null or scr_id is null;**** > > * > > > > * ***** > > > > * COUNT(1)***** > > > > *----------***** > > > > * 0***** > > > > *hive> select count(1) from bttn***** > > > > * > where bttn_id is null or data_inst_id is null or scr_id is null;*** > > ** > > > > Total MapReduce jobs = 1**** > > > > Launching Job 1 out of 1**** > > > > Number of reduce tasks determined at compile time: 1**** > > > > In order to change the average load for a reducer (in bytes):**** > > > > set hive.exec.reducers.bytes.per.reducer=<number>**** > > > > In order to limit the maximum number of reducers:**** > > > > set hive.exec.reducers.max=<number>**** > > > > In order to set a constant number of reducers:****
-
RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorAjit Kumar Shreevastava 2013-03-08, 13:38
Hi Venkat,
All most column have some value except these three. Regards, Ajit -----Original Message----- From: Venkat Ranganathan [mailto:[EMAIL PROTECTED]] Sent: Wednesday, March 06, 2013 9:36 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error Hi Ajit Do you know if rest of the columns also are null when the three non null columns are null Venkat On Wed, Mar 6, 2013 at 12:35 AM, Ajit Kumar Shreevastava <[EMAIL PROTECTED]> wrote: > Hi Abhijeet, > > > > Thanks for your response. > > If values that don't fit in double must be getting inserted as Null is > the case then count should not be mis-match in both the case. > > Here the null value inserted are extra value apart from the other > value which is already present in both Oracle Table and Hive table. > > > > Correct me if I am wrong in interpretation. > > > > Thanks and Regards, > > Ajit Kumar Shreevastava > > > > From: abhijeet gaikwad [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, March 06, 2013 1:46 PM > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Re: Data mismatch when importing data from Oracle to Hive > through Sqoop without an error > > > > Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think > the values that don't fit in double must be getting inserted as NULL. > You can see this warning in your logs. > > Thanks, > Abhijeet > > On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava > <[EMAIL PROTECTED]> wrote: > > Hi all, > > I have notice one interesting thing in the below result-set. > > I have fired one query in both Oracle and Hive shell and found the > following result set:à > > > > SQL> select count(1) from bttn > > 2 where bttn_id is null or data_inst_id is null or scr_id is null; > > > > COUNT(1) > > ---------- > > 0 > > hive> select count(1) from bttn > > > where bttn_id is null or data_inst_id is null or scr_id is null; > > Total MapReduce jobs = 1 > > Launching Job 1 out of 1 > > Number of reduce tasks determined at compile time: 1 > > In order to change the average load for a reducer (in bytes): > > set hive.exec.reducers.bytes.per.reducer=<number> > > In order to limit the maximum number of reducers: > > set hive.exec.reducers.max=<number> > > In order to set a constant number of reducers: > > set mapred.reduce.tasks=<number> > > Starting Job = job_201303051835_0020, Tracking URL > http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020 > > Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job -kill > job_201303051835_0020 > > Hadoop job information for Stage-1: number of mappers: 1; number of > reducers: 1 > > 2013-03-06 13:22:56,908 Stage-1 map = 0%, reduce = 0% > > 2013-03-06 13:23:05,928 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:06,931 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:07,934 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:08,938 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:09,941 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:10,944 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:11,947 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:12,956 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:13,959 Stage-1 map = 100%, reduce = 0%, Cumulative > CPU 5.2 sec > > 2013-03-06 13:23:14,962 Stage-1 map = 100%, reduce = 33%, Cumulative > CPU > 5.2 sec > > 2013-03-06 13:23:15,965 Stage-1 map = 100%, reduce = 33%, Cumulative > CPU > 5.2 sec > > 2013-03-06 13:23:16,969 Stage-1 map = 100%, reduce = 33%, Cumulative > CPU > 5.2 sec > > 2013-03-06 13:23:17,974 Stage-1 map = 100%, reduce = 100%, Cumulative > CPU > 6.95 sec > > 2013-03-06 13:23:18,977 Stage-1 map = 100%, reduce = 100%, Cumulative
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorChalcy 2013-03-08, 17:45
Yes, This is newline char in one of the text fields.
The data is split and forms two rows if the original field has one new line char. -hive-drop-import-delims is the helper. Thanks, Chalcy On Wed, Mar 6, 2013 at 12:35 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>wrote: > Hi Ajit, > I've seen similar issue many times. Does your table have textual data? If > so, can it happen that your textual data contains hive delimiters like new > line characters? Because if so then Sqoop might create two lines in for one > single row in the table that will be consequently seen as two rows in Hive. > As Hive will implicitly convert any invalid values into NULL, it would also > explain your NULL values. > > Sqoop offers arguments --hive-drop-import-delims and > --hive-delims-replacement to deal with this problem. More information can > be found in Sqoop user guide [1]. > > Jarcec > > Links: > 1: > http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive > > On Wed, Mar 06, 2013 at 10:50:26PM +0530, abhijeet gaikwad wrote: > > If you see Hadoop job counters in your logs - numbers are correct till > that > > point. So normal import in HDFS is working fine. Only reasonable > > explanation is that there is an issue inserting data in Hive. > > > > Have a look at the file on HDFS it should contain the correct number of > > records with correct data. But if you do "select * ..." from Hive console > > it will give you NULL values for invalid data, as in the data that cannot > > be converted to the respective data type of the column of Hive table. > This > > is because Hive imposes data type constraints on the data in files on > HDFS. > > Those extra records is a mystry for me too. I don't think actual file on > > HDFS has extra records. > > > > Sqoop is creating Hive table incorrectly which **may** have caused this > > issue. Create table manually with correct data types if equivalent > > available or as string and try loading data. If this works fine we know > > where the issue is! > > > > Thanks, > > Abhijeet > > On Wed, Mar 6, 2013 at 2:05 PM, Ajit Kumar Shreevastava < > > [EMAIL PROTECTED]> wrote: > > > > > > > Hi Abhijeet,**** > > > > > > ** ** > > > > > > Thanks for your response.**** > > > > > > If values that don’t fit in double must be getting inserted as Null is > the > > > case then count should not be mis-match in both the case.**** > > > > > > Here the null value inserted are extra value apart from the other value > > > which is already present in both Oracle Table and Hive table.**** > > > > > > ** ** > > > > > > Correct me if I am wrong in interpretation.**** > > > > > > ** ** > > > > > > Thanks and Regards,**** > > > > > > Ajit Kumar Shreevastava**** > > > > > > ** ** > > > > > > *From:* abhijeet gaikwad [mailto:[EMAIL PROTECTED]] > > > *Sent:* Wednesday, March 06, 2013 1:46 PM > > > *To:* [EMAIL PROTECTED] > > > *Cc:* [EMAIL PROTECTED] > > > *Subject:* Re: Data mismatch when importing data from Oracle to Hive > > > through Sqoop without an error**** > > > > > > ** ** > > > > > > Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think > the > > > values that don't fit in double must be getting inserted as NULL. > > > You can see this warning in your logs. > > > > > > Thanks, > > > Abhijeet**** > > > > > > On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava < > > > [EMAIL PROTECTED]> wrote:**** > > > > > > Hi all,**** > > > > > > I have notice one interesting thing in the below result-set.**** > > > > > > I have fired one query in both Oracle and Hive shell and found the > > > following result set:à**** > > > > > > **** > > > > > > *SQL> select count(1) from bttn***** > > > > > > * 2 where bttn_id is null or data_inst_id is null or scr_id is > null;**** > > > * > > > > > > * ***** > > > > > > * COUNT(1)***** > > > > > > *----------***** > > > > > > * 0***** > > > > > > *hive> select count(1) from bttn***** > > > > > > * > where bttn_id is null or data_inst_id is null or scr_id is
-
RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorAjit Kumar Shreevastava 2013-03-13, 08:44
Thanks Chalcy and Jarek
Now, I am facing one new problem with Sqoop export utility. I have implemented what both of you suggested in chain mail and my problem is solved for importing the data to Hive. Now I have truncate the table in oracle and try to export the data from just populated table in Hive is working fine and no record mismatch was found. Command for export to oracle is :--> [hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop export --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table BTTN_BKP_TEST --export-dir /home/hadoop/user/hive/warehouse/bttn_bkp_test -P -m 1 --input-fields-terminated-by '\0001' -verbose Now I want to create a table in hive as a copy of bttn_bkp_test. I described the bttn_bkp_test table and created the table column described in bttn_bkp_test like :--> hive> CREATE TABLE IF NOT EXISTS BTTN_BKP( > bttn_id double, > data_inst_id double, > scr_id double, > bttn_nu double, > cat double, > wdth double, > hght double, > key_scan double, > key_shft double, > frgnd_cptn_colr string, > frgnd_cptn_colr_prsd string, > bkgd_cptn_colr string, > bkgd_cptn_colr_prsd string, > blm_fl double, > lclz_fl double, > menu_item_nu double, > bttn_asgn_lvl_id double, > on_atvt double, > on_clik double, > enbl_fl double, > blm_set_id double, > bttn_asgn_lvl_name string, > mkt_id double, > crte_ts string, > crte_user_id string, > updt_ts string, > updt_user_id string, > del_ts string, > del_user_id string, > dltd_fl double, > menu_item_na string, > prd_cd double, > blm_set_na string, > sound_file_id double, > is_dynmc_bttn double, > frgnd_cptn_colr_id double, > frgnd_cptn_colr_prsd_id double, > bkgd_cptn_colr_id double, > bkgd_cptn_colr_prsd_id double); Then I populate this table by following command:--> hive> insert OVERWRITE table bttn_bkp > select * from BTTN_BKP_TEST; And its insert the same data in bttn_bkp without any error. Now I have truncated the oracle table bttn_bkp_test and try to populate the table by bttn_bkp table data of Hive through below Sqoop command :--> [hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop export --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table BTTN_BKP_TEST --export-dir /home/hadoop/user/hive/warehouse/bttn_bkp_test -P -m 1 --input-fields-terminated-by '\0001' -verbose And I got below error error:--> Warning: /usr/lib/hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation. 13/03/13 14:00:04 DEBUG tool.BaseSqoopTool: Enabled debug logging. Enter password: 13/03/13 14:00:09 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/13 14:00:09 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory 13/03/13 14:00:09 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11 13/03/13 14:00:09 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache. 13/03/13 14:00:09 INFO manager.SqlManager: Using default fetchSize of 1000 13/03/13 14:00:09 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@5557c2bd 13/03/13 14:00:09 INFO tool.CodeGenTool: Beginning code generation 13/03/13 14:00:09 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM BTTN_BKP_TEST t WHERE 1=0 13/03/13 14:00:09 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb, using username: HDFSUSER 13/03/13 14:00:09 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection. 13/03/13 14:00:10 INFO manager.OracleManager: Time zone has been set to GMT 13/03/13 14:00:10 DEBUG manager.SqlManager: Using fetchSize for next query: 1000 13/03/13 14:00:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM BTTN_BKP_TEST t WHERE 1=0 13/03/13 14:00:10 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb/HDFSUSER 13/03/13 14:00:10 DEBUG orm.ClassWriter: selected columns: 13/03/13 14:00:10 DEBUG orm.ClassWriter: BTTN_ID 13/03/13 14:00:10 DEBUG orm.ClassWriter: DATA_INST_ID 13/03/13 14:00:10 DEBUG orm.ClassWriter: SCR_ID 13/03/13 14:00:10 DEBUG orm.ClassWriter: BTTN_NU 13/03/13 14:00:10 DEBUG orm.ClassWriter: CAT 13/03/13 14:00:10 DEBUG orm.ClassWriter: WDTH 13/03/13 14:00:10 DEBUG orm.ClassWriter: HGHT 13/03/13 14:00:10 DEBUG orm.ClassWriter: KEY_SCAN 13/03/13 14:00:10 DEBUG orm.ClassWriter: KEY_SHFT 13/03/13 14:00:10 DEBUG orm.ClassWriter: FRGND_CPTN_COLR 13/03/13 14:00:10 DEBUG orm.ClassWriter: FRGND_CPTN_COLR_PRSD 13/03/13 14:00:10 DEBUG orm.ClassWriter: BKGD_CPTN_COLR 13/03/13 14:00:10 DEBUG orm.ClassWriter: BKGD_CPTN_COLR_PRSD 13/03/13 14:00:10 DEBUG orm.ClassWriter: BLM_FL 13/03/13 14:00:10 DEBUG orm.ClassWriter: LCLZ_FL 13/03/13 14:00:10 DEBUG orm.ClassWriter: MENU_ITEM_NU 13/03/13 14:00:10 DEBUG orm.ClassWriter: BTTN_ASGN_LVL_ID 13/03/13 14:00:10 DEBUG orm.ClassWriter: ON_ATVT 13/03/13 14:00:10 DEBUG orm.ClassWriter: ON_CLIK 13/03/13 14:00:10 DEBUG orm.ClassWriter: ENBL_FL 13/03/13 14:00:10 DEBUG orm.ClassWriter: BLM_SET_ID 13/03/13 14:00:10 DEBUG orm.ClassWriter: BTTN_ASGN_LVL_NAME 13/03/13 14:00:10 DEBUG orm.ClassWriter: MKT_ID 13/03/13 14:00:10 DEBUG orm.ClassWriter:
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorJarek Jarcec Cecho 2013-03-16, 22:56
Hi Ajit,
please accept my apologies for late reply. Please upgrade to latest Sqoop version 1.4.3, we've improved the error message so it should be much easier to triangulate the problem. Jarcec On Wed, Mar 13, 2013 at 08:44:24AM +0000, Ajit Kumar Shreevastava wrote: > Thanks Chalcy and Jarek > > Now, I am facing one new problem with Sqoop export utility. > > I have implemented what both of you suggested in chain mail and my problem is solved for importing the data to Hive. > > Now I have truncate the table in oracle and try to export the data from just populated table in Hive is working fine and no record mismatch was found. > Command for export to oracle is :--> > [hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop export --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table BTTN_BKP_TEST --export-dir /home/hadoop/user/hive/warehouse/bttn_bkp_test -P -m 1 --input-fields-terminated-by '\0001' -verbose > > Now I want to create a table in hive as a copy of bttn_bkp_test. > I described the bttn_bkp_test table and created the table column described in bttn_bkp_test like :--> > > hive> CREATE TABLE IF NOT EXISTS BTTN_BKP( > > bttn_id double, > > data_inst_id double, > > scr_id double, > > bttn_nu double, > > cat double, > > wdth double, > > hght double, > > key_scan double, > > key_shft double, > > frgnd_cptn_colr string, > > frgnd_cptn_colr_prsd string, > > bkgd_cptn_colr string, > > bkgd_cptn_colr_prsd string, > > blm_fl double, > > lclz_fl double, > > menu_item_nu double, > > bttn_asgn_lvl_id double, > > on_atvt double, > > on_clik double, > > enbl_fl double, > > blm_set_id double, > > bttn_asgn_lvl_name string, > > mkt_id double, > > crte_ts string, > > crte_user_id string, > > updt_ts string, > > updt_user_id string, > > del_ts string, > > del_user_id string, > > dltd_fl double, > > menu_item_na string, > > prd_cd double, > > blm_set_na string, > > sound_file_id double, > > is_dynmc_bttn double, > > frgnd_cptn_colr_id double, > > frgnd_cptn_colr_prsd_id double, > > bkgd_cptn_colr_id double, > > bkgd_cptn_colr_prsd_id double); > > Then I populate this table by following command:--> > hive> insert OVERWRITE table bttn_bkp > > select * from BTTN_BKP_TEST; > > And its insert the same data in bttn_bkp without any error. > Now I have truncated the oracle table bttn_bkp_test and try to populate the table by bttn_bkp table data of Hive through below Sqoop command :--> > > [hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop export --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table BTTN_BKP_TEST --export-dir /home/hadoop/user/hive/warehouse/bttn_bkp_test -P -m 1 --input-fields-terminated-by '\0001' -verbose > > And I got below error error:--> > > Warning: /usr/lib/hbase does not exist! HBase imports will fail. > Please set $HBASE_HOME to the root of your HBase installation. > 13/03/13 14:00:04 DEBUG tool.BaseSqoopTool: Enabled debug logging. > Enter password: > 13/03/13 14:00:09 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory > 13/03/13 14:00:09 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory > 13/03/13 14:00:09 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11 > 13/03/13 14:00:09 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an errorChalcy 2013-03-17, 12:55
When you create the table copy table, use, "create table copytable like
origtable" instead of creating hive copy table using script like you are doing. Cheers, Chalcy On Wed, Mar 13, 2013 at 4:44 AM, Ajit Kumar Shreevastava < [EMAIL PROTECTED]> wrote: > Thanks Chalcy and Jarek**** > > ** ** > > Now, I am facing one new problem with Sqoop export utility.**** > > ** ** > > I have implemented what both of you suggested in chain mail and my problem > is solved for importing the data to Hive.**** > > ** ** > > Now I have truncate the table in oracle and try to export the data from > just populated table in Hive is working fine and no record mismatch was > found.**** > > Command for export to oracle is :à**** > > *[hadoop@NHCLT-PC44-2 sqoop-oper]$ *sqoop export --connect > jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table > BTTN_BKP_TEST --export-dir /home/hadoop/user/hive/warehouse/bttn_bkp_test > -P -m 1 --input-fields-terminated-by '\0001' –verbose**** > > ** ** > > Now I want to create a table in hive as a copy of bttn_bkp_test.**** > > I described the bttn_bkp_test table and created the table column described > in bttn_bkp_test like :à**** > > ** ** > > *hive>* CREATE TABLE IF NOT EXISTS BTTN_BKP(**** > > > bttn_id double,**** > > > data_inst_id double,**** > > > scr_id double,**** > > > bttn_nu double,**** > > > cat double,**** > > > wdth double,**** > > > hght double,**** > > > key_scan double,**** > > > key_shft double,**** > > > frgnd_cptn_colr string,**** > > > frgnd_cptn_colr_prsd string,**** > > > bkgd_cptn_colr string,**** > > > bkgd_cptn_colr_prsd string,**** > > > blm_fl double,**** > > > lclz_fl double,**** > > > menu_item_nu double,**** > > > bttn_asgn_lvl_id double,**** > > > on_atvt double,**** > > > on_clik double,**** > > > enbl_fl double,**** > > > blm_set_id double,**** > > > bttn_asgn_lvl_name string,**** > > > mkt_id double,**** > > > crte_ts string,**** > > > crte_user_id string,**** > > > updt_ts string,**** > > > updt_user_id string,**** > > > del_ts string,**** > > > del_user_id string,**** > > > dltd_fl double,**** > > > menu_item_na string,**** > > > prd_cd double,**** > > > blm_set_na string,**** > > > sound_file_id double,**** > > > is_dynmc_bttn double,**** > > > frgnd_cptn_colr_id double,**** > > > frgnd_cptn_colr_prsd_id double,**** > > > bkgd_cptn_colr_id double,**** > > > bkgd_cptn_colr_prsd_id double);**** > > ** ** > > Then I populate this table by following command:à**** > > *hive>* insert OVERWRITE table bttn_bkp**** > > > select * from BTTN_BKP_TEST;**** > > ** ** > > And its insert the same data in bttn_bkp without any error.**** > > **** > > Now I have truncated the oracle table bttn_bkp_test and try to populate > the table by bttn_bkp table data of Hive through below Sqoop command :à ** > ** > > ** ** > > *[hadoop@NHCLT-PC44-2 sqoop-oper]$* sqoop export --connect > jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table > BTTN_BKP_TEST --export-dir /home/hadoop/user/hive/warehouse/bttn_bkp_test > -P -m 1 --input-fields-terminated-by '\0001' –verbose**** > > ** ** > > And I got below error error:à**** > > ** ** > > Warning: /usr/lib/hbase does not exist! HBase imports will fail.**** > > Please set $HBASE_HOME to the root of your HBase installation.**** > > 13/03/13 14:00:04 DEBUG tool.BaseSqoopTool: Enabled debug logging.**** |