|
|
-
Error while importing data from oracle database when using split-by timestamp column
Sadasiva Guntupalli 2012-09-10, 22:23
I am running the following sqoop command to import data from Oracle 11g to Hive database. It throws the following exception. I appreciate your help. *
Sqoop Command: *
sqoop import --connect jdbc:oracle:thin:@//abc-scan:1521/RACMSLT --query 'SELECT R.MESSAGE_ID, R.EVENT_DATETIME FROM EVENT_CONTENT PARTITION(CONTENT_P20120826) R WHERE $CONDITIONS' --split-by EVENT_DATETIME --username ABC --password RMLT01 --target-dir /user/hive/SS/CONTENT --hive-table EVENT_CONTENT --hive-import --hive-partition-key HYDRO_PART_DATE --hive-partition-value "2012-08-26" -m 6 --verbose *
Oracle database table: *
EVENT_CONTENT
-------------
MESSAGE_ID NUMBER
EVENT_DATETIME TIMESTAMP(3)
*
Exception: *
java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:474)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:183)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:668)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.sql.SQLDataException: ORA-01843: not a valid month
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:457)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:400)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:926)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:476)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:200)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:543)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:238)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1244)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1492)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1710)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4372)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4453)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:6270)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:236)
... 11 more
Thanks Sadasiva
-
Re: Error while importing data from oracle database when using split-by timestamp column
Jarek Jarcec Cecho 2012-09-11, 06:48
Hi Sadasiva, could you please provide entire log that sqoop generated with --verbose argument?
Also it seems that your table contains only two columns MESSAGE_ID and EVENT_TIME. Column MESSAGE_ID seems to me as primary key of the table. If so, I would recommend to use this column for splitting by altering your command line with "--split-by MESSAGE_ID".
Jarcec
On Mon, Sep 10, 2012 at 03:23:46PM -0700, Sadasiva Guntupalli wrote: > I am running the following sqoop command to import data from Oracle 11g to > Hive database. It throws the following exception. I appreciate your help. > * > > Sqoop Command: > * > > sqoop import --connect jdbc:oracle:thin:@//abc-scan:1521/RACMSLT --query > 'SELECT R.MESSAGE_ID, R.EVENT_DATETIME FROM EVENT_CONTENT > PARTITION(CONTENT_P20120826) R WHERE $CONDITIONS' --split-by EVENT_DATETIME > --username ABC --password RMLT01 --target-dir /user/hive/SS/CONTENT > --hive-table EVENT_CONTENT --hive-import --hive-partition-key > HYDRO_PART_DATE --hive-partition-value "2012-08-26" -m 6 --verbose > * > > Oracle database table: > * > > EVENT_CONTENT > > ------------- > > MESSAGE_ID NUMBER > > EVENT_DATETIME TIMESTAMP(3) > > * > > Exception: > * > > java.io.IOException: SQLException in nextKeyValue > > at > org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265) > > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:474) > > at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) > > at > org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:183) > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:668) > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334) > > at org.apache.hadoop.mapred.Child$4.run(Child.java:270) > > at java.security.AccessController.doPrivileged(Native Method) > > at javax.security.auth.Subject.doAs(Subject.java:415) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) > > at org.apache.hadoop.mapred.Child.main(Child.java:264) > > Caused by: java.sql.SQLDataException: ORA-01843: not a valid month > > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:457) > > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:400) > > at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:926) > > at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:476) > > at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:200) > > at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:543) > > at > oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:238) > > at > oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1244) > > at > oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1492) > > at > oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1710) > > at > oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4372) > > at > oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4453) > > at > oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:6270) > > at > org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111) > > at > org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:236) > > ... 11 more > > Thanks > Sadasiva
-
Re: Error while importing data from oracle database when using split-by timestamp column
Sadasiva Guntupalli 2012-09-11, 17:08
Hi Jarcec,
Thank you. There is no primary key on MESSAGE_ID column. The following is the table structure. This table has approximately 100 million rows in each partition. I am trying to import one partition at a time. Please find the sqoop log below. There is an index defined on the column HYDRO_DATETIME. I am using split-by HYDRO_DATETIME to balance the load on all the 6 nodes.
RMS_DXC_HYDRO_CONTENT ------------------------------------------------ MESSAGE_ID NUMBER NOT NULL HYDRO_DATETIME TIMESTAMP(3) NOT NULL PSI_10_VAL FLOAT(126)
The following is the sqoop log after --verbose option is enabled.
[mapr@lxhadoop6 rms]$ sqoop import --connect jdbc:oracle:thin:@//rmslt-scan:1521/RACRMSLT --query 'SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE $CONDITIONS' --split-by HYDRO_DATETIME --username RMS -P --target-dir /user/hive/HYDRO_CONTENT1 --hive-table RMS_DXC_HYDRO_CONTENT --hive-import --hive-partition-key HYDRO_PART_DATE --hive-partition-value "2012-08-26" -m 2 --verbose Enter password: 12/09/11 09:52:26 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 12/09/11 09:52:26 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 12/09/11 09:52:26 INFO manager.SqlManager: Using default fetchSize of 1000 12/09/11 09:52:26 INFO tool.CodeGenTool: Beginning code generation 12/09/11 09:52:28 INFO manager.OracleManager: Time zone has been set to GMT 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) 12/09/11 09:52:28 INFO orm.CompilationManager: HADOOP_HOME is /opt/mapr/hadoop/hadoop-0.20.2/bin/.. Note: /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/09/11 09:52:29 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java to /home/mapr/rms/./QueryResult.java java.io.IOException: Destination '/home/mapr/rms/./QueryResult.java' already exists at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811) at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227) at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453) at org.apache.sqoop.Sqoop.run(Sqoop.java:145) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) at org.apache.sqoop.Sqoop.main(Sqoop.java:238) at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57) 12/09/11 09:52:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.jar 12/09/11 09:52:29 INFO mapreduce.ImportJobBase: Beginning query import. 12/09/11 09:52:29 INFO fs.JobTrackerWatcher: Current running JobTracker is: lxhadoop2.beckman.com/10.32.8.137:9001 12/09/11 09:52:29 DEBUG mapred.JobClient: adding the following namenodes' delegation tokens:null 12/09/11 09:52:29 DEBUG mapred.JobClient: default FileSystem: maprfs:/// 12/09/11 09:52:30 DEBUG mapred.JobClient: Creating splits at maprfs:/var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging/job_201208272200_0078 12/09/11 09:52:30 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(HYDRO_DATETIME), MAX(HYDRO_DATETIME) FROM (SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 1) ) t1 12/09/11 09:52:43 INFO mapred.JobClient: Creating job's output directory at /user/hive/HYDRO_CONTENT1 12/09/11 09:52:43 INFO mapred.JobClient: Creating job's user history location directory at /user/hive/HYDRO_CONTENT1/_logs 12/09/11 09:52:43 DEBUG mapred.JobClient: Printing tokens for job: job_201208272200_0078 12/09/11 09:52:43 INFO mapred.JobClient: Running job: job_201208272200_0078 12/09/11 09:52:44 INFO mapred.JobClient: map 0% reduce 0% 12/09/11 09:52:53 INFO mapred.JobClient: Task Id : attempt_201208272200_0078_m_000000_0, Status : FAILED on node lxhadoop2.beckman.com java.io.IOException: SQLException in nextKeyValue at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:474) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:183) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:668) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.sql.SQLDataException: ORA-01843: not a valid month at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:457) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:400) at orac
-
Re: Error while importing data from oracle database when using split-by timestamp column
abhijeet gaikwad 2012-09-11, 18:18
Sqoop uses where clauses with SELECT query to set upper and lower bound of the split. It seems the format of the date value in the where clause of the SELECT Query does not match that in the table. You will be able to see the query fired by Sqoop in your Oracle Logs. Can you check if the date format matches? If no, then it may be a limitation of Sqoop. You can file a JIRA for this issue and use some other column for splitting the data.
Let me know if this doesn't work out.
Thanks, Abhijeet
On 9/11/12, Sadasiva Guntupalli <[EMAIL PROTECTED]> wrote: > Hi Jarcec, > > Thank you. There is no primary key on MESSAGE_ID column. The following is > the table structure. This table has approximately 100 million rows in each > partition. I am trying to import one partition at a time. > Please find the sqoop log below. There is an index defined on the column > HYDRO_DATETIME. > I am using split-by HYDRO_DATETIME to balance the load on all the 6 nodes. > > RMS_DXC_HYDRO_CONTENT > ------------------------------------------------ > MESSAGE_ID NUMBER NOT NULL > HYDRO_DATETIME TIMESTAMP(3) NOT NULL > PSI_10_VAL FLOAT(126) > > The following is the sqoop log after --verbose option is enabled. > > [mapr@lxhadoop6 rms]$ sqoop import --connect > jdbc:oracle:thin:@//rmslt-scan:1521/RACRMSLT > --query 'SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM > RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE > $CONDITIONS' --split-by HYDRO_DATETIME --username RMS -P --target-dir > /user/hive/HYDRO_CONTENT1 --hive-table RMS_DXC_HYDRO_CONTENT --hive-import > --hive-partition-key HYDRO_PART_DATE --hive-partition-value "2012-08-26" > -m 2 --verbose > Enter password: > 12/09/11 09:52:26 INFO tool.BaseSqoopTool: Using Hive-specific delimiters > for output. You can override > 12/09/11 09:52:26 INFO tool.BaseSqoopTool: delimiters with > --fields-terminated-by, etc. > 12/09/11 09:52:26 INFO manager.SqlManager: Using default fetchSize of 1000 > 12/09/11 09:52:26 INFO tool.CodeGenTool: Beginning code generation > 12/09/11 09:52:28 INFO manager.OracleManager: Time zone has been set to GMT > 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: SELECT > R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT > PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) > 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: SELECT > R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT > PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) > 12/09/11 09:52:28 INFO orm.CompilationManager: HADOOP_HOME is > /opt/mapr/hadoop/hadoop-0.20.2/bin/.. > Note: > /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java > uses or overrides a deprecated API. > Note: Recompile with -Xlint:deprecation for details. > 12/09/11 09:52:29 ERROR orm.CompilationManager: Could not rename > /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java > to /home/mapr/rms/./QueryResult.java > java.io.IOException: Destination '/home/mapr/rms/./QueryResult.java' > already exists > at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811) > at > org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227) > at > org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83) > at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367) > at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453) > at org.apache.sqoop.Sqoop.run(Sqoop.java:145) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) > at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > at org.apache.sqoop.Sqoop.main(Sqoop.java:238) > at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57) > 12/09/11 09:52:29 INFO orm.CompilationManager: Writing jar file:
-
Re: Error while importing data from oracle database when using split-by timestamp column
Sadasiva Guntupalli 2012-09-11, 18:33
Hi Abhijeet, Thank you. The following is the sqoop query extracted from the table V_$SQL in Oracle database. Sqoop throws the exception while executing the following query. SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE ( HYDRO_DATETIME >= :"SYS_B_0" ) AND ( HYDRO_DATETIME <= :"SYS_B_1" )
Thanks Sadasiva
On Tue, Sep 11, 2012 at 11:18 AM, abhijeet gaikwad <[EMAIL PROTECTED]>wrote:
> Sqoop uses where clauses with SELECT query to set upper and lower > bound of the split. It seems the format of the date value in the where > clause of the SELECT Query does not match that in the table. You will > be able to see the query fired by Sqoop in your Oracle Logs. Can you > check if the date format matches? > If no, then it may be a limitation of Sqoop. You can file a JIRA for > this issue and use some other column for splitting the data. > > Let me know if this doesn't work out. > > Thanks, > Abhijeet > > On 9/11/12, Sadasiva Guntupalli <[EMAIL PROTECTED]> wrote: > > Hi Jarcec, > > > > Thank you. There is no primary key on MESSAGE_ID column. The following is > > the table structure. This table has approximately 100 million rows in > each > > partition. I am trying to import one partition at a time. > > Please find the sqoop log below. There is an index defined on the column > > HYDRO_DATETIME. > > I am using split-by HYDRO_DATETIME to balance the load on all the 6 > nodes. > > > > RMS_DXC_HYDRO_CONTENT > > ------------------------------------------------ > > MESSAGE_ID NUMBER NOT NULL > > HYDRO_DATETIME TIMESTAMP(3) NOT NULL > > PSI_10_VAL FLOAT(126) > > > > The following is the sqoop log after --verbose option is enabled. > > > > [mapr@lxhadoop6 rms]$ sqoop import --connect > > jdbc:oracle:thin:@//rmslt-scan:1521/RACRMSLT > > --query 'SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM > > RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE > > $CONDITIONS' --split-by HYDRO_DATETIME --username RMS -P --target-dir > > /user/hive/HYDRO_CONTENT1 --hive-table RMS_DXC_HYDRO_CONTENT > --hive-import > > --hive-partition-key HYDRO_PART_DATE --hive-partition-value > "2012-08-26" > > -m 2 --verbose > > Enter password: > > 12/09/11 09:52:26 INFO tool.BaseSqoopTool: Using Hive-specific delimiters > > for output. You can override > > 12/09/11 09:52:26 INFO tool.BaseSqoopTool: delimiters with > > --fields-terminated-by, etc. > > 12/09/11 09:52:26 INFO manager.SqlManager: Using default fetchSize of > 1000 > > 12/09/11 09:52:26 INFO tool.CodeGenTool: Beginning code generation > > 12/09/11 09:52:28 INFO manager.OracleManager: Time zone has been set to > GMT > > 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: > SELECT > > R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT > > PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) > > 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: > SELECT > > R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT > > PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) > > 12/09/11 09:52:28 INFO orm.CompilationManager: HADOOP_HOME is > > /opt/mapr/hadoop/hadoop-0.20.2/bin/.. > > Note: > > /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java > > uses or overrides a deprecated API. > > Note: Recompile with -Xlint:deprecation for details. > > 12/09/11 09:52:29 ERROR orm.CompilationManager: Could not rename > > /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java > > to /home/mapr/rms/./QueryResult.java > > java.io.IOException: Destination '/home/mapr/rms/./QueryResult.java' > > already exists > > at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811) > > at > > > org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227) > > at > > org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
-
Re: Error while importing data from oracle database when using split-by timestamp column
Sadasiva Guntupalli 2012-09-20, 17:55
Hi,
Has anyone successfully imported data from oracle using split-by timestamp column in sqoop import command?
Thanks Sadasiva On Tue, Sep 11, 2012 at 11:33 AM, Sadasiva Guntupalli < [EMAIL PROTECTED]> wrote:
> Hi Abhijeet, > Thank you. The following is the sqoop query extracted from the table > V_$SQL in Oracle database. Sqoop throws > the exception while executing the following query. > SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM > RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE ( > HYDRO_DATETIME >= :"SYS_B_0" ) AND ( HYDRO_DATETIME <= :"SYS_B_1" ) > > Thanks > Sadasiva > > On Tue, Sep 11, 2012 at 11:18 AM, abhijeet gaikwad <[EMAIL PROTECTED] > > wrote: > >> Sqoop uses where clauses with SELECT query to set upper and lower >> bound of the split. It seems the format of the date value in the where >> clause of the SELECT Query does not match that in the table. You will >> be able to see the query fired by Sqoop in your Oracle Logs. Can you >> check if the date format matches? >> If no, then it may be a limitation of Sqoop. You can file a JIRA for >> this issue and use some other column for splitting the data. >> >> Let me know if this doesn't work out. >> >> Thanks, >> Abhijeet >> >> On 9/11/12, Sadasiva Guntupalli <[EMAIL PROTECTED]> wrote: >> > Hi Jarcec, >> > >> > Thank you. There is no primary key on MESSAGE_ID column. The following >> is >> > the table structure. This table has approximately 100 million rows in >> each >> > partition. I am trying to import one partition at a time. >> > Please find the sqoop log below. There is an index defined on the column >> > HYDRO_DATETIME. >> > I am using split-by HYDRO_DATETIME to balance the load on all the 6 >> nodes. >> > >> > RMS_DXC_HYDRO_CONTENT >> > ------------------------------------------------ >> > MESSAGE_ID NUMBER NOT NULL >> > HYDRO_DATETIME TIMESTAMP(3) NOT NULL >> > PSI_10_VAL FLOAT(126) >> > >> > The following is the sqoop log after --verbose option is enabled. >> > >> > [mapr@lxhadoop6 rms]$ sqoop import --connect >> > jdbc:oracle:thin:@//rmslt-scan:1521/RACRMSLT >> > --query 'SELECT R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM >> > RMS_DXC_HYDRO_CONTENT PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE >> > $CONDITIONS' --split-by HYDRO_DATETIME --username RMS -P --target-dir >> > /user/hive/HYDRO_CONTENT1 --hive-table RMS_DXC_HYDRO_CONTENT >> --hive-import >> > --hive-partition-key HYDRO_PART_DATE --hive-partition-value >> "2012-08-26" >> > -m 2 --verbose >> > Enter password: >> > 12/09/11 09:52:26 INFO tool.BaseSqoopTool: Using Hive-specific >> delimiters >> > for output. You can override >> > 12/09/11 09:52:26 INFO tool.BaseSqoopTool: delimiters with >> > --fields-terminated-by, etc. >> > 12/09/11 09:52:26 INFO manager.SqlManager: Using default fetchSize of >> 1000 >> > 12/09/11 09:52:26 INFO tool.CodeGenTool: Beginning code generation >> > 12/09/11 09:52:28 INFO manager.OracleManager: Time zone has been set to >> GMT >> > 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: >> SELECT >> > R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT >> > PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) >> > 12/09/11 09:52:28 INFO manager.SqlManager: Executing SQL statement: >> SELECT >> > R.MESSAGE_ID, R.HYDRO_DATETIME, R.PSI_10_VAL FROM RMS_DXC_HYDRO_CONTENT >> > PARTITION(DXC_HYDRO_CONTENT_P20120826) R WHERE (1 = 0) >> > 12/09/11 09:52:28 INFO orm.CompilationManager: HADOOP_HOME is >> > /opt/mapr/hadoop/hadoop-0.20.2/bin/.. >> > Note: >> > >> /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java >> > uses or overrides a deprecated API. >> > Note: Recompile with -Xlint:deprecation for details. >> > 12/09/11 09:52:29 ERROR orm.CompilationManager: Could not rename >> > >> /tmp/sqoop-mapr/compile/3276fada0f9443f93a4b3b64ee4fb126/QueryResult.java
|
|