-
Sqoop not updating incremental.last.value when used with "--hive-import"
Mason 2012-12-27, 20:46
I'm using hive 0.9.0, hadoop 1.1.1, and sqoop 1.4.2.
When I create a saved job to do an incremental import from a MySQL server and import into Hive using "--hive-import", it appears that the data is correctly imported, but that the incremental.last.value of the job is not updated, so re-executing the job just imports the same data.

If I create a job that's identical in all respects, but leave off "--hive-import", it finishes with "INFO tool.ImportTool: Saving incremental import state to the metastore", as I'd expect, and does in fact update the incremental.last.value.

Is this behavior a bug? Or does Sqoop just not support incremental imports into Hive?

-Mason
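One way to confirm the symptom described above is to compare incremental.last.value before and after a run of the saved job. The following is only a minimal sketch: it assumes the text comes from `sqoop job --show <job-name>` and that options are printed as `key = value` lines. The helper names, the job name, and the sample text are made up for illustration.

```python
def parse_job_options(show_output: str) -> dict:
    """Collect `key = value` lines into a dict (assumed output format)."""
    options = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines that are not `key = value` pairs
            options[key.strip()] = value.strip()
    return options


def last_value_advanced(before: str, after: str) -> bool:
    """True if incremental.last.value differs between two snapshots."""
    key = "incremental.last.value"
    return parse_job_options(before).get(key) != parse_job_options(after).get(key)


# Hypothetical snapshots; real output contains many more options.
SAMPLE_BEFORE = (
    "Job: dataset_job\n"
    "Tool: import\n"
    "Options:\n"
    "----------------------------\n"
    "db.table = dataset\n"
    "incremental.last.value = 41000\n"
)
SAMPLE_AFTER = SAMPLE_BEFORE.replace("41000", "41969")
```

If `last_value_advanced` returns False after a run that imported new rows, the job state was not saved, which matches the behavior reported here.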
-
Re: Sqoop not updating incremental.last.value when used with "--hive-import"
Jarek Jarcec Cecho 2012-12-28, 18:42
Hi Mason,
that seems like a bug to me. Would you mind opening a new JIRA [1] for that?

Jarcec

Links:
1: https://issues.apache.org/jira/browse/SQOOP
-
Re: Sqoop not updating incremental.last.value when used with "--hive-import"
Mason 2012-12-31, 19:03
hi Jarcec,
I dug deeper and it looks like a configuration error on my end resulted in Hive throwing an error, which caused Sqoop to silently abort the import, before updating incremental.last.value. I looked around the source, but my Java is too rusty to figure out why Sqoop didn't report failure after that error.

Still want a JIRA?

Logs are below.

### Sqoop CLI output ###
### ...a bunch of prior output... ###
12/12/27 10:57:38 INFO mapreduce.ImportJobBase: Transferred 9.1538 MB in 49.4693 seconds (189.4804 KB/sec)
12/12/27 10:57:38 INFO mapreduce.ImportJobBase: Retrieved 41969 records.
12/12/27 10:57:38 INFO util.AppendUtils: Appending to directory dataset
12/12/27 10:57:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dataset` AS t LIMIT 1
12/12/27 10:57:40 WARN hive.TableDefWriter: Column created_at had to be cast to a less precise type in Hive
12/12/27 10:57:40 WARN hive.TableDefWriter: Column updated_at had to be cast to a less precise type in Hive
12/12/27 10:57:40 INFO hive.HiveImport: Removing temporary files from import process: hdfs://localhost:9000/user/mason/dataset/_logs
12/12/27 10:57:40 INFO hive.HiveImport: Loading uploaded data into Hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/usr/local/Cellar/sqoop/1.4.2/libexec/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/mason/hive_job_log_mason_201212271057_2014979494.txt
OK
Time taken: 4.503 seconds
Loading data to table default.dataset
OK
Time taken: 0.656 seconds
### end of output ###

### Hive error log ###
2012-12-27 10:57:40,138 WARN conf.HiveConf (HiveConf.java:<clinit>(70)) - hive-site.xml not found on CLASSPATH
2012-12-27 10:57:41,635 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2012-12-27 10:57:41,635 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2012-12-27 10:57:41,636 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2012-12-27 10:57:41,636 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2012-12-27 10:57:41,637 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2012-12-27 10:57:41,637 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2012-12-27 10:57:46,030 ERROR tool.ImportTool (ImportTool.java:run(484)) - Encountered IOException running import job: java.io.IOException: Exception thrown in Hive
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:226)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
    at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:308)
    ... 12 more
Caused by: ExitSecurityException
    at org.apache.sqoop.util.SubprocessSecurityManager.checkExit(SubprocessSecurityManager.java:83)
    at java.lang.Runtime.exit(Runtime.java:105)
    at java.lang.System.exit(System.java:960)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:558)
    ... 17 more
### end of stack trace ###
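Since the CLI output in this thread ends with "OK" even though the job state was never saved, a wrapper script could look for the two log messages quoted in this thread instead of trusting the final lines. This is a rough sketch only: the function name and return labels are invented here, and it assumes the captured text includes both the Sqoop console output and the Hive error log.

```python
# Both marker strings are copied from the log lines quoted in this thread.
STATE_SAVED = "Saving incremental import state to the metastore"
IMPORT_FAILED = "Encountered IOException running import job"


def classify_run(log_text: str) -> str:
    """Classify a captured Sqoop run by the messages it contains."""
    if IMPORT_FAILED in log_text:
        return "failed"
    if STATE_SAVED in log_text:
        return "state_saved"
    # Neither marker: the run may look fine, but the job state was not saved.
    return "state_not_saved"


# Tail of the console output from this thread: no error, but no state saved either.
console_tail = "Loading data to table default.dataset\nOK\nTime taken: 0.656 seconds\n"
print(classify_run(console_tail))
```

Treating "state_not_saved" as a failure for incremental jobs would have flagged this run even without access to the Hive error log.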
-
Re: Sqoop not updating incremental.last.value when used with "--hive-import"
Jarek Jarcec Cecho 2013-01-01, 14:37
I see, I'm glad that you were able to investigate the issue and resolve it on your end. I don't see a need to file a JIRA at the moment.
Happy New Year!

Jarcec