Hive >> mail # user >> Incremental import from PostgreSQL to Hive having issues


Re: Incremental import from PostgreSQL to Hive having issues
Can you tell us which versions you are using of:
1) Hive
2) Hadoop

On Fri, Apr 13, 2012 at 12:23 PM, Roshan Pradeep <[EMAIL PROTECTED]> wrote:

> Hi
>
> I want to import updated rows from my source (PostgreSQL) into Hive,
> based on a column (lastmodifiedtime) in PostgreSQL.
>
> *The command I am using*
>
> /app/sqoop/bin/sqoop import --hive-table users \
>     --connect jdbc:postgresql://<server_url>/<database> \
>     --table users --username XXXXXXX --password YYYYYY \
>     --hive-home /app/hive --hive-import \
>     --incremental lastmodified --check-column lastmodifiedtime
>
> *With the above command, I get the following error*
>
> 12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
> 12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on column "lastmodifiedtime"
> 12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13 16:31:21.865429'
> 12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you are importing from postgresql.
> 12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be faster! Use the --direct
> 12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a postgresql-specific fast path.
> 12/04/13 16:31:21 INFO mapreduce.ImportJobBase: Beginning import of users
> 12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory users already exists
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
>         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
>         at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
>         at org.apache.sqoop.manager.PostgresqlManager.importTable(PostgresqlManager.java:102)
>         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
>         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>         at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>         at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
>
> According to the log above, it identifies the updated data in PostgreSQL,
> but then fails because the output directory already exists. Could someone
> please help me correct this issue?
>
> Thanks.
>
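For context on the "Incremental import based on column" and "Upper bound value" log lines: in --incremental lastmodified mode, Sqoop records the current time as an upper bound and, as I understand its documented behavior, imports rows whose check column is below that bound and, when --last-value is supplied, not older than the previous value. The shell sketch below only illustrates that selection under this assumption; lastmodified_where is a hypothetical helper, not part of Sqoop, and the exact SQL Sqoop generates (bound inclusiveness, timestamp literal syntax) may differ between versions. Since the command in the question passes no --last-value, the first (upper-bound-only) form would apply, meaning every row older than the upper bound is selected.

```shell
#!/bin/sh
# Sketch only: approximates the row filter of Sqoop's
# "--incremental lastmodified" mode, ASSUMING the semantics described above.
# lastmodified_where is a hypothetical helper, not a Sqoop command.
#
# Usage: lastmodified_where CHECK_COLUMN UPPER_BOUND [LAST_VALUE]
lastmodified_where() {
  col=$1
  upper=$2
  last=${3:-}
  if [ -n "$last" ]; then
    # --last-value given: only rows modified since the previous run.
    printf "%s >= '%s' AND %s < '%s'\n" "$col" "$last" "$col" "$upper"
  else
    # No --last-value: every row older than the upper bound.
    printf "%s < '%s'\n" "$col" "$upper"
  fi
}

# Upper bound taken from the log; the last value is a made-up example.
lastmodified_where lastmodifiedtime '2012-04-13 16:31:21.865429'
lastmodified_where lastmodifiedtime '2012-04-13 16:31:21.865429' '2012-04-12 00:00:00'
```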

--
Nitin Pawar
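The FileAlreadyExistsException is raised by FileOutputFormat.checkOutputSpecs, which refuses to submit a MapReduce job whose output directory already exists. By default Sqoop writes a table import into a directory named after the table (users in the error), and a relative HDFS path resolves against the user's HDFS home directory, by default /user/<username>. The compile path /tmp/sqoop-root/... suggests the job ran as root, but that is an inference. The sketch below, using a hypothetical default_target_dir helper, only illustrates this path resolution; it does not contact HDFS.

```shell
#!/bin/sh
# Sketch: resolve a Sqoop output directory the way HDFS resolves relative
# paths, ASSUMING the default home directory layout /user/<username>.
# default_target_dir is a hypothetical helper, not part of Sqoop or Hadoop.
#
# Usage: default_target_dir TABLE_OR_TARGET_DIR USERNAME
default_target_dir() {
  dir=$1
  user=$2
  case "$dir" in
    /*) printf '%s\n' "$dir" ;;                  # absolute path: used as-is
    *)  printf '/user/%s/%s\n' "$user" "$dir" ;; # relative: under home dir
  esac
}

# "users" comes from the error message; "root" is inferred from /tmp/sqoop-root.
default_target_dir users root
```

On the cluster, hadoop fs -ls /user/root/users would show whether that directory is left over from an earlier run. Removing it after checking its contents, or pointing --target-dir at an unused path, should get past this particular check.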