Hive >> mail # user >> Incremental import from PostgreSQL to Hive having issues


Re: Incremental import from PostgreSQL to Hive having issues
Hi Roshan,

I guess you are using a Sqoop version older than 17.

You are facing an issue similar to the one described in
SQOOP-216 <https://issues.cloudera.org/browse/SQOOP-216>

You can try deleting the existing output directory (users) before running the import again.
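
A minimal sketch of that cleanup, assuming the leftover directory is the relative HDFS path `users` named in the error message and that the `hadoop` CLI is on the PATH. The `run` helper and the `TARGET_DIR` and `DRY_RUN` variables are illustrative names, not part of Sqoop or Hadoop. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: clear a leftover Sqoop output directory in HDFS before re-running
# the import. Assumption: the directory is the relative HDFS path "users"
# (taken from the error message), resolved against the user's HDFS home.

TARGET_DIR="${TARGET_DIR:-users}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# List the directory first, so nothing still needed is removed by accident.
run hadoop fs -ls "$TARGET_DIR"

# Remove it recursively. `-rmr` is the spelling used by Hadoop 0.20.x;
# later releases use `hadoop fs -rm -r` instead.
run hadoop fs -rmr "$TARGET_DIR"
```

Running it with `DRY_RUN=0` performs the deletion. Note that this discards whatever an earlier import left in that directory, so it is worth checking the listing before removing anything.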

Thanks,
Nitin

On Fri, Apr 13, 2012 at 6:12 PM, Roshan Pradeep <[EMAIL PROTECTED]> wrote:

> Hadoop - 0.20.2
> Hive - 0.8.1
>
> Thanks.
>
>
> On Fri, Apr 13, 2012 at 5:03 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>
>> Can you tell us which versions you are using?
>> 1) Hive version
>> 2) Hadoop version
>>
>>
>>
>>
>>
>> On Fri, Apr 13, 2012 at 12:23 PM, Roshan Pradeep <[EMAIL PROTECTED]> wrote:
>>
>>> Hi
>>>
>>> I want to import the updated data from my source (PostgreSQL) into Hive,
>>> based on a column (lastmodifiedtime) in PostgreSQL.
>>>
>>> *The command I am using*
>>>
>>> /app/sqoop/bin/sqoop import --hive-table users --connect
>>> jdbc:postgresql://<server_url>/<database> --table users --username XXXXXXX
>>> --password YYYYYY --hive-home /app/hive --hive-import --incremental
>>> lastmodified --check-column lastmodifiedtime
>>>
>>> *With the above command, I am getting the below error*
>>>
>>> 12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
>>> 12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on column "lastmodifiedtime"
>>> 12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13 16:31:21.865429'
>>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you are importing from postgresql.
>>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be faster! Use the --direct
>>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a postgresql-specific fast path.
>>> 12/04/13 16:31:21 INFO mapreduce.ImportJobBase: Beginning import of users
>>> 12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory users already exists
>>>         at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
>>>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>>>         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
>>>         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
>>>         at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
>>>         at org.apache.sqoop.manager.PostgresqlManager.importTable(PostgresqlManager.java:102)
>>>         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
>>>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
>>>         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>>>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>>>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>>>         at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>>>         at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
>>>
>>> According to the log above, Sqoop identifies the updated data in
>>> PostgreSQL, but then fails because the output directory already exists.
>>> Could someone please help me fix this issue?
>>>
>>> Thanks.
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>>
>
--
Nitin Pawar