Hive, mail # user - Incremental import from PostgreSQL to Hive having issues


Re: Incremental import from PostgreSQL to Hive having issues
Nitin Pawar 2012-04-16, 05:48
The best way to solve this is to load the data into a different partition
each time you load it (depending on how frequently you load, you can
partition the data by date or by a date-hour combination).

I am not sure how you are installing Sqoop. If you are using yum on Red Hat,
you can try doing a yum update, or you can use apt-get to update.

In Hive 0.8.0 there is an option to append to already existing data, but in
that case you will need to make sure the data is not duplicated. So
partitioning the data is the simplest and easiest way to go for now.
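The partitioning suggestion above can be sketched as a wrapper script. This is a minimal, hypothetical sketch, assuming a Sqoop release that supports `--hive-partition-key`/`--hive-partition-value`; the sqoop path, JDBC placeholders, and credentials are taken from the command quoted later in the thread, and tracking of the last-imported timestamp (`LAST_RUN_TS`) is assumed to be done by the caller:

```shell
#!/bin/sh
# Load each incremental run into its own Hive partition keyed by date.
# Use +%Y-%m-%d-%H instead for hourly loads.
PART=$(date +%Y-%m-%d)

# Timestamp of the previous successful import; placeholder default.
LAST_RUN_TS=${LAST_RUN_TS:-"1970-01-01 00:00:00"}

# Build the import command (placeholders <server_url>, <database>,
# XXXXXXX, YYYYYY are from the original thread, not real values).
CMD="/app/sqoop/bin/sqoop import \
 --connect jdbc:postgresql://<server_url>/<database> \
 --table users --username XXXXXXX --password YYYYYY \
 --hive-home /app/hive --hive-import --hive-table users \
 --hive-partition-key dt --hive-partition-value $PART \
 --incremental lastmodified --check-column lastmodifiedtime \
 --last-value \"$LAST_RUN_TS\""

# Dry run: print the command; replace echo with eval to execute it.
echo "$CMD"
```

Because each run lands in its own `dt=` partition, a failed run can be re-done by dropping just that partition rather than the whole table.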

Thanks,
Nitin

On Mon, Apr 16, 2012 at 4:26 AM, Roshan Pradeep <[EMAIL PROTECTED]> wrote:

> Hi Nitin
>
> Thanks for your reply.
>
> I am using the sqoop *1.4.1-incubating* version. On the Sqoop releases
> download page there is no such version as the one you are referring to.
> Please correct me if I am wrong.
>
> Deleting the warehouse folder and then importing works fine, but my tables
> have GBs of data, so deleting and re-importing every time is not a good
> answer. I am working on a solution for our production system.
>
> Is there any way to solve this issue?
>
> Thanks.
>
>
> On Fri, Apr 13, 2012 at 11:13 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>
>> Hi Roshan,
>>
>> I guess you are using sqoop version older than 17.
>>
>> You are facing an issue similar to the one described in SQOOP-216 <https://issues.cloudera.org/browse/SQOOP-216>.
>>
>> You can try deleting the directory that already exists.
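The deletion suggested above targets Sqoop's HDFS staging directory (named after the table), not the Hive warehouse data itself, so it need not mean re-importing GBs of data. A hypothetical dry-run sketch, assuming the Hadoop 0.20.x `fs` shell syntax used elsewhere in this thread:

```shell
#!/bin/sh
# Sqoop stages its MapReduce output in an HDFS directory named after
# the imported table; a leftover copy from a previous run triggers
# FileAlreadyExistsException. Remove just that directory before importing.
STAGING_DIR="users"

# -rmr is the 0.20-era recursive delete; newer Hadoop uses: fs -rm -r
CLEANUP="hadoop fs -rmr $STAGING_DIR"

# Printed as a dry run here; pipe to sh (on a cluster node) to execute.
echo "$CLEANUP"
```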
>>
>> Thanks,
>> Nitin
>>
>>
>>> On Fri, Apr 13, 2012 at 6:12 PM, Roshan Pradeep <[EMAIL PROTECTED]> wrote:
>>
>>> Hadoop - 0.20.2
>>> Hive - 0.8.1
>>>
>>> Thanks.
>>>
>>>
>>> On Fri, Apr 13, 2012 at 5:03 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>
>>>> Can you tell us:
>>>> 1) which Hive version
>>>> 2) which Hadoop version you are using?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 13, 2012 at 12:23 PM, Roshan Pradeep <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I want to import updated data from my source (PostgreSQL) into Hive,
>>>>> based on a column (lastmodifiedtime) in PostgreSQL.
>>>>>
>>>>> *The command I am using*
>>>>>
>>>>> /app/sqoop/bin/sqoop import --hive-table users --connect
>>>>> jdbc:postgresql://<server_url>/<database> --table users --username XXXXXXX
>>>>> --password YYYYYY --hive-home /app/hive --hive-import --incremental
>>>>> lastmodified --check-column lastmodifiedtime
>>>>>
>>>>> *With the above command, I am getting the below error*
>>>>>
>>>>> 12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file:
>>>>> /tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
>>>>> 12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on
>>>>> column "lastmodifiedtime"
>>>>> 12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13
>>>>> 16:31:21.865429'
>>>>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you
>>>>> are importing from postgresql.
>>>>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be
>>>>> faster! Use the --direct
>>>>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a
>>>>> postgresql-specific fast path.
>>>>> 12/04/13 16:31:21 INFO mapreduce.ImportJobBase: Beginning import of
>>>>> users
>>>>> 12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException
>>>>> running import job: org.apache.hadoop.mapred.FileAlreadyExistsException:
>>>>> Output directory users already exists
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
>>>>>         at
>>>>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>>>>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>>>>>         at
>>>>> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
>>>>>         at
>>>>> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)