Sqoop >> mail # user >> Another sqoop incremental update question


Re: Another sqoop incremental update question
Hi Jarcec,

I split this into two questions, but I am really trying to achieve one
objective.

The use case is not to export back to the database.  For huge tables, I do
a one-time pull, then an incremental append based on modified date into a
new table, and merge the two so I get the updated rows.  I am currently
doing the merge with a left outer join, which works efficiently, but I
would like to try sqoop merge if it is as easy as giving it the original
table and the incremental table as input.
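
To make the merge step above concrete, here is a minimal Python sketch of
the "newer row wins per key" logic. It is only an illustration of the
semantics, not how Sqoop or Hive implement it. The row layout (id, payload,
modified_date) and the function name merge_rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row layout: (id, payload, modified_date as an ISO date string)
Row = Tuple[int, str, str]


def merge_rows(base: Iterable[Row], increment: Iterable[Row]) -> Dict[int, Row]:
    """Merge an incremental pull onto a base pull, keyed on the unique id.

    For each id, the row with the latest modified_date is kept.
    """
    merged = {row[0]: row for row in base}
    for row in increment:
        current = merged.get(row[0])
        # ISO date strings compare correctly as plain strings.
        if current is None or row[2] >= current[2]:
            merged[row[0]] = row
    return merged


base = [(1, "a", "2012-10-01"), (2, "b", "2012-10-01")]
increment = [(2, "b2", "2012-10-20"), (3, "c", "2012-10-21")]
merged = merge_rows(base, increment)
```

Row 2 is replaced by its newer version, row 3 is added, and row 1 is kept
unchanged. Sqoop's merge tool is documented as taking a newer dataset, an
older dataset and a merge key, which is the same shape of input.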

Also, some rows will have been deleted in the database by the time we do
the incremental update to the Hive table, and I need to be able to delete
those rows too.  The way I handle this is to pull only the ids (unique id)
from the database and do another outer join, so the rows deleted in the
database will not be in the merged Hive table.
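
The delete handling described above can be sketched in Python as a filter
against the set of ids that still exist in the database. This is only an
illustration of the idea; in practice it would be a join in Hive. The names
drop_deleted, table_rows and live_ids are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row layout: (id, payload)
Row = Tuple[int, str]


def drop_deleted(rows: Dict[int, Row], live_ids: Iterable[int]) -> Dict[int, Row]:
    """Keep only rows whose id is still present in the source database."""
    live = set(live_ids)
    return {key: row for key, row in rows.items() if key in live}


# Rows currently in the merged table, keyed on the unique id.
table_rows = {1: (1, "a"), 2: (2, "b2"), 3: (3, "c")}
# Ids pulled fresh from the database; id 1 has been deleted there.
live_ids = [2, 3]

cleaned = drop_deleted(table_rows, live_ids)
```

Only the id column has to be pulled from the database for this step, so
the extra import stays small even for a huge table.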

Thanks, Jarcec,
Chalcy
On Tue, Oct 23, 2012 at 11:02 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Chalcy,
> I'm afraid that there isn't a way to achieve deletes from within
> Sqoop.
>
> Just a quick question. It seems to me that you're trying to import data to
> HDFS, do some transformations and put the data back to your database (using
> updates, inserts and deletes). If I understand your use case
> correctly, I would propose truncating the table after your input and using
> a simple export to load the updated data. I believe that such an approach
> will be faster than selective inserts, updates and deletes.
>
> Jarcec
>
> On Tue, Oct 23, 2012 at 09:44:04AM -0400, Chalcy wrote:
> > Hello sqoop users,
> >
> > Sqoop incremental append for insert and update works really great.  Is
> > there any way to handle deletes?  I am planning to do it by left outer
> > join but trying to find if there is any other way.
> >
> > Thanks,
> > Chalcy
>