Another sqoop incremental update question
Chalcy 2012-10-23, 13:44
Hello sqoop users,
Sqoop incremental append works really well for inserts and updates. Is there any way to handle deletes? I am planning to do it with a left outer join, but I am trying to find out whether there is another way.
Thanks, Chalcy
Re: Another sqoop incremental update question
Jarek Jarcec Cecho 2012-10-23, 15:02
Hi Chalcy, I'm afraid there isn't a way to handle deletes from within Sqoop.
Just a quick question. It seems to me that you're trying to import data into HDFS, do some transformations, and put the data back into your database (using updates, inserts, and deletes). If I understand your use case correctly, I would propose truncating the table after your import and using a simple export to load the updated data. I believe that approach will be faster than selective inserts, updates, and deletes.
Jarcec
Re: Another sqoop incremental update question
Chalcy 2012-10-23, 16:11
Hi Jarcec,
I split the question into two, but I am actually trying to achieve a single objective.
The use case is not to export back to the database. For huge tables, I do a one-time pull, then an incremental append based on the modified date into a new table, and merge the two so that I get the updated rows. I am doing this efficiently with a left outer join, but I would like to try sqoop merge if it is as easy as supplying the original table and the incremental table and letting it merge them.
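The merge step described here can be sketched in plain Python. This is only an illustration of the logic (keep one row per key, preferring the most recently modified version), not how Hive or sqoop merge implements it. The column names `id` and `modified`, the function name `merge_latest`, and the sample rows are all assumptions made for the example.

```python
from datetime import date


def merge_latest(base_rows, incremental_rows, key="id", modified="modified"):
    """Return one row per key, preferring the most recently modified version.

    Hypothetical helper: 'key' and 'modified' name assumed columns.
    """
    merged = {row[key]: row for row in base_rows}
    for row in incremental_rows:
        current = merged.get(row[key])
        # New keys are inserted; existing keys are replaced only by newer rows.
        if current is None or row[modified] >= current[modified]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


# One-time full pull (made-up sample data).
base = [
    {"id": 1, "name": "a", "modified": date(2012, 10, 1)},
    {"id": 2, "name": "b", "modified": date(2012, 10, 1)},
]
# Incremental pull: id 2 was updated, id 3 is new.
incremental = [
    {"id": 2, "name": "b2", "modified": date(2012, 10, 20)},
    {"id": 3, "name": "c", "modified": date(2012, 10, 21)},
]

merged = merge_latest(base, incremental)
```

Note that rows deleted in the source database never show up in the incremental pull, so this step alone leaves them in the merged result.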
Also, some rows will have been deleted in the database by the time we do the incremental update to the Hive table, and I should be able to delete those rows as well. The way I handle this is to pull only the IDs (the unique ID column) from the database and do another outer join, so that rows deleted in the database do not appear in the merged Hive table.
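The delete handling can be sketched the same way: pull the current set of IDs from the database and keep only the merged rows whose ID is still present. The sketch below uses a set lookup in place of the outer join, which gives the same result for unique IDs. The function name `drop_deleted`, the `id` column, and the sample data are assumptions for illustration.

```python
def drop_deleted(merged_rows, live_ids, key="id"):
    """Keep only rows whose key still exists in the source database.

    Hypothetical helper: 'live_ids' stands for the ID-only pull from the database.
    """
    live = set(live_ids)
    return [row for row in merged_rows if row[key] in live]


# Merged table after the incremental update (made-up sample data).
merged_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
    {"id": 3, "name": "c"},
]
# IDs currently in the database: id 2 has been deleted there.
live_ids = [1, 3]

result = drop_deleted(merged_rows, live_ids)
```

Pulling only the ID column keeps this extra pass cheap compared with re-importing the whole table.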
Thanks, Jarcec,
Chalcy
Re: Another sqoop incremental update question
Jarek Jarcec Cecho 2012-10-23, 17:03
Hi Chalcy, thank you for explaining your use case. I now have an idea of what you're trying to achieve. I'm afraid that merge won't do the delete magic for you either. The Hive queries seem like a reasonable solution to me.
Jarcec