shengjie min 2013-08-03, 21:14
Sqoop is capable of performing incremental updates.
You can also import into HBase.
Sqoop should be able to update a single table for all three databases, but
you'll need to make sure that the row keys Sqoop generates don't overlap.
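
To illustrate the row key concern, here is a minimal sketch of one possible key scheme: prefix each primary key with the name of its source database. The function name `make_row_key`, the `:` separator, the 10-digit padding, and the sample ids are all assumptions made for illustration, not anything Sqoop does for you.

```python
def make_row_key(source: str, primary_key: int) -> str:
    """Build an HBase row key that stays unique across source databases.

    `source` is a hypothetical short name for the originating database
    ("A", "B", "C"); `primary_key` is the row's id in that database.
    """
    # Zero-pad the id so keys sort numerically within each source prefix.
    return f"{source}:{primary_key:010d}"


# Hypothetical sample data: ids 2 and 3 exist in both databases.
users_a = [1, 2, 3]
users_b = [2, 3, 4]

keys = {make_row_key("A", pk) for pk in users_a}
keys |= {make_row_key("B", pk) for pk in users_b}

# Six distinct keys: the shared ids no longer overwrite each other.
assert len(keys) == 6
```

One way to get keys like this out of Sqoop might be a free-form `--query` that concatenates a literal prefix onto the primary key, with that column named in `--hbase-row-key`. I have not tested that combination, so treat it as something to verify.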
Also, you'll likely have to manage '--last-value'
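
As a rough sketch of what managing '--last-value' per database could look like, the snippet below keeps one last value per source in a JSON file and builds the incremental arguments from it. The state-file layout, the helper names, and the `id` check column are assumptions for illustration; only `--incremental`, `--check-column`, and `--last-value` are Sqoop's own options.

```python
import json
from pathlib import Path


def load_state(path):
    """Read the per-source last values, or start empty if no file exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_state(path, state):
    """Persist the per-source last values as JSON."""
    Path(path).write_text(json.dumps(state, indent=2, sort_keys=True))


def incremental_args(state, source, check_column="id"):
    """Return the incremental-import arguments for one source database."""
    # A source never imported before starts from 0 (assumes positive ids).
    last_value = state.get(source, 0)
    return [
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


def record_run(state, source, new_last_value):
    """Store the highest value seen so far; never move backwards."""
    state[source] = max(state.get(source, 0), new_last_value)
    return state
```

Each of A, B, and C gets its own entry, so an import from one database never advances the marker of another. I believe saved jobs created with `sqoop job --create` can also remember the last value between runs, which may remove the need for a state file; that is worth checking against your Sqoop version.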
I highly recommend testing such a setup first and reporting back with your
On Sat, Aug 3, 2013 at 2:14 PM, shengjie min <[EMAIL PROTECTED]> wrote:
> Hi All,
> I asked this question on the HBase mailing list, and people suggested I'd be
> better off asking it here :) so here I am. I am new to Sqoop and have a use
> case where a few applications run in house independently, let's say
> applications A, B, C. Each has its own DB associated. I want to create an
> aggregated view over all the databases so that I don't have to jump between
> different DBs to find the info I need. A simple example: all three
> applications have a table called "users", which are very similar, and I want
> to union the "users" tables.
> I've had a look at Sqoop, and it looks like it allows me to move data from
> databases A, B, C to a single, centralised place, e.g. HBase?
> The solution I am looking for ideally needs to do the following:
> 1. The centralised storage stays updated reasonably quickly as the original
> DBs (A, B, C) get updated. To be clear, I am not looking for a one-time bulk
> import; I want incremental updates after the initial import.
> 2. As long as I provide a schema mapping, can A, B, C be imported into a
> single place, e.g. a single HBase table?
> Now, my question is:
> Is Sqoop a suitable tool for this? I was originally considering using
> MongoDB and writing the periodic/parallel import piece myself. But for now, I
> am leaning more towards Sqoop since we already have Hadoop running
> in house. Any advice is highly appreciated!
Kathleen Ting 2013-08-05, 19:09