Re: Using Sqoop to merge/union databases
Hey There,

Sqoop is capable of performing incremental updates (
http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_incremental_imports).
You can also import into HBase (
http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_importing_data_into_hbase
).
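
As a rough illustration of how the two features combine, here is a minimal Python sketch that only assembles the argument list for one incremental import into HBase. The flags are the ones described in the Sqoop 1.4.4 user guide; the JDBC URL, table name, HBase table name, column family and column names are hypothetical placeholders, and the helper function itself is not part of Sqoop.

```python
def build_incremental_hbase_import(jdbc_url, table, hbase_table,
                                   column_family, row_key_column,
                                   check_column, last_value):
    """Assemble (but do not run) a Sqoop 1.x incremental import into HBase."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,               # source database
        "--table", table,                    # source table to read
        "--hbase-table", hbase_table,        # target HBase table
        "--column-family", column_family,    # target column family
        "--hbase-row-key", row_key_column,   # input column used as the row key
        "--incremental", "append",           # only fetch rows newer than last-value
        "--check-column", check_column,      # column examined for new rows
        "--last-value", str(last_value),     # highest value seen in the previous run
    ]


# Hypothetical values for application A's database.
cmd = build_incremental_hbase_import(
    jdbc_url="jdbc:mysql://db-a.example.com/app_a",
    table="users",
    hbase_table="users_all",
    column_family="info",
    row_key_column="id",
    check_column="id",
    last_value=0,
)
print(" ".join(cmd))
```

The same list could be handed to subprocess.run once the values are checked against a real setup; one such command would be needed per source database.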

Sqoop should be able to update a single table from all three databases, but
you'll need to make sure that the row keys Sqoop generates don't overlap.
Also, you'll likely have to manage '--last-value' for each database.
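
To make the two caveats concrete, here is a small, self-contained Python simulation. It does not call Sqoop or HBase; a dict stands in for the merged table, and the "source:id" key format, the function names and the sample rows are all assumptions made up for illustration. It shows why un-prefixed keys from A, B and C would collide, and why each source needs its own last value.

```python
def make_row_key(source, primary_key):
    """Prefix the primary key with the source name so keys cannot collide."""
    return f"{source}:{primary_key}"


def incremental_merge(target, source, rows, last_values, check_column="id"):
    """Copy rows newer than this source's last value into the shared table."""
    last = last_values.get(source, 0)
    new_rows = [row for row in rows if row[check_column] > last]
    for row in new_rows:
        target[make_row_key(source, row[check_column])] = dict(row, source=source)
    if new_rows:
        # Track the high-water mark separately for every source database.
        last_values[source] = max(row[check_column] for row in new_rows)
    return len(new_rows)


# Hypothetical "users" tables; note that id 1 exists in all three.
users_a = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
users_b = [{"id": 1, "name": "carol"}]
users_c = [{"id": 1, "name": "dave"}]

merged = {}
last_values = {}

# Initial import from each source.
for source, rows in (("A", users_a), ("B", users_b), ("C", users_c)):
    incremental_merge(merged, source, rows, last_values)

# Later, application A gains a user; only that row is picked up.
users_a.append({"id": 3, "name": "erin"})
imported = incremental_merge(merged, "A", users_a, last_values)
```

Without the prefix, the three rows with id 1 would overwrite each other in the shared table, and a single shared last value would cause rows from the slower-growing databases to be skipped.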

I highly recommend testing such a setup first and reporting back with your
findings!

-Abe
On Sat, Aug 3, 2013 at 2:14 PM, shengjie min <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I asked this question on the HBase mailing list, and people suggested I'd be
> better off asking it here :) so here I am. I am new to Sqoop and have a use
> case where a few applications run in house independently, let's say
> applications A, B, C. Each has its own DB. I want to create an
> aggregated view over all the databases so that I don't have to jump between
> different DBs to find the info I need. A simple example: all three
> applications have a table called "users", and they are very similar; I want
> to union the "users" tables.
>
> I've had a look at Sqoop, and it looks like it allows me to move data from
> databases A, B, C to a single/centralised place, e.g. HBase?
>
> The solution I am looking for ideally needs to do the following:
>
> 1. The centralised storage is kept up to date reasonably quickly as the
> original DBs (A, B, C) get updated. To be clear, I am not looking for a
> one-time bulk import; I want incremental updates after the initial import.
> 2. As long as I provide a schema mapping, A, B, C can be imported into a
> single place, e.g. a single HBase table.
>
> Now, my question is:
>
> Is Sqoop a suitable tool for this? I was originally considering using
> MongoDB and writing the periodic/parallel import piece myself. But for now, I
> am leaning more towards Sqoop since we already have Hadoop running in
> house. Any advice is highly appreciated!
>
> Thanks,
> Shengjie