Re: Using Sqoop to merge/union databases
Hey There,

Sqoop can perform incremental imports, so after the initial load it only pulls
new or changed rows
(http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_incremental_imports).
You can also import into HBase
(http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_importing_data_into_hbase).
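To make the incremental-import idea concrete, here is a simplified Python model of what append mode (`--incremental append` with `--check-column` and `--last-value`) does: it keeps only rows whose check column is greater than the last value, then reports the new value to pass on the next run. This is a sketch of the semantics under simplifying assumptions, not Sqoop's implementation; the function name and the sample `users` rows are hypothetical. Sqoop also has a `lastmodified` mode that uses a timestamp column to pick up updated rows, which this sketch does not model.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def incremental_append(
    rows: List[Row], check_column: str, last_value: Optional[Any]
) -> Tuple[List[Row], Optional[Any]]:
    """Simplified (hypothetical) model of an append-mode incremental import.

    Selects rows whose check column is strictly greater than last_value
    (every row when last_value is None, as on a first run) and returns them
    together with the value to pass as --last-value on the next run.
    """
    if last_value is None:
        selected = list(rows)
    else:
        selected = [r for r in rows if r[check_column] > last_value]
    if not selected:
        # Nothing new: keep the previous watermark unchanged.
        return [], last_value
    return selected, max(r[check_column] for r in selected)


if __name__ == "__main__":
    # Hypothetical "users" rows from database A, with "id" as the check column.
    users_a = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
    batch, last = incremental_append(users_a, "id", None)
    print(len(batch), last)  # 2 2
    users_a.append({"id": 3, "name": "cy"})
    batch, last = incremental_append(users_a, "id", last)
    print(batch, last)  # [{'id': 3, 'name': 'cy'}] 3
```

Tracking the returned value by hand is only needed for one-off runs: when the incremental import is created as a saved Sqoop job, Sqoop retains the new last value in the job for the next execution.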

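Sqoop should be able to update a single table for all three databases, but
you'll need to make sure that the row keys Sqoop generates don't overlap (for
example, user 42 in A and user 42 in B must not end up with the same row key).
You'll also likely have to manage '--last-value' separately for each of the
three imports.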
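One way to picture the single-table setup is the sketch below: each source's `users` rows are mapped onto a shared column layout, given a row key prefixed with a source tag so keys from A, B and C cannot collide, and filtered against that source's own last value. Everything here is an illustrative assumption: the column names in `SCHEMA_MAPPINGS`, the `A:`/`B:`/`C:` key scheme, and the plain dict standing in for an HBase table. With real Sqoop, one option might be to build the prefixed key and renamed columns in a free-form `--query` and point `--hbase-row-key` at that key column, but that is untested here and worth trying on a small dataset first.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical column names: each application's "users" table is assumed to
# name its columns differently, so each source gets its own mapping onto one
# shared layout (source column -> combined column).
SCHEMA_MAPPINGS: Dict[str, Dict[str, str]] = {
    "A": {"id": "id", "name": "name", "mail": "email"},
    "B": {"user_id": "id", "full_name": "name", "email": "email"},
    "C": {"uid": "id", "username": "name", "email_addr": "email"},
}


def row_key(source: str, source_id: Any) -> str:
    # Prefixing with the source tag keeps id 42 from A ("A:42") and id 42
    # from B ("B:42") as separate rows instead of one overwriting the other.
    return f"{source}:{source_id}"


def merge_source(
    table: Dict[str, Row], last_values: Dict[str, Any], source: str, rows: List[Row]
) -> int:
    """Merge new rows from one source into the combined table.

    `table` is a plain dict standing in for a single HBase table, and
    `last_values` holds a separate last-value watermark per source.
    Returns the number of rows written.
    """
    mapping = SCHEMA_MAPPINGS[source]
    # The source column that maps to "id" doubles as the check column.
    id_col = next(src for src, dst in mapping.items() if dst == "id")
    last = last_values.get(source)
    new_rows = [r for r in rows if last is None or r[id_col] > last]
    for r in new_rows:
        mapped = {dst: r[src] for src, dst in mapping.items()}
        mapped["source"] = source
        table[row_key(source, mapped["id"])] = mapped
    if new_rows:
        last_values[source] = max(r[id_col] for r in new_rows)
    return len(new_rows)


if __name__ == "__main__":
    table: Dict[str, Row] = {}
    last_values: Dict[str, Any] = {}
    merge_source(table, last_values, "A", [{"id": 1, "name": "ann", "mail": "ann@a.example"}])
    merge_source(table, last_values, "B", [{"user_id": 1, "full_name": "bob", "email": "bob@b.example"}])
    print(sorted(table))  # ['A:1', 'B:1']
    print(last_values)  # {'A': 1, 'B': 1}
```

Keeping one watermark per source matters because the three databases assign ids independently: a single shared last value would skip rows in whichever database lags behind.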

I highly recommend testing such a setup first and reporting back with your
findings!

-Abe
On Sat, Aug 3, 2013 at 2:14 PM, shengjie min <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I've asked this question on the HBase mailing list, and people suggested I'd
> be better off asking it here :) so here I am. I am new to Sqoop and have a
> use case where a few applications run in house independently, let's say
> applications A, B and C. Each has its own DB. I want to create an
> aggregated view over all the databases so that I don't have to jump into
> different DBs to find the info I need. A simple example: all three
> applications have a table called "users", the tables are very similar, and
> I want to union the "users" tables.
>
> I've had a look at Sqoop, and it looks like it allows me to move data from
> databases A, B and C to a single, centralised place, e.g. HBase?
>
> Ideally, the solution I am looking for needs to do the following:
>
> 1. The centralised storage stays reasonably up to date as the original
> DBs (A, B, C) get updated. I am definitely not looking for a one-time bulk
> import; I want incremental updates after the initial import.
> 2. Given a schema mapping that I provide, can A, B and C be imported into a
> single place, e.g. a single HBase table?
>
> Now, my question is:
>
> Is Sqoop a suitable tool for this? I was originally considering using
> MongoDB and writing the periodic/parallel import piece myself. But for now,
> I am leaning more towards Sqoop, since we already have Hadoop running in
> house. Any advice is highly appreciated!
>
> Thanks,
> Shengjie