Re: Hadoop with Sharded MySql
Maybe you can create some VIEWs, UNIONs, or MERGE tables on the MySQL
side to cut down the number of Sqoop jobs you have to launch.
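
For example, something along these lines might work: one UNION ALL view per
logical table across its shards, and then a single Sqoop job against the view
instead of one job per shard table. This is only a rough sketch; the table
names, JDBC URL, credentials and target directory below are made up and would
need to be adapted to your shard layout.

# Rough sketch (hypothetical names): build a UNION ALL view across the shards
# of one logical table, then import it with a single Sqoop job.
shard_tables = ["orders_000", "orders_001", "orders_002"]  # shards of "orders"

view_ddl = (
    "CREATE OR REPLACE VIEW orders_all AS\n"
    + "\nUNION ALL\n".join("SELECT * FROM " + t for t in shard_tables)
    + ";"
)
print(view_ddl)  # run this against MySQL once per logical table

# One free-form-query Sqoop import for the merged view instead of one per shard.
sqoop_cmd = (
    "sqoop import "
    "--connect jdbc:mysql://db-host/mydb "
    "--username etl -P "
    "--query 'SELECT * FROM orders_all WHERE $CONDITIONS' "
    "--split-by id "
    "--target-dir /data/mydb/orders"
)
print(sqoop_cmd)

One thing to watch: MySQL cannot use the MERGE algorithm for views containing
UNION, so it may materialize the view as a temporary table; worth testing on
one logical table before scripting it for all of them.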

On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
<[EMAIL PROTECTED]> wrote:
> All,
>
> We are trying to implement Sqoop in our environment, which has 30 sharded
> MySQL instances. Each instance has around 30 databases with 150 tables per
> database, and all of them are horizontally sharded (that is, the data is
> divided across all of these tables in MySQL).
>
> The problem is that we have a total of around 70K tables that need to be
> pulled from MySQL into HDFS.
>
> So my question is: is it feasible to generate 70K Sqoop commands and run
> them in parallel?
>
> Also, doing incremental updates would mean invoking another 70K Sqoop
> jobs, which in turn kick off MapReduce jobs.
>
> The main problem is monitoring and managing this huge number of jobs.
>
> Can anyone suggest the best way of doing this, or whether Sqoop is a good
> candidate for this type of scenario?
>
> Currently the same process is done by generating TSV files on the MySQL
> servers, dumping them onto a staging server, and from there generating
> HDFS put statements.
>
> Appreciate your suggestions!!!
>
>
> Thanks,
> Srinivas Surasani