Re: Hadoop with Sharded MySql
Hi Sujit,

Srinivas is asking how to import data into HDFS using Sqoop. I believe he
must have thought it through well before designing the entire
architecture/solution. He has not specified whether he would like to modify
the data or not. Whether to use Hive or HBase is a different question
altogether and depends on his use case.

Thanks,
Anil
On Thu, May 31, 2012 at 9:52 PM, Sujit Dhamale <[EMAIL PROTECTED]> wrote:

> Hi,
> Instead of pulling 70K tables from MySQL into HDFS,
> take a dump of all 30 tables and put it into the HBase database.
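>
> A rough sketch of that flow for one table (the table, column, and path
> names here are made up for illustration):
>
>   # export one table to TSV (mysql batch mode output is tab-separated)
>   mysql -N -B -e "SELECT id, amount FROM shard01.orders" > orders.tsv
>   # stage the file in HDFS
>   hadoop fs -mkdir /staging/orders
>   hadoop fs -put orders.tsv /staging/orders/
>   # bulk load into an existing HBase table "orders" with column family "d"
>   hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
>     -Dimporttsv.columns=HBASE_ROW_KEY,d:amount orders /staging/orders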
>
> If you pull 70K tables from MySQL into HDFS, you will need to use Hive,
> but modification will not be possible in Hive :(
>
> *@ common-user:* please correct me if I am wrong.
>
> Kind Regards
> Sujit Dhamale
> (+91 9970086652)
> On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
> > Maybe you can create some VIEWs, UNIONs, or MERGE tables on the MySQL
> > side to avoid launching so many Sqoop jobs.
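> >
> > A rough illustration of that idea, assuming the shards live on one
> > MySQL server (shard and table names below are hypothetical):
> >
> >   # hypothetical shard databases shard01/shard02 with identical tables
> >   mysql -e "CREATE VIEW shard01.all_orders AS
> >             SELECT * FROM shard01.orders UNION ALL
> >             SELECT * FROM shard02.orders"
> >
> > A single Sqoop job could then import all_orders (with an explicit
> > --split-by column) instead of running one job per shard table.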
> >
> > On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
> > <[EMAIL PROTECTED]> wrote:
> > > All,
> > >
> > > We are trying to implement Sqoop in our environment, which has 30
> > > sharded MySQL servers. Each server has around 30 databases with 150
> > > tables per database, all horizontally sharded (that is, the data is
> > > divided across all the tables in MySQL).
> > >
> > > The problem is that we have a total of around 70K tables which need
> > > to be pulled from MySQL into HDFS.
> > >
> > > So my question is: is it feasible to generate 70K Sqoop commands and
> > > run them in parallel? (Each command would look roughly like the
> > > sketch below; host, credentials, and names there are placeholders.)
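> > >
> > >   # one of the ~70K per-table imports; all connection details below
> > >   # are placeholders, not our real hosts or credentials
> > >   sqoop import --connect jdbc:mysql://dbhost/shard01_db07 \
> > >     --username etl --password xxxx \
> > >     --table customer_0042 \
> > >     --target-dir /data/shard01_db07/customer_0042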
> > >
> > > Also, doing incremental updates is going to mean invoking another 70K
> > > Sqoop jobs, which in turn kick off map-reduce jobs. (A sketch of one
> > > incremental run, assuming an auto-increment id column, follows.)
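> > >
> > >   # incremental pull of rows with id > 123456; the column name and
> > >   # value are illustrative only
> > >   sqoop import --connect jdbc:mysql://dbhost/shard01_db07 \
> > >     --username etl --password xxxx --table customer_0042 \
> > >     --target-dir /data/shard01_db07/customer_0042 \
> > >     --incremental append --check-column id --last-value 123456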
> > >
> > > The main problem is monitoring and managing this huge number of jobs.
> > >
> > > Can anyone suggest the best way of doing this, or say whether Sqoop
> > > is a good candidate for this type of scenario?
> > >
> > > Currently the same process is done by generating TSV files on the
> > > MySQL servers, dumping them onto a staging server, and generating
> > > HDFS put statements from there (roughly as in the sketch below).
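> > >
> > >   # current per-table flow; host names and paths are placeholders
> > >   mysql -N -B -e "SELECT * FROM shard01_db07.customer_0042" > customer_0042.tsv
> > >   scp customer_0042.tsv staging:/dumps/
> > >   # then, on the staging server:
> > >   hadoop fs -put /dumps/customer_0042.tsv /data/shard01_db07/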
> > >
> > > Appreciate your suggestions !!!
> > >
> > >
> > > Thanks,
> > > Srinivas Surasani
> >
>

--
Thanks & Regards,
Anil Gupta