Re: Hadoop with Sharded MySql
Hi Sujit,

Srinivas is asking how to import data into HDFS using Sqoop. I believe he
must have thought this out well before designing the entire
architecture/solution, and he has not specified whether he would like to
modify the data or not. Whether to use Hive or HBase is a different
question altogether and depends on his use case.
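
For example, a single-table import might look roughly like this (the host,
credentials, database, and table below are placeholders, not his actual
setup):

  sqoop import \
    --connect jdbc:mysql://db-host/shard_01 \
    --username etl --password secret \
    --table orders \
    --target-dir /data/shard_01/orders \
    --num-mappers 4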

Thanks,
Anil
On Thu, May 31, 2012 at 9:52 PM, Sujit Dhamale <[EMAIL PROTECTED]> wrote:

> Hi,
> Instead of pulling 70K tables from MySQL into HDFS, take a dump of all 30
> databases and put it into HBase.
>
> If you pull the 70K tables from MySQL into HDFS you will need to use
> Hive, but modification is not possible in Hive :(
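>
> Sqoop can also write straight into HBase; a rough sketch (connection,
> table, column family, and row key below are made up):
>
>   sqoop import \
>     --connect jdbc:mysql://db-host/shard_01 \
>     --username etl --password secret \
>     --table orders \
>     --hbase-table orders \
>     --column-family d \
>     --hbase-row-key id \
>     --hbase-create-table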
>
> *@common-user:* please correct me if I am wrong.
>
> Kind Regards
> Sujit Dhamale
> (+91 9970086652)
> On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
> > Maybe you can create some VIEWs, unions, or merge tables on the MySQL
> > side to avoid launching so many Sqoop jobs.
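> >
> > For instance, a union view over two shards might look like this (the
> > schema, shard, and table names below are only illustrative):
> >
> >   mysql -h shard-host -e "
> >     CREATE VIEW reporting.orders_all AS
> >       SELECT * FROM shard_01.orders
> >       UNION ALL
> >       SELECT * FROM shard_02.orders"
> >
> > A single Sqoop job could then import reporting.orders_all (with -m 1 or
> > an explicit --split-by, since a view has no primary key).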
> >
> > On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
> > <[EMAIL PROTECTED]> wrote:
> > > All,
> > >
> > > We are trying to implement Sqoop in our environment, which has 30
> > > sharded MySQL servers; each server has around 30 databases with 150
> > > tables in each database, all horizontally sharded (meaning the data is
> > > divided across all the tables in MySQL).
> > >
> > > The problem is that we have a total of around 70K tables which need
> > > to be pulled from MySQL into HDFS.
> > >
> > > So my question is: is it feasible to generate 70K Sqoop commands and
> > > run them in parallel?
> > >
> > > Also, doing incremental updates would mean invoking another 70K Sqoop
> > > jobs, which in turn kick off MapReduce jobs.
> > >
> > > The main problem is monitoring and managing this huge number of jobs.
> > >
> > > Can anyone suggest the best way of doing this, or whether Sqoop is a
> > > good candidate for this type of scenario?
> > >
> > > Currently the same process is done by generating TSV files on the
> > > MySQL server, dumping them onto a staging server, and from there
> > > generating HDFS put statements.
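> > >
> > > To be concrete, we would be generating the Sqoop jobs in a loop roughly
> > > like this (host and paths below are placeholders; credentials omitted):
> > >
> > >   for db in $(mysql -h shard-host -N -e "SHOW DATABASES LIKE 'shard_%'"); do
> > >     for tbl in $(mysql -h shard-host -N -e "SHOW TABLES IN $db"); do
> > >       sqoop import --connect jdbc:mysql://shard-host/$db \
> > >         --table $tbl --target-dir /data/$db/$tbl -m 1
> > >     done
> > >   done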
> > >
> > > Appreciate your suggestions!
> > >
> > >
> > > Thanks,
> > > Srinivas Surasani
> >
>

--
Thanks & Regards,
Anil Gupta