Srinivas Surasani 2012-05-31, 22:02
Edward Capriolo 2012-06-01, 00:12
Instead of pulling 70K tables from MySQL into HDFS,
take a dump of all 30 tables and put it into an HBase database.
If you pull 70K tables from MySQL into HDFS, you will need to use Hive, but
modifying the data will not be possible in Hive :(
*@ common-user:* please correct me if I am wrong.
On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> Maybe you can create some VIEWs, unions, or merge tables on the MySQL
> side to avoid launching so many Sqoop jobs.
> On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
> <[EMAIL PROTECTED]> wrote:
> > All,
> > We are trying to implement Sqoop in our environment, which has 30 MySQL
> > sharded databases, each containing around 30 databases with
> > 150 tables apiece, all of them sharded (horizontally
> > sharded, meaning the data is divided across all the tables in MySQL).
> > The problem is that we have a total of around 70K tables that need
> > to be pulled from MySQL into HDFS.
> > So, my question is: is generating 70K Sqoop commands and running them
> > in parallel feasible?
> > Also, doing incremental updates would mean invoking another 70K
> > Sqoop jobs, which in turn kick off MapReduce jobs.
> > The main problem is monitoring and managing this huge number of jobs.
> > Can anyone suggest the best way of doing this, or tell me whether Sqoop
> > is a good candidate for this type of scenario?
> > Currently the same process is done by generating TSV files on the MySQL
> > server, dumping them onto a staging server, and from there generating
> > hdfs put statements.
> > Appreciate your suggestions!
> > Thanks,
> > Srinivas Surasani
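
To make the view/union suggestion above more concrete, here is a rough sketch of how one Sqoop job could cover many shard tables at once. It assumes the shard tables in a database share the same columns, which the thread does not confirm. The table names, JDBC URL, target directory, and the helper names `union_query` and `sqoop_import_args` are all hypothetical. Only the Sqoop options `--connect`, `--query`, `--target-dir`, `--split-by`, and `--num-mappers` are real; credentials are left out on purpose.

```python
import shlex


def union_query(tables, columns="*"):
    """Build one free-form query that unions identically structured shard tables."""
    if not tables:
        raise ValueError("at least one table is required")
    selects = [f"SELECT {columns} FROM `{t}`" for t in tables]
    inner = " UNION ALL ".join(selects)
    # Sqoop requires the literal $CONDITIONS token in a free-form query;
    # it substitutes its own split predicate there at run time.
    return f"SELECT * FROM ({inner}) AS merged WHERE $CONDITIONS"


def sqoop_import_args(jdbc_url, tables, target_dir, split_by=None, mappers=4):
    """Return the argument list for a single 'sqoop import' covering all tables."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--query", union_query(tables),
        "--target-dir", target_dir,
    ]
    if split_by:
        # Parallel import needs a column to split the result set on.
        args += ["--split-by", split_by, "--num-mappers", str(mappers)]
    else:
        # Without a split column, a free-form query must run with one mapper.
        args += ["--num-mappers", "1"]
    return args


if __name__ == "__main__":
    # Hypothetical shard tables of one logical table in one database.
    shard_tables = [f"orders_{i:03d}" for i in range(3)]
    args = sqoop_import_args(
        "jdbc:mysql://db-host/shard01",   # hypothetical host and database
        shard_tables,
        "/staging/shard01/orders",        # hypothetical HDFS directory
        split_by="id",                    # hypothetical key column
    )
    # shlex.join single-quotes the query so the shell leaves $CONDITIONS alone.
    print(shlex.join(args))
```

Grouping tables this way trades many small jobs for fewer, larger ones, which should ease monitoring, but whether MySQL handles a wide UNION ALL efficiently at this scale would need testing on the real shards.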
anil gupta 2012-06-01, 05:27
Srinivas Surasani 2012-06-01, 16:29
Michael Segel 2012-06-02, 00:09