Lashing 2013-03-14, 16:27
Hi, I have many small MySQL DBs (more than 1,000) whose data needs to be transferred concurrently into multiple (10) tables in a single HBase cluster. What is the recommended approach? Thanks.
Re: Bulkload or hbase API
Tariq 2013-03-14, 16:31
You might find Sqoop useful.
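As a rough illustration of the Sqoop suggestion, the sketch below builds one `sqoop import` command line per source database without running anything. The host name, database names, table names, column family, and key column are all hypothetical placeholders; credentials are left out, and the flags shown (`--connect`, `--table`, `--hbase-table`, `--column-family`, `--hbase-row-key`, `--num-mappers`) should be checked against the Sqoop version in use.

```python
from typing import List


def sqoop_import_argv(jdbc_url: str, mysql_table: str, hbase_table: str,
                      column_family: str, row_key_column: str,
                      num_mappers: int = 1) -> List[str]:
    """Build the argument list for one Sqoop import into HBase (not executed here)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,               # source MySQL database
        "--table", mysql_table,              # source table to read
        "--hbase-table", hbase_table,        # destination HBase table
        "--column-family", column_family,    # destination column family
        "--hbase-row-key", row_key_column,   # source column used as the row key
        "--num-mappers", str(num_mappers),   # parallel map tasks for this import
    ]


# Hypothetical example: three of the 1,000+ small databases on one host.
databases = ["db0001", "db0002", "db0003"]
commands = [
    sqoop_import_argv(f"jdbc:mysql://mysql.example.com/{db}",
                      "orders", "hbase_orders", "d", "id")
    for db in databases
]
for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to a job scheduler or to `subprocess.run`; how many imports to start at once is a separate decision.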
Re: Bulkload or hbase API
Lashing 2013-03-14, 16:54
Hi Tariq, thanks for the update. Can I run 1,000 Sqoop jobs against one HBase cluster at the same time? Could that cause contention? Thanks.
Re: Bulkload or hbase API
Damien Hardy 2013-03-14, 16:36
Hello Lashing,
MapReduce would be a great fit:
Each mapper reads from a different MySQL DB and uses "TableOutputFormat" to write to the corresponding HTable.
Maybe Pig: UNION after LOAD on the different MySQL DBs, then STORE into the different tables according to your policy (this may need several M/R jobs, all managed by the Pig workflow).
The most efficient option (one job) would be a pure home-made Java MapReduce job (map-only, with one mapper per MySQL DB bulk loading into the HTables).
Cheers,
-- Damien HARDY
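The map-only layout described above can be modelled in plain Python. This is only a sketch of the data flow, not Hadoop or HBase code: the table policy, the database name, and the row-key scheme (prefixing the primary key with the database name so that keys from different databases do not collide) are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical policy: which HBase table receives rows from which MySQL table.
TABLE_POLICY: Dict[str, str] = {
    "orders": "hbase_orders",
    "customers": "hbase_customers",
}

Row = Tuple[str, int, Dict[str, str]]  # (mysql_table, primary_key, columns)


def map_one_database(db_name: str, rows: Iterable[Row],
                     policy: Dict[str, str] = TABLE_POLICY
                     ) -> Iterator[Tuple[str, str, Dict[str, str]]]:
    """Model of one map task: it handles exactly one source database and
    emits (target_hbase_table, row_key, columns) for every input row."""
    for mysql_table, primary_key, columns in rows:
        target_table = policy[mysql_table]       # route according to the policy
        row_key = f"{db_name}:{primary_key}"     # assumed key scheme
        yield target_table, row_key, columns


# One "map task" for one hypothetical database.
sample_rows = [("orders", 7, {"total": "9.50"}),
               ("customers", 3, {"name": "Ann"})]
emitted = list(map_one_database("db0001", sample_rows))
print(emitted)
```

In a real job there is no reduce phase; each emitted record would become a write to the named table.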
Re: Bulkload or hbase API
Lashing 2013-03-14, 16:51
Hi Hardy, thanks for the tip. Will many (1,000) concurrent bulk loads cause contention in HBase? How will HBase handle so many HFiles at the same time? Thanks.
Re: Bulkload or hbase API
Damien Hardy 2013-03-14, 17:04
Actually, the concurrency is limited by the number of map slots available in the JobTracker (MR1). The later map tasks wait for the earlier ones to finish.
-- Damien HARDY
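The effect of a fixed number of map slots can be illustrated with a small scheduling model. This is a simplified sketch, not how the JobTracker is implemented: it assumes every task starts as soon as any slot is free, and the figures used (1,000 tasks of one time unit each, 40 slots) are made up for the example.

```python
import heapq
from typing import List, Sequence, Tuple


def simulate_map_slots(task_durations: Sequence[int],
                       map_slots: int) -> List[Tuple[int, int]]:
    """Return (start, end) for each task, in submission order, when at most
    map_slots tasks can run at the same time."""
    if map_slots < 1:
        raise ValueError("map_slots must be at least 1")
    free_at = [0] * map_slots            # time at which each slot becomes free
    heapq.heapify(free_at)
    schedule = []
    for duration in task_durations:
        start = heapq.heappop(free_at)   # earliest slot to free up
        end = start + duration
        heapq.heappush(free_at, end)
        schedule.append((start, end))
    return schedule


def peak_concurrency(schedule: Sequence[Tuple[int, int]]) -> int:
    """Largest number of tasks running at once in the given schedule."""
    events = []
    for start, end in schedule:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()                        # at equal times, ends sort before starts
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


schedule = simulate_map_slots([1] * 1000, map_slots=40)
print(peak_concurrency(schedule))   # 40: never more than the slot count
print(schedule[0], schedule[-1])    # (0, 1) (24, 25): the last task waits 24 units
```

Under these assumptions, HBase sees at most 40 writers at any moment, even though the job has 1,000 map tasks.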
Re: Bulkload or hbase API
Lashing 2013-03-15, 16:00
Thanks, I will give it a try.