We do something like that programmatically.
We read blobbed HBase data (qualifiers represent cross-sections such as
country_product, and the blobs hold data such as clicks, impressions etc.).
We have several aggregation tasks (one per MySQL table) that aggregate the
data and insert it (in batches) into MySQL.
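To make the aggregate-then-batch-insert step concrete, here is a minimal
sketch in Python. It is not our production code: the table name
country_product_stats, the row layout and the batch size are hypothetical,
and sqlite3 stands in for MySQL so the example runs anywhere (a MySQL DB-API
driver would typically use %s placeholders instead of ?).

```python
import sqlite3
from collections import defaultdict
from itertools import islice


def aggregate(rows):
    """Sum clicks and impressions per (country, product) cross-section."""
    totals = defaultdict(lambda: [0, 0])
    for country, product, clicks, impressions in rows:
        entry = totals[(country, product)]
        entry[0] += clicks
        entry[1] += impressions
    return [(c, p, t[0], t[1]) for (c, p), t in sorted(totals.items())]


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, records, batch_size=1000):
    """Insert records with one executemany call and one commit per batch."""
    sql = "INSERT INTO country_product_stats VALUES (?, ?, ?, ?)"
    for chunk in batches(records, batch_size):
        conn.executemany(sql, chunk)
        conn.commit()


# Assumption: sqlite3 is only a stand-in for the MySQL connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE country_product_stats "
    "(country TEXT, product TEXT, clicks INTEGER, impressions INTEGER)"
)

# Hypothetical rows as they might look after decoding the HBase blobs.
scanned = [
    ("US", "shoes", 3, 100),
    ("US", "shoes", 2, 50),
    ("FR", "hats", 1, 10),
]
insert_in_batches(conn, aggregate(scanned), batch_size=2)
```

Committing once per batch rather than once per row is the point of the
sketch; the right batch size depends on row width and the MySQL setup.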
I don't know how much data you want to scan and insert, but we scan, aggregate
and insert approximately 7GB as ~12M lines from one HBase table into 9
MySQL tables and that takes a little bit less than 2 hours.
Our analysis shows that ~25% of that time is net HBase read and most of the
time is spent on MySQL inserts.
Since we are in the process of building a new system, optimizing is not on
our agenda but I would definitely try writing to csv and bulk loading into
Hope that helps.
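A rough sketch of the CSV route mentioned above, in Python. We have not
built this, so treat it as an illustration under assumptions: the target is
assumed to be MySQL, the file and table names are hypothetical, and the
LOAD DATA statement is only built as a string, not executed.

```python
import csv
import os
import tempfile


def write_csv(records, path):
    """Write aggregated records to a CSV file, one record per line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows(records)
    return path


def load_statement(path, table):
    """Build a MySQL LOAD DATA statement for the file (not executed here)."""
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )


# Hypothetical aggregated records and output location.
records = [("FR", "hats", 1, 10), ("US", "shoes", 5, 150)]
csv_path = write_csv(
    records, os.path.join(tempfile.mkdtemp(), "country_product_stats.csv")
)
statement = load_statement(csv_path, "country_product_stats")
```

The statement would then be run through whatever MySQL client is in use;
LOCAL has to be enabled on both client and server for it to work.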
On Wed, Nov 13, 2013 at 9:11 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
> We have done this kind of thing using HBase 0.92.1 + Pig, but we finally
> had to limit the size of the tables and move the biggest data to HDFS:
> loading data directly from HBase is much slower than from HDFS, and doing
> it using M/R overloads HBase region servers, since several map tasks scan
> table regions at the same time: so the bigger your tables are, the higher
> the load is (usually Pig creates 1 map per region, I don't know about Hive).
> This may not be an issue if your HBase cluster is dedicated to this kind
> of job, but if you also have to ensure a good random read latency at the
> same time, forget it.
> On 11/11/2013 13:10, JC wrote:
>> We are looking to use HBase as a transformation engine. In other words,
>> take data already loaded into HBase, run some large calculation/aggregation
>> on that data and then load it back into an RDBMS for our BI analytic tools
>> to use. I was curious about what the community's experience is on this and
>> if there are some best practices. Some thoughts we are kicking around are
>> Mapreduce 2 and Yarn and writing files to HDFS to be loaded into the
>> Not sure what pieces are needed for the complete application
>> Thanks in advance for your help,