HBase >> mail # user >> HBase as a transformation engine


Re: HBase as a transformation engine
Hi,

We do something like that programmatically:
read blobbed HBase data (qualifiers represent cross-sections such as
country_product, and the blobs hold metrics such as clicks, impressions, etc.),
then several aggregation tasks (one per MySQL table) aggregate the
data and insert it (in batches) into MySQL.
I don't know how much data you want to scan and insert, but we scan, aggregate
and insert approximately 7 GB (~12M rows) from one HBase table into 9
MySQL tables, and that takes a little less than 2 hours.
Our analysis shows that ~25% of that time is net HBase read; most of the
time is spent on the MySQL inserts.
Since we are in the process of building a new system, optimizing is not on
our agenda, but I would definitely try writing to CSV and bulk-loading into the
RDBMS.

Hope that helps.
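The pipeline sketched above (read per-qualifier metrics, roll them up, emit rows for MySQL) can be illustrated in plain Java. This is a minimal stand-in, not our actual code: the input list takes the place of decoded HBase cells, and the CSV lines are what you would hand to a bulk loader such as MySQL's LOAD DATA INFILE.

```java
import java.util.*;

public class AggregateSketch {
    // Roll up (qualifier -> clicks) pairs into per-qualifier totals,
    // then render one CSV line per qualifier, ready for bulk loading.
    static List<String> toCsv(List<Map.Entry<String, Long>> rows) {
        Map<String, Long> totals = new TreeMap<>();   // sorted, deterministic output
        for (Map.Entry<String, Long> r : rows) {
            totals.merge(r.getKey(), r.getValue(), Long::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            lines.add(e.getKey() + "," + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for cells scanned from HBase (qualifier = country_product).
        List<Map.Entry<String, Long>> rows = Arrays.asList(
            new AbstractMap.SimpleEntry<>("US_widget", 10L),
            new AbstractMap.SimpleEntry<>("FR_widget", 3L),
            new AbstractMap.SimpleEntry<>("US_widget", 5L));
        for (String line : toCsv(rows)) {
            System.out.println(line);
        }
    }
}
```

Batched JDBC inserts work too, but for volumes like ours the CSV-plus-bulk-load path avoids per-row round trips to MySQL.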
On Wed, Nov 13, 2013 at 9:11 AM, Vincent Barat <[EMAIL PROTECTED]>wrote:

> Hi,
>
> We have done this kind of thing using HBase 0.92.1 + Pig, but we finally
> had to limit the size of the tables and move the biggest data to HDFS:
> loading data directly from HBase is much slower than from HDFS, and doing
> it using M/R overloads the HBase region servers, since several map jobs scan
> table regions at the same time: the bigger your tables are, the higher
> the load (usually Pig creates 1 map per region; I don't know about Hive).
>
> This may not be an issue if your HBase cluster is dedicated to this kind
> of job, but if you also have to ensure a good random read latency at the
> same time, forget it.
>
> Regards,
>
> Le 11/11/2013 13:10, JC a écrit :
>
>  We are looking to use HBase as a transformation engine. In other words:
>> take
>> data already loaded into HBase, run some large calculation/aggregation on
>> that data, and then load it back into an RDBMS for our BI analytic tools to
>> use. I was curious what the community's experience with this is and whether
>> there are some best practices. One idea we are kicking around is
>> using
>> MapReduce 2 and YARN and writing files to HDFS to be loaded into the
>> RDBMS.
>> Not sure what pieces are needed for the complete application,
>> though.
>>
>> Thanks in advance for your help,
>> JC
>>
>>
>>
>> --
>> View this message in context: http://apache-hbase.679495.n3.
>> nabble.com/HBase-as-a-transformation-engine-tp4052670.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
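A footnote on the scan pressure Vincent describes: the 0.92-era client lets you tune the Scan handed to TableMapReduceUtil, which reduces per-row RPCs and keeps a one-pass scan out of the block cache. A configuration sketch only (the table name and AggregateMapper are hypothetical placeholders, and this needs the HBase client jars and a live cluster, so it is not runnable as-is):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ScanJobSetup {
    // Attach a tuned full-table Scan to a map/reduce job (HBase 0.92-era API).
    static void configure(Job job) throws IOException {
        Scan scan = new Scan();
        scan.setCaching(500);        // rows fetched per RPC instead of the default 1
        scan.setCacheBlocks(false);  // don't evict hot blocks for a one-pass scan
        TableMapReduceUtil.initTableMapperJob(
            "source_table",          // hypothetical source table
            scan,
            AggregateMapper.class,   // hypothetical TableMapper doing the roll-up
            Text.class,              // map output key class
            LongWritable.class,      // map output value class
            job);
    }
}
```

This doesn't change the one-map-per-region parallelism, so it only softens, not removes, the load concern for a cluster that must also serve low-latency random reads.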