Jia Wang 2013-11-13, 09:17
Re: HBase as a transformation engine
Hi,

We do something like that programmatically: we read blobbed HBase data
(qualifiers represent cross-sections such as country_product, and the
blob holds metrics such as clicks, impressions, etc.), and we have
several aggregation tasks (one per MySQL table) that aggregate the data
and insert it, in batches, into MySQL.
I don't know how much data you want to scan and insert, but we scan,
aggregate and insert approximately 7 GB (~12M rows) from one HBase table
into 9 MySQL tables, and that takes a little less than 2 hours. Our
analysis shows that ~25% of that time is net HBase reading; most of the
time is spent on the MySQL inserts.
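
Roughly, the flow looks like the sketch below. This is a simplified
illustration, not our actual code: the table, family and column names
are invented, and the metric blob is reduced to a single long counter.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseToMySql {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "metrics");  // hypothetical table name
    Scan scan = new Scan();
    scan.setCaching(1000);       // pull rows from the region server in big batches
    scan.setCacheBlocks(false);  // keep a full scan out of the block cache

    // aggregate clicks per cross-section (qualifier), e.g. country_product
    Map<String, Long> totals = new HashMap<String, Long>();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        for (Map.Entry<byte[], byte[]> kv
            : r.getFamilyMap(Bytes.toBytes("d")).entrySet()) {
          String crossSection = Bytes.toString(kv.getKey());
          long clicks = Bytes.toLong(kv.getValue());  // real blobs need real decoding
          Long old = totals.get(crossSection);
          totals.put(crossSection, (old == null ? 0L : old) + clicks);
        }
      }
    } finally {
      scanner.close();
      table.close();
    }

    // batched inserts into one of the MySQL target tables
    Connection c = DriverManager.getConnection("jdbc:mysql://host/bi", "user", "pw");
    PreparedStatement ps = c.prepareStatement(
        "INSERT INTO clicks_agg (cross_section, clicks) VALUES (?, ?)");
    for (Map.Entry<String, Long> e : totals.entrySet()) {
      ps.setString(1, e.getKey());
      ps.setLong(2, e.getValue());
      ps.addBatch();
    }
    ps.executeBatch();
    c.close();
  }
}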
Since we are in the process of building a new system, optimizing is not
on our agenda, but I would definitely try writing to CSV and bulk
loading into the RDBMS.
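
For MySQL, that bulk-load variant could look something like the sketch
below (the file path, table and columns are placeholders, and LOAD DATA
LOCAL INFILE needs local_infile enabled on both client and server):

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Map;

public class CsvBulkLoad {
  // dump the aggregates to a CSV file, then hand the whole file to MySQL
  static void bulkLoad(Map<String, Long> totals) throws Exception {
    PrintWriter out = new PrintWriter("/tmp/clicks_agg.csv");
    for (Map.Entry<String, Long> e : totals.entrySet()) {
      out.println(e.getKey() + "," + e.getValue());
    }
    out.close();

    Connection c = DriverManager.getConnection("jdbc:mysql://host/bi", "user", "pw");
    Statement st = c.createStatement();
    st.execute("LOAD DATA LOCAL INFILE '/tmp/clicks_agg.csv' INTO TABLE clicks_agg "
        + "FIELDS TERMINATED BY ',' (cross_section, clicks)");
    c.close();
  }
}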

Hope that helps.
On Wed, Nov 13, 2013 at 9:11 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We have done this kind of thing using HBase 0.92.1 + Pig, but in the end
> we had to limit the size of the tables and move the biggest data to HDFS:
> loading data directly from HBase is much slower than from HDFS, and doing
> it with M/R overloads the HBase region servers, since several map jobs
> scan table regions at the same time. So the bigger your tables are, the
> higher the load (usually Pig creates 1 map per region; I don't know about
> Hive).
>
> This may not be an issue if your HBase cluster is dedicated to this kind
> of job, but if you also have to ensure a good random read latency at the
> same time, forget it.
>
> Regards,
>
> On 11/11/2013 13:10, JC wrote:
>
>> We are looking to use HBase as a transformation engine. In other
>> words, take data already loaded into HBase, run some large
>> calculation/aggregation on that data, and then load it back into an
>> RDBMS for our BI analytic tools to use. I was curious what the
>> community's experience with this is, and whether there are some best
>> practices. Some thoughts we are kicking around are using MapReduce 2
>> and YARN, and writing files to HDFS to be loaded into the RDBMS. Not
>> sure what all the pieces needed for the complete application are,
>> though.
>>
>> Thanks in advance for your help,
>> JC
>>
>>
>>
>> --
>> View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-as-a-transformation-engine-tp4052670.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
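
P.S. On JC's MapReduce 2 / YARN idea: the usual shape is a map-only job
over the HBase table that writes delimited files to HDFS, which you then
bulk load into the RDBMS. Here is a rough sketch (table, family, column
names and paths are invented); the two scan settings are the standard
knobs for the region-server load Vincent mentions, since you get one map
task per region:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ExportToHdfs {

  // one map task per region: each scans its slice, emits tab-separated lines
  static class ExportMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      byte[] v = value.getValue(Bytes.toBytes("d"), Bytes.toBytes("clicks"));
      if (v != null) {
        ctx.write(new Text(Bytes.toString(row.get()) + "\t" + Bytes.toLong(v)),
            NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-export");
    job.setJarByClass(ExportToHdfs.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner batches, fewer RPC round trips
    scan.setCacheBlocks(false);  // don't evict hot data for a one-off full scan

    TableMapReduceUtil.initTableMapperJob("metrics", scan, ExportMapper.class,
        Text.class, NullWritable.class, job);
    job.setNumReduceTasks(0);    // map-only: output files land straight in HDFS
    job.setOutputFormatClass(TextOutputFormat.class);
    TextOutputFormat.setOutputPath(job, new Path("/export/clicks"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}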
Asaf Mesika 2013-11-18, 18:56