We have done this kind of thing using HBase 0.92.1 + Pig, but we
finally had to limit the size of the tables and move the biggest
data to HDFS: loading data directly from HBase is much slower than
loading from HDFS, and doing it with M/R overloads the HBase region
servers, since several map tasks scan table regions at the same time.
So the bigger your tables are, the higher the load (Pig usually
creates 1 map per region; I don't know about Hive).
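To illustrate the "1 map per region" behavior: with HBase's MapReduce integration, TableInputFormat creates one input split (hence one map task) per table region, and the Scan you pass in controls how aggressively each mapper hits its region server. Below is a minimal sketch of such a job against the pre-1.0-era APIs; the table name "mytable" and the row-counting mapper are placeholders, and the job of course needs a live HBase cluster to run.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class HBaseScanJob {

    // TableInputFormat hands each mapper the rows of one region,
    // so a big table means many concurrent scans across region servers.
    static class RowCountMapper extends TableMapper<Text, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text("rows"), new LongWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-full-scan");
        job.setJarByClass(HBaseScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // rows fetched per RPC: fewer round-trips,
                                     // but a heavier burst per region server
        scan.setCacheBlocks(false);  // don't evict the block cache during a
                                     // full scan (hurts random-read latency)

        // "mytable" is a placeholder; one map task per region of this table.
        TableMapReduceUtil.initTableMapperJob("mytable", scan,
                RowCountMapper.class, Text.class, LongWritable.class, job);
        job.setNumReduceTasks(1);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The `setCaching` / `setCacheBlocks` knobs only soften the impact; they don't change the fact that every region gets scanned concurrently, which is why we ended up moving the big data to HDFS instead.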
This may not be an issue if your HBase cluster is dedicated to this
kind of job, but if you also have to ensure a good random read
latency at the same time, forget it.
On 11/11/2013 13:10, JC wrote:
> We are looking to use HBase as a transformation engine. In other words, take
> data already loaded into HBase, run some large calculation/aggregation on
> that data and then load it back into an RDBMS for our BI analytic tools to
> use. I was curious about what the community's experience is with this and
> whether there are some best practices. One approach we are kicking around is
> using MapReduce 2 and YARN and writing files to HDFS to be loaded into the
> RDBMS. Not sure what pieces are needed for the complete application though.
> Thanks in advance for your help,
> View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-as-a-transformation-engine-tp4052670.html
> Sent from the HBase User mailing list archive at Nabble.com.