Re: HBase as a transformation engine
Asaf Mesika 2013-11-18, 18:56
Are you reading with the HBase client, or do you have an InputFormat for
On Wednesday, November 13, 2013, Amit Sela wrote:
> We do something like that programmatically.
> We read blobbed HBase data (qualifiers represent cross-sections such as
> country_product, and the blobs hold data such as clicks, impressions, etc.).
> We have several aggregation tasks (one per MySQL table) that aggregate the
> data and insert it (in batches) into MySQL.
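A rough sketch of one such aggregation task, assuming the HBase cells have already been read and decoded into (row_key, qualifier, counters) tuples. The table name country_product_stats, the sample data and the batch size are hypothetical, and Python's built-in sqlite3 stands in for MySQL so the example runs anywhere (a MySQL driver would use its own placeholder style instead of "?"):

```python
import sqlite3
from collections import defaultdict
from itertools import islice

# Hypothetical decoded HBase cells: (row_key, qualifier, counters).
# The qualifier encodes a cross-section (country_product); the blob holds metrics.
CELLS = [
    ("2013-11-12", "US_shoes", {"clicks": 3, "impressions": 100}),
    ("2013-11-12", "US_hats", {"clicks": 1, "impressions": 40}),
    ("2013-11-13", "US_shoes", {"clicks": 2, "impressions": 60}),
    ("2013-11-13", "FR_shoes", {"clicks": 5, "impressions": 90}),
]

def aggregate_by_cross_section(cells):
    """Sum clicks and impressions per qualifier (one aggregation task's work)."""
    totals = defaultdict(lambda: [0, 0])
    for _row_key, qualifier, counters in cells:
        totals[qualifier][0] += counters.get("clicks", 0)
        totals[qualifier][1] += counters.get("impressions", 0)
    return {q: tuple(v) for q, v in totals.items()}

def batches(iterable, size):
    """Yield lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def insert_in_batches(conn, totals, batch_size=2):
    """Insert aggregated rows with one executemany call and one commit per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS country_product_stats ("
        "cross_section TEXT PRIMARY KEY, clicks INTEGER, impressions INTEGER)"
    )
    rows = ((q, c, i) for q, (c, i) in sorted(totals.items()))
    for chunk in batches(rows, batch_size):
        conn.executemany("INSERT INTO country_product_stats VALUES (?, ?, ?)", chunk)
        conn.commit()

conn = sqlite3.connect(":memory:")
insert_in_batches(conn, aggregate_by_cross_section(CELLS))
print(conn.execute(
    "SELECT * FROM country_product_stats ORDER BY cross_section").fetchall())
```

Batching amortizes the per-statement round trip, which matters when inserts dominate the run time.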
> I don't know how much data you want to scan and insert, but we scan,
> aggregate and insert approximately 7 GB (~12M rows) from one HBase table into
> 9 MySQL tables, and that takes a little less than 2 hours.
> Our analysis shows that ~25% of that time is net HBase read time, and most of
> the time is spent on MySQL inserts.
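For a sense of scale, a back-of-the-envelope calculation from the figures above. It rounds "a little less than 2 hours" up to exactly 2 hours, so the rates are lower bounds and the minute figures are approximate upper bounds:

```python
# Figures quoted above; 2 hours is a rounded-up total, so rates are lower bounds.
TOTAL_SECONDS = 2 * 60 * 60
ROWS = 12_000_000
GIGABYTES = 7
HBASE_READ_SHARE = 0.25

rows_per_second = ROWS / TOTAL_SECONDS
mb_per_second = GIGABYTES * 1024 / TOTAL_SECONDS
hbase_read_minutes = HBASE_READ_SHARE * TOTAL_SECONDS / 60
# Everything that is not HBase read time (mostly MySQL inserts, per the analysis above).
other_minutes = TOTAL_SECONDS / 60 - hbase_read_minutes

print(f"{rows_per_second:.0f} rows/s, {mb_per_second:.2f} MB/s")
print(f"HBase read ~{hbase_read_minutes:.0f} min, everything else ~{other_minutes:.0f} min")
```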
> Since we are in the process of building a new system, optimizing this is not
> on our agenda, but I would definitely try writing to CSV and bulk loading into
> Hope that helps.
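A minimal sketch of the CSV-then-bulk-load idea, assuming the bulk-load target is MySQL's LOAD DATA LOCAL INFILE statement. The table and column names are hypothetical, and the statement is only built as a string here, not executed:

```python
import csv
import os
import tempfile

def write_csv(path, rows):
    """Write (cross_section, clicks, impressions) tuples as comma-separated lines."""
    with open(path, "w", newline="") as f:
        # csv.writer defaults to '\r\n'; use '\n' to match LINES TERMINATED BY '\n'.
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows(rows)

def load_data_statement(path, table, columns):
    """Build a MySQL LOAD DATA statement for the CSV file (not executed here)."""
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        f"({', '.join(columns)})"
    )

rows = [("FR_shoes", 5, 90), ("US_hats", 1, 40), ("US_shoes", 5, 160)]
path = os.path.join(tempfile.mkdtemp(), "country_product_stats.csv")
write_csv(path, rows)
print(load_data_statement(path, "country_product_stats",
                          ["cross_section", "clicks", "impressions"]))
```

Note that the LOCAL variant only works when local_infile is enabled on both the MySQL server and the client.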
> > Hi,
> > We have done this kind of thing using HBase 0.92.1 + Pig, but we finally
> > had to limit the size of the tables and move the biggest data to HDFS.
> > Loading data directly from HBase is much slower than from HDFS, and doing
> > it with M/R overloads the HBase region servers, since several map tasks scan
> > table regions at the same time: the bigger your tables are, the higher the
> > load (Pig usually creates 1 map per region, I don't know about
> > This may not be an issue if your HBase cluster is dedicated to this kind
> > of job, but if you also need good random-read latency at the same time,
> > forget it.
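The "one map per region" behaviour can be illustrated with a simplified Python model (not HBase's actual Java code): each region overlapping the scan range yields one split, each split becomes a map task with its own scanner, so the number of concurrent scanners per region server grows with the table. The region layout and server names are hypothetical:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    server: str       # hypothetical region server name
    start_key: bytes  # inclusive; b"" means "start of table"
    end_key: bytes    # exclusive; b"" means "end of table"

def splits_for_scan(regions, scan_start=b"", scan_stop=b""):
    """Return one (server, start, stop) split per region overlapping the scan.

    Each split would become one map task with its own scanner on that server.
    """
    splits = []
    for r in regions:
        # Region ends at or before the scan start: no overlap.
        if r.end_key and r.end_key <= scan_start:
            continue
        # Region starts at or after the scan stop: no overlap.
        if scan_stop and r.start_key >= scan_stop:
            continue
        start = max(r.start_key, scan_start)
        if not r.end_key:
            stop = scan_stop
        elif not scan_stop:
            stop = r.end_key
        else:
            stop = min(r.end_key, scan_stop)
        splits.append((r.server, start, stop))
    return splits

regions = [
    Region("rs1", b"", b"g"),
    Region("rs1", b"g", b"p"),
    Region("rs2", b"p", b""),
]
full_scan = splits_for_scan(regions)
print(len(full_scan))                       # one map task per region
print(Counter(s for s, _, _ in full_scan))  # concurrent scanners per server
```

Restricting the scan range (or the number of concurrently running map tasks) is one way to cap the load a job puts on the serving cluster.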
> > Regards,
> > On 11/11/2013 13:10, JC wrote:
> >> We are looking to use HBase as a transformation engine. In other words,
> >> take data already loaded into HBase, run some large calculation/aggregation
> >> on that data, and then load it back into an RDBMS for our BI analytics tools
> >> to use. I was curious about the community's experience with this and whether
> >> there are some best practices. Some thoughts we are kicking around are
> >> using MapReduce 2 and YARN and writing files to HDFS to be loaded into the
> >> RDBMS. We're not sure what all the pieces needed for the complete
> >> application are, though.
> >> Thanks in advance for your help,
> >> JC
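The pipeline JC outlines (aggregate with MapReduce, write files to HDFS, load them into the RDBMS) can be sketched in a single process to show the data flow. This is only a toy model of the map, shuffle and reduce phases under a hypothetical (row_key, country, clicks) schema, not a YARN job:

```python
from collections import defaultdict

def map_phase(rows):
    """Mapper: emit (country, clicks) for each (row_key, country, clicks) row."""
    for _row_key, country, clicks in rows:
        yield country, clicks

def shuffle(pairs):
    """Group values by key and sort keys, as the framework does before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(grouped):
    """Reducer: sum the values for each key."""
    for key, values in grouped:
        yield key, sum(values)

def to_text_lines(results):
    """Format output as key<TAB>value lines, TextOutputFormat's default layout."""
    return [f"{key}\t{value}" for key, value in results]

rows = [("r1", "US", 3), ("r2", "FR", 5), ("r3", "US", 2)]
for line in to_text_lines(reduce_phase(shuffle(map_phase(rows)))):
    print(line)
```

In a real job the text lines would land in HDFS part files, and a separate bulk-load step would move them into the RDBMS.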