Re: HBase as a transformation engine
Are you reading using the HBase client, or do you have an InputFormat for
reading HFiles?
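
(For context on the client route: a minimal sketch of a map-only scan job
wired up through TableMapReduceUtil, with placeholder table and class names,
would look roughly like the following. A custom InputFormat over HFiles would
bypass the region servers instead.)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  public class ScanTableJob {
    // Map-only pass over an HBase table via TableInputFormat (one map per region).
    static class AggMapper extends TableMapper<NullWritable, NullWritable> {
      @Override
      protected void map(ImmutableBytesWritable row, Result columns, Context ctx) {
        // transform / aggregate each row here
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = Job.getInstance(conf, "scan-my-table");
      job.setJarByClass(ScanTableJob.class);
      Scan scan = new Scan();
      scan.setCaching(500);        // bigger scanner caching for full-table scans
      scan.setCacheBlocks(false);  // don't churn the block cache on MR scans
      TableMapReduceUtil.initTableMapperJob("my_table", scan, AggMapper.class,
          NullWritable.class, NullWritable.class, job);
      job.setNumReduceTasks(0);
      job.setOutputFormatClass(NullOutputFormat.class);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }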

On Wednesday, November 13, 2013, Amit Sela wrote:

> Hi,
>
> We do something like that programmatically.
> We read blobbed HBase data (qualifiers represent cross-sections such as
> country_product, and the blobs hold data such as clicks, impressions, etc.).
> We have several aggregation tasks (one per MySQL table) that aggregate the
> data and insert it (in batches) into MySQL.
> I don't know how much data you want to scan and insert, but we scan,
> aggregate and insert approximately 7GB (~12M rows) from one HBase table
> into 9 MySQL tables, and that takes a little less than 2 hours.
> Our analysis shows that ~25% of that time is the net HBase read, and most
> of the time is spent on the MySQL inserts.
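>
> (For illustration, the insert side is just a plain JDBC batch loop; the
> class, table and column names below are made up for the example, not our
> real schema.)
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.PreparedStatement;
> import java.util.List;
>
> public class MySqlBatchWriter {
>   // Placeholder row type for the aggregated results.
>   public static class AggRow {
>     public String country, product;
>     public long clicks, impressions;
>   }
>
>   public static void write(String jdbcUrl, String user, String pass,
>       List<AggRow> rows) throws Exception {
>     try (Connection conn = DriverManager.getConnection(jdbcUrl, user, pass);
>          PreparedStatement ps = conn.prepareStatement(
>              "INSERT INTO agg_country_product (country, product, clicks, impressions)"
>              + " VALUES (?, ?, ?, ?)")) {
>       conn.setAutoCommit(false);
>       int pending = 0;
>       for (AggRow r : rows) {
>         ps.setString(1, r.country);
>         ps.setString(2, r.product);
>         ps.setLong(3, r.clicks);
>         ps.setLong(4, r.impressions);
>         ps.addBatch();
>         if (++pending == 1000) {   // flush every 1000 rows
>           ps.executeBatch();
>           conn.commit();
>           pending = 0;
>         }
>       }
>       if (pending > 0) {
>         ps.executeBatch();
>         conn.commit();
>       }
>     }
>   }
> }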
> Since we are in the process of building a new system, optimizing is not on
> our agenda, but I would definitely try writing to CSV and bulk loading into
> the RDBMS.
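>
> (If we went the CSV route, the bulk load itself could just be MySQL's LOAD
> DATA statement issued over JDBC; the file path and table name here are only
> placeholders.)
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.Statement;
>
> public class MySqlBulkLoad {
>   public static void load(String jdbcUrl, String user, String pass) throws Exception {
>     // Assumes the aggregation job already wrote /tmp/agg_country_product.csv.
>     // LOCAL INFILE may need to be enabled on both the MySQL server and the driver.
>     try (Connection conn = DriverManager.getConnection(jdbcUrl, user, pass);
>          Statement st = conn.createStatement()) {
>       st.execute("LOAD DATA LOCAL INFILE '/tmp/agg_country_product.csv' "
>           + "INTO TABLE agg_country_product "
>           + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
>     }
>   }
> }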
>
> Hope that helps.
>
>
>
>
> On Wed, Nov 13, 2013 at 9:11 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > We have done this kind of thing using HBase 0.92.1 + Pig, but we
> > eventually had to limit the size of the tables and move the biggest
> > data to HDFS: loading data directly from HBase is much slower than from
> > HDFS, and doing it with M/R overloads the HBase region servers, since
> > several map jobs scan table regions at the same time. So the bigger
> > your tables are, the higher the load is (usually Pig creates 1 map per
> > region; I don't know about Hive).
> >
> > This may not be an issue if your HBase cluster is dedicated to this
> > kind of job, but if you also have to ensure good random-read latency at
> > the same time, forget it.
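> >
> > (To give an idea of the "move it to HDFS" step: below is a rough sketch
> > of a one-off map-only job that dumps a table to tab-separated files on
> > HDFS, so later Pig passes read flat files instead of scanning HBase.
> > Table name and output path are placeholders.)
> >
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.KeyValue;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> > import org.apache.hadoop.hbase.mapreduce.TableMapper;
> > import org.apache.hadoop.hbase.util.Bytes;
> > import org.apache.hadoop.io.NullWritable;
> > import org.apache.hadoop.io.Text;
> > import org.apache.hadoop.mapreduce.Job;
> > import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >
> > public class DumpTableToHdfs {
> >   static class DumpMapper extends TableMapper<Text, NullWritable> {
> >     @Override
> >     protected void map(ImmutableBytesWritable row, Result columns, Context ctx)
> >         throws java.io.IOException, InterruptedException {
> >       StringBuilder line = new StringBuilder(Bytes.toStringBinary(row.get()));
> >       for (KeyValue kv : columns.raw()) {        // one cell per column
> >         line.append('\t').append(Bytes.toStringBinary(kv.getValue()));
> >       }
> >       ctx.write(new Text(line.toString()), NullWritable.get());
> >     }
> >   }
> >
> >   public static void main(String[] args) throws Exception {
> >     Job job = Job.getInstance(HBaseConfiguration.create(), "dump-big-table");
> >     job.setJarByClass(DumpTableToHdfs.class);
> >     Scan scan = new Scan();
> >     scan.setCaching(500);
> >     scan.setCacheBlocks(false);
> >     TableMapReduceUtil.initTableMapperJob("big_table", scan, DumpMapper.class,
> >         Text.class, NullWritable.class, job);
> >     job.setNumReduceTasks(0);
> >     FileOutputFormat.setOutputPath(job, new Path("/data/dumps/big_table"));
> >     System.exit(job.waitForCompletion(true) ? 0 : 1);
> >   }
> > }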
> >
> > Regards,
> >
> > On 11/11/2013 13:10, JC wrote:
> >
> >> We are looking to use HBase as a transformation engine. In other
> >> words: take data already loaded into HBase, run some large
> >> calculation/aggregation on that data, and then load it back into an
> >> RDBMS for our BI analytics tools to use. I was curious what the
> >> community's experience is with this and whether there are some best
> >> practices. One approach we are kicking around is using MapReduce 2 and
> >> YARN and writing files to HDFS to be loaded into the RDBMS. We're not
> >> sure what all the pieces needed for the complete application are,
> >> though.
> >>
> >> Thanks in advance for your help,
> >> JC
> >>