HBase, mail # user - HBase => replication => Hive


Re: HBase => replication => Hive
Otis Gospodnetic 2011-03-11, 19:13
Hi,
----- Original Message ----

> From: Andrew Purtell <[EMAIL PROTECTED]>
>
> Pardon, I'm not as familiar with this area as I should be, but
>
> > apparently Hive queries run about x5
> > slower than queries that go against normal Hive tables.
>
> Is this not a reasonable place to start? Why is this?

Reasonable? I don't know. :) That's really the first thing I was hoping to
find out. J-D's reaction makes it sound like this is not unreasonable.

> > I was wondering if people think it would be possible to
> > implement HBase=>Hive replication?
>
> This strikes me as non-trivial. If doing this level of effort, why not look
> into the Hive/HBase integration? Maybe there is something HBase can do to make
> it faster?

I don't know yet how trivial or non-trivial it is. But I thought
that if John Sichi, who strikes me as a pretty smart fellow, says he's seeing a 5x
performance loss and he's the one who worked on the integration, getting from 5x
to 4x or lower may be non-trivial. HBase => Hive is terra incognita so, who
knows, maybe it's easy to do. :)

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back.
>   - Piet Hein (via Tom White)
>
>
> --- On Thu, 3/10/11, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> > From: Otis Gospodnetic <[EMAIL PROTECTED]>
> > Subject: HBase => replication => Hive
> > To: [EMAIL PROTECTED]
> > Date: Thursday, March 10, 2011, 10:43 PM
> > Hi,
> >
> > Since HBase has a mechanism to replicate edit logs to
> > another HBase cluster, I was wondering if people think it
> > would be possible to implement HBase=>Hive
> > replication? (and really make the destination pluggable
> > later on)
> >
> > I'm asking because while one can integrate Hive and HBase
> > by creating external tables in Hive that actually point to
> > tables in HBase, apparently Hive queries run about x5
> > slower than queries that go against normal Hive tables.
> >
> > And because all HBase export options are for 1 table at a
> > time and not point-in-time snapshots of the whole table,
> > exporting data from HBase and importing into Hive doesn't
> > sound like a viable option.
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop
> > Hadoop ecosystem search :: http://search-hadoop.com/