Re: HBase => replication => Hive
Hi,
----- Original Message ----

> From: Andrew Purtell <[EMAIL PROTECTED]>
>
> Pardon, I'm not as familiar with this area as I should be, but
>
> > apparently Hive queries run about x5
> > slower than queries that go against normal Hive tables.
>
> Is this not a reasonable place to start? Why is this?

Reasonable?  I don't know. :)  That's really the first thing I was hoping to
find out.  J-D's reaction makes it sound like this is not unreasonable.

> > I was wondering if people think it would be possible to
> > implement HBase=>Hive replication?
>
> This strikes me as non-trivial. If doing this level of effort, why not look
> into the Hive/HBase integration? Maybe there is something HBase can do to make
> it faster?

I don't know yet how trivial or non-trivial it is.  But I thought that if
John Sichi, who strikes me as a pretty smart fellow, says he's seeing a 5x
performance loss, and he's the one who worked on the integration, then getting
from 5x to 4x or lower may be non-trivial.  HBase => Hive replication is terra
incognita, so who knows, maybe it's easy to do. :)

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back.
>   - Piet Hein (via Tom White)
>
>
> --- On Thu, 3/10/11, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> > From: Otis Gospodnetic <[EMAIL PROTECTED]>
> > Subject: HBase => replication => Hive
> > To: [EMAIL PROTECTED]
> > Date: Thursday, March 10, 2011, 10:43 PM
> > Hi,
> >
> > Since HBase has a mechanism to replicate edit logs to
> > another HBase cluster, I was wondering if people think it
> > would be possible to implement HBase=>Hive
> > replication? (and really make the destination pluggable
> > later on)
> >
> > I'm asking because while one can integrate Hive and HBase
> > by creating external tables in Hive that actually point to
> > tables in HBase, apparently Hive queries run about x5
> > slower than queries that go against normal Hive tables.
> >
> > And because all HBase export options are for 1 table at a
> > time and not point in time snapshots of the whole table,
> > exporting data from HBase and importing into Hive doesn't
> > sound like a viable option.
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
> >
>
>
>      
>