HBase >> mail # user >> HBase => replication => Hive


Re: HBase => replication => Hive
Hi,

> So, you essentially want to dump HBase tables into sequence files/RC
> files/text files and read it from Hive?

I think that's a Q for J-D.
I know that what I had in mind was not about creating periodic dumps because
that means data in Hive would always be behind data in HBase, but a more
real-time replication a la http://hbase.apache.org/replication.html except with
Hive being on the right side of that pretty picture.
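
To make that idea a bit more concrete, here is a rough Python sketch of replaying batches of log edits into a pluggable destination. Everything in it (Edit, ReplicationSink, LatestValueSink, replicate) is a hypothetical name made up for illustration; it is not HBase's replication API or its actual log format, and it assumes each edit carries a timestamp and that keeping the newest cell per (row, column) is good enough.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Edit:
    """One hypothetical log edit; NOT HBase's real log entry format."""
    kind: str                    # "put" or "delete"
    row: str
    column: str
    timestamp: int
    value: Optional[str] = None  # unused for deletes


class ReplicationSink(Protocol):
    """Hypothetical pluggable destination (another HBase cluster, Hive, ...)."""

    def apply(self, edits: List[Edit]) -> None: ...


class LatestValueSink:
    """Toy sink that keeps only the newest cell per (row, column)."""

    def __init__(self) -> None:
        # (row, column) -> (timestamp, value); a value of None marks a delete.
        self._cells: Dict[Tuple[str, str], Tuple[int, Optional[str]]] = {}

    def apply(self, edits: List[Edit]) -> None:
        for edit in edits:
            key = (edit.row, edit.column)
            current = self._cells.get(key)
            # Skip edits older than what is already stored, so batches can
            # be replayed or arrive out of order without losing newer data.
            if current is not None and current[0] > edit.timestamp:
                continue
            value = edit.value if edit.kind == "put" else None
            self._cells[key] = (edit.timestamp, value)

    def snapshot(self) -> Dict[Tuple[str, str], str]:
        """Return live cells only; deleted cells are filtered out."""
        return {k: v for k, (_, v) in self._cells.items() if v is not None}


def replicate(batches: Iterable[List[Edit]], sink: ReplicationSink) -> None:
    """Ship each batch of edits to whatever sink was plugged in."""
    for batch in batches:
        sink.apply(batch)


if __name__ == "__main__":
    sink = LatestValueSink()
    replicate(
        [
            [Edit("put", "r1", "cf:a", 1, "x"), Edit("put", "r2", "cf:a", 1, "y")],
            [Edit("put", "r1", "cf:a", 2, "x2"), Edit("delete", "r2", "cf:a", 3)],
        ],
        sink,
    )
    print(sink.snapshot())
```

A Hive-backed sink would have to do the equivalent of snapshot() against files rather than an in-memory dict, which is the hard part; the sketch only shows where updates and deletes would need to be reconciled.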

> How do you plan to handle updates, deletes, IVS etc if you use the log
> edits to replicate from HBase to these files? Getting Hive to talk to
> HFiles gives you the same problem. Isn't it easier to take a snapshot
> of the table when you actually want to run queries on it? In my prelim

The thing is, it looks like there is no way to take a snapshot of an HBase table:
http://blog.sematext.com/2011/03/11/hbase-backup-options/

> testing, I did see Hive-HBase full table scans slower than direct Hive
> table scans but I don't remember the numbers offhand.

This is what made me start this particular thread:
http://search-hadoop.com/m/rMdPh9rFlY1

Otis
> On Thu, Mar 10, 2011 at 10:43 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > Since HBase has a mechanism to replicate edit logs to another HBase cluster, I
> > was wondering if people think it would be possible to implement HBase=>Hive
> > replication? (and really make the destination pluggable later on)
> >
> > I'm asking because while one can integrate Hive and HBase by creating external
> > tables in Hive that actually point to tables in HBase, apparently Hive queries
> > run about 5x slower than queries that go against normal Hive tables.
> >
> > And because all HBase export options are for 1 table at a time and not
> > point-in-time snapshots of the whole table, exporting data from HBase and
> > importing into Hive doesn't sound like a viable option.
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
>