HBase >> mail # user >> HBase => replication => Hive

Re: HBase => replication => Hive

> So, you essentially want to dump HBase tables into sequence files/RC
> files/text files and read them from Hive?

I think that's a question for J-D.
What I had in mind was not periodic dumps, because then the data in Hive would
always lag behind the data in HBase. I was thinking of more real-time
replication à la http://hbase.apache.org/replication.html, except with
Hive on the right side of that pretty picture.
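Shipping individual edits means updates and deletes arrive as separate records rather than as rewritten rows. One way a Hive-side destination could cope with that is to append every edit to a change log and reconstruct the current state at read time (merge-on-read). This is a minimal sketch under stated assumptions, not a description of any existing HBase or Hive feature: `CellEdit`, `to_text_line`, `from_text_line` and `resolve` are hypothetical names, the tab-separated file layout is assumed, and only column-level deletes (with "newest timestamp wins" semantics) are modeled; row and family deletes are left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


# Hypothetical, simplified model of one replicated HBase cell edit.
# Real WAL entries carry more (region, sequence id, cluster ids, ...).
@dataclass(frozen=True)
class CellEdit:
    row: str
    column: str                  # "family:qualifier"
    timestamp: int               # HBase cell timestamp
    op: str                      # "PUT" or "DELETE"
    value: Optional[str] = None  # None for deletes


def to_text_line(edit: CellEdit) -> str:
    """Serialize an edit as one tab-separated line of an append-only text
    file. Assumes rows, columns and values contain no tabs or newlines."""
    value = "" if edit.value is None else edit.value
    return "\t".join([edit.row, edit.column, str(edit.timestamp), edit.op, value])


def from_text_line(line: str) -> CellEdit:
    """Parse a line written by to_text_line back into a CellEdit."""
    row, column, ts, op, value = line.rstrip("\n").split("\t")
    return CellEdit(row, column, int(ts), op, value if op == "PUT" else None)


def resolve(edits: Iterable[CellEdit]) -> Dict[str, Dict[str, str]]:
    """Merge-on-read: keep the newest edit per (row, column). A DELETE hides
    every PUT with an equal or older timestamp. Returns row -> {column: value}."""
    latest: Dict[Tuple[str, str], CellEdit] = {}
    for e in edits:
        key = (e.row, e.column)
        cur = latest.get(key)
        if cur is None or e.timestamp > cur.timestamp:
            latest[key] = e
        elif e.timestamp == cur.timestamp and not (cur.op == "DELETE" and e.op == "PUT"):
            # Same timestamp: the later edit in log order wins, except that a
            # PUT never un-deletes a cell masked by a DELETE at that timestamp.
            latest[key] = e
    table: Dict[str, Dict[str, str]] = {}
    for (row, column), e in latest.items():
        if e.op == "PUT":
            table.setdefault(row, {})[column] = e.value
    return table


if __name__ == "__main__":
    log = [
        CellEdit("r1", "cf:a", 1, "PUT", "x"),
        CellEdit("r1", "cf:a", 2, "PUT", "y"),  # update of r1
        CellEdit("r2", "cf:a", 1, "PUT", "z"),
        CellEdit("r2", "cf:a", 3, "DELETE"),    # delete of r2
    ]
    lines = [to_text_line(e) for e in log]
    print(resolve(from_text_line(line) for line in lines))
    # prints {'r1': {'cf:a': 'y'}}
```

The trade-off in this design is that the files never have to be rewritten in place, but every query (or a periodic compaction job that writes out resolved state) pays the cost of reconciling updates and deletes.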

> How do you plan to handle updates, deletes, IVS etc. if you use the log
> edits to replicate from HBase to these files? Getting Hive to talk to
> HFiles gives you the same problem. Isn't it easier to take a snapshot
> of the table when you actually want to run queries on it?

The thing is, it looks like there is no way to take a snapshot of an HBase table.

> In my prelim testing, I did see Hive-HBase full table scans slower than
> direct Hive table scans, but I don't remember the numbers offhand.

This is what made me start this particular thread:

> On Thu, Mar 10, 2011 at 10:43 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > Since HBase has a mechanism to replicate edit logs to another HBase cluster,
> > I was wondering if people think it would be possible to implement HBase=>Hive
> > replication (and really make the destination pluggable later on)?
> >
> > I'm asking because while one can integrate Hive and HBase by creating
> > tables in Hive that actually point to tables in HBase, apparently Hive
> > queries against such tables run about 5x slower than queries that go
> > against normal Hive tables.
> >
> > And because all HBase export options work on one table at a time and do not
> > produce point-in-time snapshots of the whole table, exporting data from HBase
> > and importing it into Hive doesn't sound like a viable option.
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
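The "pluggable destination" idea can be sketched as a small sink interface plus a shipper that tracks each destination's progress separately. This is a minimal in-memory sketch, not HBase's replication API: `WalEntry`, `ReplicationSink`, `ListSink` and `FanOutReplicator` are hypothetical names, and a real implementation would read edits from the WAL and persist each sink's position rather than keep everything in memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class WalEntry:
    """Hypothetical stand-in for one write-ahead-log entry."""
    seq_id: int
    table: str
    payload: str  # placeholder for the serialized cell edits


class ReplicationSink(Protocol):
    """Hypothetical plug-in point: any destination (another HBase cluster,
    a writer for Hive-readable files, ...) only has to implement ship()."""
    name: str

    def ship(self, batch: List[WalEntry]) -> bool:
        """Deliver a batch; return True only if it was accepted."""
        ...


class ListSink:
    """Toy sink that records what it receives; fail_first simulates one outage."""

    def __init__(self, name: str, fail_first: bool = False) -> None:
        self.name = name
        self.received: List[WalEntry] = []
        self._fail_next = fail_first

    def ship(self, batch: List[WalEntry]) -> bool:
        if self._fail_next:
            self._fail_next = False
            return False
        self.received.extend(batch)
        return True


class FanOutReplicator:
    """Ships log entries to every registered sink, tracking a separate
    position per sink so a failing sink is retried from where it stopped
    without holding back the others."""

    def __init__(self) -> None:
        self.log: List[WalEntry] = []
        self.sinks: Dict[str, ReplicationSink] = {}
        self.position: Dict[str, int] = {}  # index of the next entry to ship

    def register(self, sink: ReplicationSink) -> None:
        self.sinks[sink.name] = sink
        self.position[sink.name] = 0  # in this sketch, new sinks replay the retained log

    def append(self, entry: WalEntry) -> None:
        self.log.append(entry)

    def pump(self, batch_size: int = 2) -> None:
        """One shipping round: at most batch_size entries per sink."""
        for name, sink in self.sinks.items():
            start = self.position[name]
            batch = self.log[start:start + batch_size]
            if batch and sink.ship(batch):
                self.position[name] = start + len(batch)
```

Because each sink advances independently, a slow or unavailable Hive writer only delays its own position; the price is that the log has to be retained until the slowest sink has caught up.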