Re: HBase => replication => Hive
Otis Gospodnetic 2011-03-11, 19:09
> So, you essentially want to dump HBase tables into sequence files/RC
> files/text files and read them from Hive?
I think that's a Q for J-D.
What I had in mind was not periodic dumps, because then the data in Hive would
always lag behind the data in HBase, but something closer to real-time
replication, a la http://hbase.apache.org/replication.html, except with
Hive on the right side of that pretty picture.
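A minimal sketch of that idea, in Python and purely illustrative. Every name in
it (Edit, ReplicationSink, AppendOnlyFileSink, latest_view) is hypothetical and
made up for this sketch; none of them is an HBase or Hive API. It assumes the
destination is an append-only file of tab-separated lines, which is roughly
what a text-file-backed Hive table would be, and shows why puts and deletes
then have to be collapsed at read time:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Edit:
    """Hypothetical stand-in for one replicated log edit (not an HBase class)."""
    row: str
    column: str
    value: Optional[str]  # None marks a delete
    timestamp: int


class ReplicationSink(Protocol):
    """Hypothetical pluggable destination for shipped edits."""
    def apply(self, edits: List[Edit]) -> None: ...


class AppendOnlyFileSink:
    """Appends each edit as a tab-separated line; nothing is updated in place."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def apply(self, edits: List[Edit]) -> None:
        for e in edits:
            op = "DELETE" if e.value is None else "PUT"
            fields = [e.row, e.column, op, e.value or "", str(e.timestamp)]
            self.lines.append("\t".join(fields))


def latest_view(lines: List[str]) -> Dict[Tuple[str, str], str]:
    """Collapse appended edits to the newest live value per (row, column)."""
    newest: Dict[Tuple[str, str], Tuple[int, str, str]] = {}
    for line in lines:
        row, column, op, value, ts = line.split("\t")
        key = (row, column)
        if key not in newest or int(ts) >= newest[key][0]:
            newest[key] = (int(ts), op, value)
    # Cells whose newest edit is a delete are dropped from the view.
    return {k: v for k, (ts, op, v) in newest.items() if op == "PUT"}


sink = AppendOnlyFileSink()
sink.apply([
    Edit("r1", "cf:a", "1", 1),
    Edit("r1", "cf:a", "2", 2),   # update of the same cell
    Edit("r2", "cf:a", "x", 3),
    Edit("r2", "cf:a", None, 4),  # delete
])
current = latest_view(sink.lines)
```

The sink itself stays trivial; the cost moves to the query side, where every
read has to pick the newest edit per cell and skip deleted ones.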
> How do you plan to handle updates, deletes, IVS etc if you use the log
> edits to replicate from hbase to these files? Getting Hive to talk to
> HFiles gives you the same problem.. Isn't it easier to take a snapshot
> of the table when you actually want to run queries on it? In my prelim
The thing is, it looks like there is no way to take a snapshot of an HBase table:
> testing, I did see Hive-HBase full table scans slower than direct Hive
> table scans but I don't remember the numbers off hand.
This is what made me start this particular thread:
> On Thu, Mar 10, 2011 at 10:43 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> > Since HBase has a mechanism to replicate edit logs to another HBase cluster,
> > I was wondering if people think it would be possible to implement HBase=>Hive
> > replication? (and really make the destination pluggable later on)
> > I'm asking because while one can integrate Hive and HBase by creating
> > tables in Hive that actually point to tables in HBase, apparently Hive
> > queries against such tables run about 5x slower than queries that go
> > against normal Hive tables.
> > And because all HBase export options are for 1 table at a time and are not
> > point-in-time snapshots of the whole table, exporting data from HBase and
> > importing it into Hive doesn't sound like a viable option.
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop
> > Hadoop ecosystem search :: http://search-hadoop.com/