So, you essentially want to dump HBase tables into sequence files/RC
files/text files and read them from Hive?
How do you plan to handle updates, deletes, ICVs, etc. if you use the log
edits to replicate from HBase to these files? Getting Hive to talk to
HFiles gives you the same problem. Isn't it easier to take a snapshot
of the table when you actually want to run queries on it? In my
preliminary testing, I did see Hive-HBase full table scans run slower
than direct Hive table scans, but I don't remember the numbers offhand.
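To make the update/delete concern concrete, here is a minimal Python
sketch of what any reader of appended edit files would have to do before
the data is queryable. This is not HBase or Hive code: the Edit record,
the use of None to mark a delete, and the materialize() helper are all
made up for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Edit(NamedTuple):
    """One replicated edit (hypothetical shape, not the HBase log format)."""
    row: str
    column: str
    timestamp: int
    value: Optional[bytes]  # None marks a delete (assumed convention)


def materialize(edits: Iterable[Edit]) -> Dict[Tuple[str, str], bytes]:
    """Collapse an append-only stream of edits into the latest live cells."""
    latest: Dict[Tuple[str, str], Edit] = {}
    for e in edits:
        key = (e.row, e.column)
        # Keep only the newest edit seen for each (row, column) cell.
        if key not in latest or e.timestamp >= latest[key].timestamp:
            latest[key] = e
    # A cell whose newest edit is a delete must not appear in the result.
    return {k: e.value for k, e in latest.items() if e.value is not None}


edits = [
    Edit("r1", "cf:a", 1, b"old"),
    Edit("r1", "cf:a", 2, b"new"),  # update of r1
    Edit("r2", "cf:a", 1, b"x"),
    Edit("r2", "cf:a", 3, None),    # delete of r2
]
print(materialize(edits))
```

A plain Hive scan over the raw appended files would return all four
records, including the stale value and the deleted row, so some merge
step like this has to run somewhere before the query.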
On Thu, Mar 10, 2011 at 10:43 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Since HBase has a mechanism to replicate edit logs to another HBase cluster, I
> was wondering if people think it would be possible to implement HBase=>Hive
> replication? (and really make the destination pluggable later on)
> I'm asking because while one can integrate Hive and HBase by creating external
> tables in Hive that actually point to tables in HBase, apparently Hive queries
> run about 5x slower than queries that go against normal Hive tables.
> And because all HBase export options are for 1 table at a time and not point in
> time snapshots of the whole table, exporting data from HBase and importing into
> Hive doesn't sound like a viable option.