
Hive, mail # user - Performance: hive+hbase integration query against the row_key


Shengjie Min 2012-09-11, 13:40
bharath vissapragada 2012-09-11, 14:00
Re: Performance: hive+hbase integration query against the row_key
Alan Gates 2012-09-12, 01:20

On Sep 11, 2012, at 7:00 AM, bharath vissapragada wrote:

> Hey,
>
> Hive does all kinds of parsing, metadata lookups, query tree building and so on before executing the query. Not sure if all of this was included in those 36 seconds!
>
> Also, what Hive does is build a Scan object (and mappers too) with ranges based on the predicates on the key column, rather than making a direct "get" call as in the HBase shell. This might incur some overhead too!

Since Hive does this in a MapReduce job, it definitely incurs overhead. It does not run directly against HBase as you might wish it did here.

Alan.
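The difference Bharath and Alan describe can be sketched in Python over a hypothetical in-memory table; this is an illustration only, not HBase or Hive code, and all names in it are made up. A shell `get` jumps straight to one row key, while a key-equality predicate is turned into a [start, stop) scan range. Forming the stop row by appending a zero byte to the key is an assumption here, chosen because it gives the smallest key strictly greater than the value, so the range covers exactly one possible row key. Both paths reach the same row; the extra time in Hive comes from the query compilation and MapReduce job wrapped around the scan, which this sketch does not model.

```python
import bisect

# Hypothetical stand-in for an HBase table: rows kept sorted by row key,
# mirroring the lexicographic byte order HBase stores rows in.
ROWS = sorted({
    b"alpha": {b"cf:a": b"1"},
    b"blabla": {b"cf:a": b"2", b"cf:b": b"3"},
    b"blabla2": {b"cf:a": b"4"},
    b"zulu": {b"cf:c": b"5"},
}.items())
KEYS = [key for key, _ in ROWS]


def point_get(row_key):
    """Shell-style `get`: look up exactly one row key."""
    i = bisect.bisect_left(KEYS, row_key)
    if i < len(KEYS) and KEYS[i] == row_key:
        return ROWS[i][1]
    return None


def key_predicate_to_range(value):
    """Turn `key = value` into a [start, stop) row range.

    Appending a zero byte yields the smallest key strictly greater
    than `value`, so only `value` itself falls inside the range.
    """
    return value, value + b"\x00"


def range_scan(start, stop):
    """Scan-style access: return every row with start <= key < stop."""
    lo = bisect.bisect_left(KEYS, start)
    hi = bisect.bisect_left(KEYS, stop)
    return list(ROWS[lo:hi])
```

For example, `range_scan(*key_predicate_to_range(b"blabla"))` returns only the `b"blabla"` row (not `b"blabla2"`), the same row `point_get(b"blabla")` finds directly.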

>
> On Tue, Sep 11, 2012 at 7:10 PM, Shengjie Min <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am trying to get Hive working on top of my HBase table, following the guide below:
> https://cwiki.apache.org/Hive/hbaseintegration.html
>
> CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a,cf:b,cf:c")
> TBLPROPERTIES ("hbase.table.name"="test");
>
> This Hive table definition gives me roughly the following mapping:
>
> Hive (hive_hbase_test)    HBase (test)
> key                       row_key
> a                         cf:a
> b                         cf:b
> c                         cf:c
>
> From my understanding of how HBaseStorageHandler works, it's supposed to take advantage of the HBase row_key index as much as possible. So I would expect:
>
> 1. If you run a Hive query against the row key, like "select * from hive_hbase_test where key='blabla'", it would use the HBase row_key index, which gives you a very quick, nearly real-time response, just like HBase does.
>
> 2. Of course, if you run a Hive query against a column, like "select * from hive_hbase_test where a='blabla'", it queries a specific column, so it probably uses MapReduce because there is nothing on the HBase side that can be utilized.
>
> From my test, query 1 doesn't seem fast at all; it still takes ages:
>
>   select * from hive_hbase_test where key='blabla'    36 secs
>   get 'test', 'blabla'                                less than 1 sec
>
> That is still a huge difference.
>
> Has anybody tried this before? Is there any way I can do some sort of query plan analysis on a Hive query? Or am I not mapping the Hive table to the HBase table correctly?
>
> --
> All the best,
> Shengjie Min
>
>
>
>
> --
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
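Shengjie's two expectations hinge on which HBase location each Hive column is mapped to. The Python sketch below (function names are hypothetical, not part of Hive) pairs Hive columns with `hbase.columns.mapping` entries by position and reports whether an equality predicate on a column could become a row-key range, as in query 1, or would need every row read and filtered, as in query 2. It only handles the `:key` and `family:qualifier` forms used in this thread and ignores the other forms the real storage handler accepts.

```python
def parse_columns_mapping(hive_columns, mapping):
    """Pair Hive columns with hbase.columns.mapping entries by position."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    return dict(zip(hive_columns, entries))


def access_path(column_mapping, column):
    """Describe how an equality predicate on `column` could be answered."""
    target = column_mapping[column]
    if target == ":key":
        # HBase keeps rows sorted by row key, so `column = value`
        # narrows the read to a single row-key range.
        return "row-key range scan"
    # A family:qualifier cell has no index of its own, so every row
    # must be read and the predicate applied as a filter.
    return "full scan with filter"


if __name__ == "__main__":
    mapping = parse_columns_mapping(["key", "a", "b", "c"], ":key,cf:a,cf:b,cf:c")
    for col in ("key", "a"):
        print(col, "->", mapping[col], "->", access_path(mapping, col))
```

With the table definition from this thread, `key` maps to `:key` and is classified as a row-key range scan, while `a` maps to `cf:a` and needs a full scan with a filter. Even the first case still runs inside a MapReduce job in Hive, which is why query 1 is far slower than a shell `get`.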
ashok.samal@... 2012-09-12, 03:25
Bejoy KS 2012-09-12, 21:09
ashok.samal@... 2012-09-12, 21:12
Bejoy KS 2012-09-12, 21:33