Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Performance: hive+hbase integration query against the row_key


Copy link to this message
-
Re: Performance: hive+hbase integration query against the row_key

On Sep 11, 2012, at 7:00 AM, bharath vissapragada wrote:

> Hey,
>
> Hive does all kinds of parsing , metadata lookups, query tree building and stuff before executing the query. Not sure if this all was included in those 36 seconds !
>
> Also what hive does is, it builds a scan object with ranges based on predicates (and mappers too ) on key column and not a direct "get" call as in hbase shell. This might incur some overhead too!

Since Hive does this in a MapReduce job it definitely incurs overhead.  It does not run directly against HBase as you might wish it did here.

Alan.

>
> On Tue, Sep 11, 2012 at 7:10 PM, Shengjie Min <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am trying to get hive working on top of my hbase table following the guide below:
> https://cwiki.apache.org/Hive/hbaseintegration.html
>
> CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES
> ("hbase.columns.mapping"=":key,cf:a,cf:b,cf:c") TBLPROPERTIES ("hbase.table.name"="test");
>
> this hive table creation makes my mapping roughly look like this:
>
> hive_hbase_test  VS   test
> Hive key  -   hbase row_key
> Hive column a -  hbase cf:a
> Hive column b  -  hbase cf:b
> Hive column c  -  hbase cf:c
>
> From my understanding on how HBaseStorageHandler works, it's supposed to take advantage of the hbase row_key index as much as possible. So I would expect,
>
> 1. if you do a hive query against the row key like "select * from hive_hbase_test where key='blabla'", this would utilize the hbase row_key index which give you very quick nearly real-time response just like hbase does.
>
> 2. of coz, if you do a hive query against a column like "select * from hive_hbase_test where a='blabla'", in this case, it queries against a specific column, it probably uses mapred because there is nothing from Hbase side can be utilized.
>
> From my test, query 1 doesn't seem fast at all, still taking ages, so
> select * from hive_hbase_test where key='blabla'   36secs
> vs
> get 'test', 'blabla'      less than 1 sec
> still shows a huge difference.
>
> Anybody has tried this before? Is there anyway I can do sort of query plan analysis against hive query? or I am not mapping hive table against hbase table correctly?
>
> --
> All the best,
> Shengjie Min
>
>
>
>
> --
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB