Hive user mailing list: Very poor read performance with composite keys in hbase


Rupinder Singh 2013-04-30, 17:48
kulkarni.swarnim@... 2013-04-30, 17:54
Rupinder Singh 2013-04-30, 18:03
Sanjay Subramanian 2013-04-30, 18:04
kulkarni.swarnim@... 2013-04-30, 18:46
Rupinder Singh 2013-04-30, 20:00
kulkarni.swarnim@... 2013-04-30, 20:56
Re: Very poor read performance with composite keys in hbase
Currently, the Hive HBase storage handler reads rows one by one.

https://issues.apache.org/jira/browse/HIVE-3603 is for setting the cache
size, and it is not yet fixed.
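Why reading rows one by one is expensive can be seen with simple arithmetic. The sketch below is a simplified model, not HBase or Hive code: it assumes each scanner round trip returns at most `caching` rows and ignores size limits, region boundaries, and server-side work. In the HBase Java client the corresponding setting is `Scan.setCaching` (or the `hbase.client.scanner.caching` property); the function name `scanner_round_trips` is hypothetical.

```python
import math


def scanner_round_trips(rows_scanned: int, caching: int) -> int:
    """Approximate scanner RPCs needed to stream `rows_scanned` rows when
    each round trip returns at most `caching` rows (simplified model)."""
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return math.ceil(rows_scanned / caching)


# A full scan of ~10 million rows at different per-RPC batch sizes.
for caching in (1, 100, 1000):
    print(caching, scanner_round_trips(10_000_000, caching))
```

With one row per round trip, a full scan of the ~10 million row table discussed in this thread needs on the order of 10 million RPCs; fetching 1000 rows per round trip cuts that to about 10,000. Caching reduces round trips, but it does not turn a full scan into a targeted one.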

2013/5/1 [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> That depends on how dynamic your data is. If it is fairly static, you can
> also consider using something like Create Table As Select (CTAS) to create a
> snapshot of your data in HDFS and then run queries on top of that snapshot.
>
> So your query might become something like:
>
> create table my_table as select * from event where key.name='Signup' and
> key.dateCreated='2013-03-06 16:39:55.353' and
> key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';
>
> Since your data is now in HDFS, this should give you a considerable
> performance boost.
>
>
> On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <[EMAIL PROTECTED]> wrote:
>>
>> Swarnim,
>>
>> Thanks. So this means custom MapReduce is the viable option when working
>> with HBase tables that have composite keys, since it allows setting the
>> start and stop keys. The Hive + HBase combination is out.
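A custom MapReduce job (or any HBase client) can restrict a scan to rows whose key starts with the leading components of a composite key. The sketch below computes such a start/stop pair. It assumes, hypothetically, that the row key is the struct fields joined by `~`, as suggested by the `COLLECTION ITEMS TERMINATED BY '~'` clause in the original table definition; the helper names are illustrative. HBase treats the start row as inclusive and the stop row as exclusive; in the Java client of that era the pair would be passed to `Scan.setStartRow` and `Scan.setStopRow`.

```python
SEP = b"~"  # assumed field separator, matching COLLECTION ITEMS TERMINATED BY '~'


def composite_prefix(*fields: str) -> bytes:
    """Encode leading key components as a row-key prefix (hypothetical layout)."""
    return SEP.join(f.encode("utf-8") for f in fields)


def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest row key greater than every key that starts with `prefix`.

    Scanning [prefix, stop) with an exclusive stop row returns exactly the
    rows whose key begins with `prefix`. Trailing 0xFF bytes cannot be
    incremented, so they are dropped before bumping the last byte. An empty
    result means "scan to the end of the table".
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


# All events named 'Signup' created at one timestamp, any uid.
start = composite_prefix("Signup", "2013-03-06 16:39:55.353") + SEP
stop = prefix_stop_row(start)
print(start, stop)
```

Appending the separator to the prefix keeps the scan from also matching keys whose second field merely begins with `2013-03-06 16:39:55.353`, such as a longer timestamp string.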
>>
>> Regards
>> Rupinder
>>
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, May 01, 2013 12:17 AM
>> To: [EMAIL PROTECTED]
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: Very poor read performance with composite keys in hbase
>>
>> Rupinder,
>>
>> Hive supports filter pushdown[1]: predicates in the WHERE clause are
>> pushed down to the storage handler, which either handles them itself or
>> hands them back to Hive if it cannot. As of now, the HBaseStorageHandler
>> only supports this for primitive key types. So when you use a string key,
>> a predicate on it is converted behind the scenes into start and stop keys
>> that restrict the HBase scan. This does not happen for struct keys, hence
>> the full table scan and the poor performance.
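The difference between a pushed-down key predicate and a full table scan can be illustrated with a toy in-memory model: rows held in sorted key order, as HBase stores them. This is a hypothetical sketch, not Hive's pushdown code; `bisect` stands in for HBase seeking directly to the start row, and the counts are rows examined, not real I/O.

```python
import bisect
from typing import Callable, List, Tuple


def full_scan(sorted_keys: List[str],
              predicate: Callable[[str], bool]) -> Tuple[List[str], int]:
    """No key bounds pushed down: every row is read, then filtered."""
    hits = [k for k in sorted_keys if predicate(k)]
    return hits, len(sorted_keys)


def range_scan(sorted_keys: List[str], start: str,
               stop: str) -> Tuple[List[str], int]:
    """Key bounds pushed down: only rows in [start, stop) are read."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi], hi - lo


keys = sorted(f"user{i:06d}" for i in range(100_000))
target = "user042000"

# An equality predicate on the key becomes the range [target, target + "\x00").
print(full_scan(keys, lambda k: k == target)[1])     # 100000
print(range_scan(keys, target, target + "\x00")[1])  # 1
```

With a struct key, the storage handler described above produces no such range, so a lookup behaves like `full_scan` over the whole table.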
>>
>> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>>
>> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian
>> <[EMAIL PROTECTED]> wrote:
>>
>> In my experience, Hive + HBase has been about 8x slower on average, so I
>> went ahead with the Hive-only option.
>>
>> Sent from my iPhone
>>
>>
>> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I have an HBase cluster with a table that has a composite key. I map
>> this table to a Hive external table, which I use to insert data into and
>> select data from the HBase table:
>>
>> CREATE EXTERNAL TABLE event(key
>> struct<name:string,dateCreated:string,uid:string>, {more columns here})
>> ROW FORMAT DELIMITED
>> COLLECTION ITEMS TERMINATED BY '~'
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>> TBLPROPERTIES ("hbase.table.name" = "event");
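The struct key in this table is stored as a single HBase row key. The sketch below shows a plausible encoding under an assumption not confirmed in the thread: that the storage handler writes the struct by joining its fields with the `~` collection delimiter. `EventKey`, `encode_key`, and `decode_key` are hypothetical names.

```python
from typing import NamedTuple

SEP = "~"  # assumed: struct fields joined by COLLECTION ITEMS TERMINATED BY '~'


class EventKey(NamedTuple):
    name: str
    dateCreated: str
    uid: str


def encode_key(key: EventKey) -> str:
    """Join the struct fields into one row key string."""
    for field in key:
        # A separator inside a field would make the key impossible to split back.
        if SEP in field:
            raise ValueError(f"field contains separator: {field!r}")
    return SEP.join(key)


def decode_key(row_key: str) -> EventKey:
    """Split a row key back into its struct fields."""
    parts = row_key.split(SEP)
    if len(parts) != len(EventKey._fields):
        raise ValueError(f"expected {len(EventKey._fields)} fields, got {len(parts)}")
    return EventKey(*parts)


key = EventKey("Signup", "2013-03-06 16:39:55.353",
               "7af4c330-5988-4255-9250-924ce5864e3bf")
row_key = encode_key(key)
print(row_key)
```

Because HBase sorts row keys byte-wise, every row sharing a leading `name` (or `name` and `dateCreated`) lies in one contiguous key range, so a scan with start and stop rows could in principle target a single key or prefix. According to the replies in this thread, the storage handler at the time did not derive those bounds from predicates on struct fields.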
>>
>> The table has about 10 million rows. When I do a select * that filters on
>> all 3 components of the key, essentially selecting just 1 row, the
>> response time is almost 700 sec, which seems pretty bad.
>>
>> For comparison, I created another table with a simple string key and
>> otherwise the same columns. The key is a string UUID. The table has the
>> same number of column families and the same number of rows.
>>
>> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>> TBLPROPERTIES ("hbase.table.name" = "test_event");
>>
>> When I select a single row from this table by doing select * where
>> key='something', the response time is 35 sec.
>>
>> This seems to indicate that with composite keys, a full table scan is
>> happening, which seems weird.
>>
>> What am I missing here? Is there something special I need to do to get
>> good read performance when using composite keys?
>>
>> Insert performance in both cases is comparable and is as per expectation.
>>
>>
>>
>> Any help is appreciated.
>>
>> Here is the env spec:
>>