Hive, mail # user - Very poor read performance with composite keys in hbase


Rupinder Singh 2013-04-30, 17:48
kulkarni.swarnim@... 2013-04-30, 17:54
Rupinder Singh 2013-04-30, 18:03
Sanjay Subramanian 2013-04-30, 18:04
kulkarni.swarnim@... 2013-04-30, 18:46
Rupinder Singh 2013-04-30, 20:00
kulkarni.swarnim@... 2013-04-30, 20:56
Re: Very poor read performance with composite keys in hbase
Navis류승우 2013-05-02, 01:39
Currently, the Hive HBase storage handler reads rows one at a time.

https://issues.apache.org/jira/browse/HIVE-3603 tracks making the cache size
configurable; it is not fixed yet.
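A rough model of why row-by-row reads hurt: if a scanner fetches up to `caching` rows per RPC, scanning `n` rows takes about ceil(n / caching) round trips. This is a simplified sketch, not Hive or HBase code; `scan_round_trips` is a hypothetical helper, and the model ignores scanner open/close calls and region boundaries.

```python
def scan_round_trips(rows_scanned: int, caching: int) -> int:
    """Approximate number of fetch RPCs for a scan that returns up to
    `caching` rows per RPC. Simplified model: ignores the calls that
    open and close the scanner and any region boundaries."""
    if rows_scanned < 0 or caching < 1:
        raise ValueError("need rows_scanned >= 0 and caching >= 1")
    # Integer ceiling division: a partial last batch still costs one RPC.
    return (rows_scanned + caching - 1) // caching


# Full scan of a ~10 million row table, as in this thread.
for caching in (1, 100, 1000):
    print(f"caching={caching}: ~{scan_round_trips(10_000_000, caching):,} RPCs")
```

Batching rows per RPC cuts round trips but not the number of rows read, so a larger cache speeds up a full scan without turning it into a point lookup.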

2013/5/1 [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> That depends on how dynamic your data is. If it is fairly static, you can
> also consider using Create Table As Select (CTAS) to create a snapshot of
> your data in HDFS and then run queries on top of that snapshot.
>
> So your query might become something like:
>
> create table my_table as select * from event where key.name='Signup' and
> key.dateCreated='2013-03-06 16:39:55.353' and
> key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';
>
> Since your data is now in HDFS, this should give you a considerable
> performance boost.
>
>
> On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <[EMAIL PROTECTED]> wrote:
>>
>> Swarnim,
>>
>>
>>
>> Thanks. So this means a custom MapReduce job is the viable option when
>> working with HBase tables that have composite keys, since it lets you set
>> the start and stop keys. The Hive + HBase combination is out.
>>
>>
>>
>> Regards
>>
>> Rupinder
>>
>>
>>
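For a custom job like the one Rupinder mentions, the start and stop rows for a query on the leading key components can be computed from the key itself. A minimal sketch, assuming the struct key is stored as its components joined by `~` (as `COLLECTION ITEMS TERMINATED BY '~'` in the table DDL quoted in this thread suggests); `composite_key`, `prefix_stop_row`, and `prefix_scan_range` are hypothetical helpers, not HBase APIs.

```python
SEP = b"~"  # assumed separator, from COLLECTION ITEMS TERMINATED BY '~'


def composite_key(*parts: str) -> bytes:
    """Hypothetical row key encoding: components joined by SEP."""
    return SEP.join(p.encode("utf-8") for p in parts)


def prefix_stop_row(prefix: bytes):
    """Smallest byte string greater than every key that starts with `prefix`:
    drop trailing 0xFF bytes, then increment the last remaining byte.
    Returns None when no such key exists (scan to the end of the table)."""
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan_range(*leading_parts: str):
    """Start/stop rows covering every key whose leading components match."""
    # Trailing SEP keeps "Signup" from also matching keys like "SignupX~...".
    start = composite_key(*leading_parts) + SEP
    return start, prefix_stop_row(start)


start, stop = prefix_scan_range("Signup", "2013-03-06 16:39:55.353")
print(start, stop)
```

Only leading components narrow the range: a filter on `uid` alone still has to read every row. For a lookup on all three components, the full key identifies a single row, so the stop row can simply be that key followed by a zero byte.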
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, May 01, 2013 12:17 AM
>>
>>
>> To: [EMAIL PROTECTED]
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: Very poor read performance with composite keys in hbase
>>
>>
>>
>> Rupinder,
>>
>>
>>
>> Hive supports filter pushdown [1]: predicates in the WHERE clause are
>> pushed down to the storage handler, which either handles them itself or
>> leaves them to Hive if it cannot. As of now, HBaseStorageHandler supports
>> pushdown only for keys of primitive types. So when you use a string key,
>> the predicate is converted behind the scenes into start and stop keys that
>> restrict the HBase scan. This does not happen for struct keys, hence the
>> full table scan and the poor performance you are seeing.
>>
>>
>>
>> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>>
>>
>>
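A toy model of the difference described above, assuming an HBase table can be pictured as a byte-sorted list of row keys and a scan as the half-open range [start, stop). `scan` and `equality_range` are hypothetical helpers, not Hive or HBase APIs; appending a zero byte to the key is one common way to express an equality lookup as a one-row range.

```python
from bisect import bisect_left


def scan(sorted_keys, start=None, stop=None):
    """Toy HBase-style scan over a byte-sorted list of row keys:
    returns keys in [start, stop); None means an open end."""
    lo = 0 if start is None else bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def equality_range(key: bytes):
    """Start/stop rows for rowkey == key. The key followed by a zero byte
    is the smallest byte string greater than key, so the range holds one key."""
    return key, key + b"\x00"


keys = sorted(b"uid-%04d" % i for i in range(1000))
target = b"uid-0042"

# Predicate pushed down: the scan itself is restricted to one row.
pushed = scan(keys, *equality_range(target))
# Predicate not pushed down: every row is read, then filtered afterwards.
read_all = scan(keys)
filtered = [k for k in read_all if k == target]

print(len(pushed), len(read_all), filtered == pushed)
```

In this model a pushed-down string key behaves like `pushed` (one row read), while a struct key that is not pushed down behaves like `read_all` plus a filter, which matches the full table scan reported in this thread.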
>> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian
>> <[EMAIL PROTECTED]> wrote:
>>
>> In my experience, Hive + HBase has been about 8x slower on average than
>> Hive alone, so I went ahead with the Hive-only option.
>>
>> Sent from my iPhone
>>
>>
>> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>
>>
>> I have an HBase cluster with a table that has a composite key. I map this
>> table to a Hive external table, which I use to insert data into and select
>> data from this table:
>>
>> CREATE EXTERNAL TABLE event(key
>> struct<name:string,dateCreated:string,uid:string>, {more columns here})
>> ROW FORMAT DELIMITED
>> COLLECTION ITEMS TERMINATED BY '~'
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>> TBLPROPERTIES ("hbase.table.name" = "event");
>>
>>
>>
>> The table has about 10 million rows. When I run a select * that filters on
>> all 3 components of the key, which selects just 1 row, the response time is
>> almost 700 sec, which seems pretty bad.
>>
>>
>>
>> For comparison, I created another table with a simple string key (a string
>> UUID) and otherwise the same columns. It has the same number of column
>> families and the same number of rows.
>>
>> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>>
>> TBLPROPERTIES ("hbase.table.name" = "test_event");
>>
>>
>>
>> When I select a single row from this table with select * ... where
>> key='something', the response time is 35 sec.
>>
>>
>>
>> This suggests that with composite keys a full table scan is happening,
>> which seems odd.
>>
>>
>>
>> What am I missing here? Is there something special I need to do to get
>> good read performance when using composite keys?
>>
>> Insert performance is comparable in both cases and as expected.
>>
>>
>>
>> Any help is appreciated.
>>
>> Here is the env spec:
>>