HBase, mail # user - Re: Very poor read performance with composite keys in hbase


James Taylor 2013-04-30, 21:15
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? It'll use all of the parts of your row key and, depending on how much data you're returning to the client, will query over 10 million rows in seconds.

James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com

On Apr 30, 2013, at 1:59 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> That depends on how dynamic your data is. If it is fairly static, you can
> also consider using something like Create Table As Select (CTAS) to create
> a snapshot of your data in HDFS and then run queries on top of that data.
>
> So your query might become something like:
>
> create table my_table as select * from event where key.name='Signup' and
> key.dateCreated='2013-03-06 16:39:55.353' and
> key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';
>
> Since your data is now in HDFS, this should give you a considerable
> performance boost.
>
>
> On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <[EMAIL PROTECTED]> wrote:
>
>> Swarnim,
>>
>> Thanks. So this means a custom MapReduce job is the viable option when
>> working with HBase tables that have composite keys, since it lets me set the
>> start and stop keys. The Hive+HBase combination is out.
>>
>> Regards
>>
>> Rupinder
>>
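With a custom MapReduce job (or a plain HBase client scan), restricting the read comes down to choosing a start row and an exclusive stop row. A minimal Python sketch of how a key prefix can be turned into such a range, assuming the composite key is stored as its fields joined by '~' (name~dateCreated~uid) and that rows sort by raw unsigned bytes, as HBase row keys do; `prefix_range` is a hypothetical helper, not an HBase API:

```python
from typing import Tuple


def prefix_range(prefix: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: return (start_row, stop_row) for a prefix scan.

    Row keys compare as unsigned bytes and the stop row is exclusive, so the
    stop row is the prefix with its last byte incremented. Trailing 0xFF bytes
    cannot be incremented and are dropped first; an empty stop row means
    "scan to the end of the table".
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()  # carry into the previous byte
    if stop:
        stop[-1] += 1
    return prefix, bytes(stop)


# All "Signup" events created on 2013-03-06 (key layout assumed: name~dateCreated~uid).
start_row, stop_row = prefix_range(b"Signup~2013-03-06")
print(start_row, stop_row)  # b'Signup~2013-03-06' b'Signup~2013-03-07'
```

Only rows between those two keys are read, instead of all 10 million.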
>> *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>> *Sent:* Wednesday, May 01, 2013 12:17 AM
>> *To:* [EMAIL PROTECTED]
>> *Cc:* [EMAIL PROTECTED]
>> *Subject:* Re: Very poor read performance with composite keys in hbase
>>
>> Rupinder,
>>
>> Hive supports filter pushdown [1], which means that the predicates in the
>> WHERE clause are pushed down to the storage handler, which either handles
>> them itself or, if it cannot, delegates them back to Hive. As of now, the
>> HBaseStorageHandler only supports pushdown on primitive key types. So when
>> you use strings as keys, the predicates are converted behind the scenes into
>> start and stop keys that restrict the HBase scan. This does not happen for
>> structs, hence the full table scan and the poor performance.
>>
>> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>>
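The difference between a pushed-down predicate and a full scan can be modelled in a few lines. A minimal Python sketch, assuming a byte-sorted list of keys stands in for the HBase table; `lookup` is hypothetical and only counts how many rows each strategy reads, it does not talk to Hive or HBase:

```python
import bisect
from typing import List, Tuple


def lookup(sorted_keys: List[bytes], target: bytes, pushdown: bool) -> Tuple[List[bytes], int]:
    """Return (matching keys, number of rows the scan had to read).

    sorted_keys models an HBase table's row keys in byte order.
    """
    if pushdown:
        # The equality predicate becomes the scan range [target, target + b"\x00"):
        # target + b"\x00" is the smallest key that sorts after target.
        start = bisect.bisect_left(sorted_keys, target)
        stop = bisect.bisect_left(sorted_keys, target + b"\x00")
        scanned = sorted_keys[start:stop]
    else:
        # No range: every row is read and the predicate is applied afterwards,
        # above the storage handler.
        scanned = sorted_keys
    matches = [k for k in scanned if k == target]
    return matches, len(scanned)


keys = sorted(f"{i:08d}".encode() for i in range(100_000))
print(lookup(keys, b"00042000", pushdown=True))   # ([b'00042000'], 1)
print(lookup(keys, b"00042000", pushdown=False))  # ([b'00042000'], 100000)
```

Both strategies return the same row; only the amount of data read differs, which is the full-table-scan effect described above.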
>> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian <[EMAIL PROTECTED]> wrote:
>>
>> In my experience, Hive + HBase has been about 8x slower on average, so I
>> went with the Hive-only option.
>>
>> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I have an HBase cluster with a table that has a composite key. I map this
>> table to a Hive external table, which I use to insert data into and select
>> data from this table:
>>
>> CREATE EXTERNAL TABLE event(key struct<name:string,dateCreated:string,uid:string>, {more columns here})
>> ROW FORMAT DELIMITED
>> COLLECTION ITEMS TERMINATED BY '~'
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>> TBLPROPERTIES ("hbase.table.name" = "event");
>>
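With COLLECTION ITEMS TERMINATED BY '~', the struct key is presumably written to HBase as its three fields joined by '~'. A minimal Python sketch of that assumed encoding; `EventKey`, `encode_key` and `decode_key` are hypothetical names used only for illustration, not Hive classes:

```python
from typing import NamedTuple

SEP = "~"  # assumed to match COLLECTION ITEMS TERMINATED BY '~' in the DDL above


class EventKey(NamedTuple):
    name: str
    dateCreated: str
    uid: str


def encode_key(key: EventKey) -> bytes:
    """Join the struct fields in declaration order into one HBase row key."""
    for field in key:
        if SEP in field:
            # A separator inside a field would make the key ambiguous to decode.
            raise ValueError(f"field {field!r} contains the separator {SEP!r}")
    return SEP.join(key).encode("utf-8")


def decode_key(raw: bytes) -> EventKey:
    """Split a row key back into its three struct fields."""
    return EventKey(*raw.decode("utf-8").split(SEP))


k = EventKey("Signup", "2013-03-06 16:39:55.353", "7af4c330-5988-4255-9250-924ce5864e3bf")
raw = encode_key(k)
print(raw)  # b'Signup~2013-03-06 16:39:55.353~7af4c330-5988-4255-9250-924ce5864e3bf'
```

Because the fields are concatenated in order, rows with the same name are contiguous in HBase, so a name (plus the trailing '~', so that "Signup" does not also match "SignupFoo") is a usable key prefix even though Hive does not turn struct predicates into a scan range.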
>> The table has about 10 million rows. When I run a select * that filters on
>> all 3 components of the key, essentially selecting just 1 row, the response
>> time is almost 700 sec, which seems pretty bad.
>>
>> For comparison, I created another table with a simple string key (a string
>> UUID) and otherwise the same columns. This table has the same number of
>> column families and the same number of rows.
>>
>> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>> TBLPROPERTIES ("hbase.table.name" = "test_event");
>>
>> When I select a single row from this table with select * where
>> key='something', the response time is 35 sec.
>>
>> This seems to indicate that in the case of composite keys, there is a full
Anoop John 2013-05-02, 04:01