Hive, mail # user - Very poor read performance with composite keys in hbase


+
Rupinder Singh 2013-04-30, 17:48
+
kulkarni.swarnim@...) 2013-04-30, 17:54
+
Rupinder Singh 2013-04-30, 18:03
+
Sanjay Subramanian 2013-04-30, 18:04
+
kulkarni.swarnim@...) 2013-04-30, 18:46
+
Rupinder Singh 2013-04-30, 20:00
Re: Very poor read performance with composite keys in hbase
kulkarni.swarnim@...) 2013-04-30, 20:56
That depends on how dynamic your data is. If it is pretty static, you can
also consider using something like Create Table As Select (CTAS) to create
a snapshot of your data in HDFS and then run queries on top of that snapshot.

So your query might become something like:

create table my_table as select * from event where key.name='Signup' and
key.dateCreated='2013-03-06 16:39:55.353' and
key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';

Since the snapshot is stored in HDFS rather than read through HBase,
queries against it should get a considerable performance boost.
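
For the custom MapReduce route mentioned in the quoted message below, here is a rough sketch of how start and stop keys could be derived from the leading components of a composite key. It assumes the three key components are stored as UTF-8 strings joined by the '~' delimiter from the table definition; the helper names are hypothetical, and this is not what HBaseStorageHandler does internally.

```python
# Hypothetical helpers for illustration; not part of Hive or HBase.
DELIM = b"~"  # assumed to match COLLECTION ITEMS TERMINATED BY '~'


def composite_key(*parts):
    """Join key components into a single row key (assumed layout)."""
    return DELIM.join(p.encode("utf-8") for p in parts)


def prefix_stop_key(prefix):
    """Return the smallest byte string greater than every key with this prefix."""
    key = bytearray(prefix)
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while key and key[-1] == 0xFF:
        key.pop()
    if not key:
        return b""  # an empty stop row means: scan to the end of the table
    key[-1] += 1
    return bytes(key)


def scan_range(parts, total_parts=3):
    """Return (start_row, stop_row) for keys beginning with the given components.

    The start row is inclusive and the stop row exclusive, which is how
    an HBase scan treats them.
    """
    start = composite_key(*parts)
    if len(parts) < total_parts:
        # Close the last component so "Signup" does not also match "Signup2".
        start += DELIM
    return start, prefix_stop_key(start)


start_row, stop_row = scan_range(["Signup", "2013-03-06 16:39:55.353"])
print(start_row, stop_row)
```

A scan limited to this range only touches rows whose key starts with the given components, instead of reading the whole table.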
On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <[EMAIL PROTECTED]> wrote:

> Swarnim,
>
> Thanks. So this means custom MapReduce is the viable option when working
> with HBase tables having composite keys, since it lets you set the start
> and stop keys. The Hive+HBase combination is out.
>
> Regards
> Rupinder
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 01, 2013 12:17 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Very poor read performance with composite keys in hbase
>
> Rupinder,
>
> Hive supports filter pushdown [1], which means that the predicates in the
> where clause are pushed down to the storage handler level, where they are
> either handled by the storage handler or delegated back to Hive if the
> handler cannot process them. As of now, the HBaseStorageHandler only
> supports primitive types. So when you use strings as keys, behind the
> scenes the predicates get converted to start and stop keys that restrict
> the HBase scan. This does not happen for structs. Hence you see a full
> table scan, causing bad performance.
>
> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>
> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian <
> [EMAIL PROTECTED]> wrote:
>
> My experience with Hive + HBase has been about 8x slower on average. So
> I went ahead with the Hive-only option.
>
> Sent from my iPhone
>
>
> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have an HBase cluster where I have a table with a composite key. I map
> this table to a Hive external table, which I use to insert data into and
> select data from it:
>
> CREATE EXTERNAL TABLE event(key
> struct<name:string,dateCreated:string,uid:string>, {more columns here})
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '~'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
> TBLPROPERTIES ("hbase.table.name" = "event");
>
> The table has about 10 million rows. When I do a select * using all 3
> components of the key, essentially selecting just 1 row, the response time
> is almost 700 sec, which seems pretty bad.
>
> For comparison purposes, I created another table with a simple string key
> and the rest of the columns the same. The key is a string UUID. The table
> has the same number of column families and the same number of rows.
>
> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
> TBLPROPERTIES ("hbase.table.name" = "test_event");
>
> When I select a single row from this table by doing select * where
> key='something', the response time is 35 sec.
>
> This seems to indicate that in the case of composite keys, a full
> table scan is happening. This seems weird.
>
> What am I missing here? Is there something special I need to do to get
> good read performance if I am using composite keys?
>
> Insert performance in both cases is comparable and as expected.
>
> Any help is appreciated.
>
> Here is the env spec:
>
> Amazon EMR
> HBase cluster: 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each.
> Master: 7.5 GB RAM, 2 CPUs of 2.2 GHz each

Swarnim
+
Navis류승우 2013-05-02, 01:39