HBase, mail # user - Re: Very poor read performance with composite keys in hbase


James Taylor 2013-04-30, 21:15
Re: Very poor read performance with composite keys in hbase
Anoop John 2013-05-02, 04:01
Navis
        Thanks for the issue link. Currently the read queries will start MR
jobs as usual for reading from HBase. Correct?  Is there any plan for
supporting noMR?

-Anoop-
On Thu, May 2, 2013 at 7:09 AM, Navis류승우 <[EMAIL PROTECTED]> wrote:

> Currently, hive storage handler reads rows one by one.
>
> https://issues.apache.org/jira/browse/HIVE-3603 is for setting cache
> size, which is not yet fixed.
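
A rough illustration of why the cache size matters, assuming "cache size" here means the number of rows the HBase scanner fetches per client round trip. The function name and the caching values are made up for this sketch, the 10 million figure is the table size mentioned later in the thread, and real scans may add a few extra calls:

```python
import math


def scan_round_trips(total_rows: int, caching: int) -> int:
    """Approximate number of scanner round trips needed to read total_rows.

    caching is the number of rows returned per round trip (assumed >= 1).
    """
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(total_rows / caching)


# Table size quoted in the thread; caching values are illustrative only.
TOTAL_ROWS = 10_000_000
for caching in (1, 100, 1000):
    print(f"caching={caching:>4}: ~{scan_round_trips(TOTAL_ROWS, caching):,} round trips")
```

Reading one row per call means roughly one round trip per row, so a full scan of this table would need on the order of 10 million calls; a larger cache size cuts that proportionally.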
>
> 2013/5/1 [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> > That depends on how dynamic your data is. If it is pretty static, you can
> > also consider using something like Create Table As Select (CTAS) to create
> > a snapshot of your data to HDFS and then run queries on top of that data.
> >
> > So your query might become something like:
> >
> > create table my_table as select * from event where key.name='Signup' and
> > key.dateCreated='2013-03-06 16:39:55.353' and
> > key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';
> >
> > Since your data is now in HDFS, this should give you a considerable
> > performance boost.
> >
> >
> > On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <[EMAIL PROTECTED]> wrote:
> >>
> >> Swarnim,
> >>
> >>
> >>
> >> Thanks. So this means custom MapReduce is the viable option when working
> >> with HBase tables that have composite keys, since it allows setting the
> >> start and stop keys. The Hive+HBase combination is out.
> >>
> >>
> >>
> >> Regards
> >>
> >> Rupinder
> >>
> >>
> >>
> >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> >> Sent: Wednesday, May 01, 2013 12:17 AM
> >>
> >>
> >> To: [EMAIL PROTECTED]
> >> Cc: [EMAIL PROTECTED]
> >> Subject: Re: Very poor read performance with composite keys in hbase
> >>
> >>
> >>
> >> Rupinder,
> >>
> >>
> >>
> >> Hive supports filter pushdown[1], which means that the predicates in the
> >> where clause are pushed down to the storage handler level, where they
> >> are either handled by the storage handler or delegated to Hive if the
> >> handler cannot handle them. As of now, the HBaseStorageHandler only
> >> supports primitive types. So when you use strings as keys, behind the
> >> scenes they get converted to start and stop keys and restrict the HBase
> >> scan. This does not happen for structs. Hence you see a full table scan
> >> causing bad performance.
> >>
> >>
> >>
> >> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
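
To make the start/stop key point concrete, here is a minimal pure-Python sketch, not the HBaseStorageHandler code. It assumes row keys are the three key components joined with '~' (the delimiter in the table DDL) and stored in sorted byte order, and it models the table as a sorted list. All function names and the sample data are hypothetical:

```python
import bisect

SEP = "~"  # assumed to match COLLECTION ITEMS TERMINATED BY '~' in the DDL


def make_key(name: str, date_created: str, uid: str) -> bytes:
    """Build a composite row key from its three components."""
    return SEP.join((name, date_created, uid)).encode("utf-8")


def prefix_range(prefix: bytes):
    """Return (start, stop) covering every key that begins with prefix.

    stop is exclusive; None means "scan to the end of the table".
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:  # trailing 0xFF bytes cannot be incremented
        stop.pop()
    if not stop:
        return prefix, None
    stop[-1] += 1
    return prefix, bytes(stop)


def range_scan(sorted_keys, start, stop):
    """Binary-search to the range, touching only keys in [start, stop)."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def full_scan(sorted_keys, predicate):
    """Examine every key, as happens when the filter is not pushed down."""
    examined = 0
    hits = []
    for key in sorted_keys:
        examined += 1
        if predicate(key):
            hits.append(key)
    return hits, examined


# Hypothetical sample data: 30 Signup rows and 30 Login rows.
keys = sorted(
    make_key(name, f"2013-03-{day:02d}", f"uid{day}")
    for name in ("Signup", "Login")
    for day in range(1, 31)
)

target = make_key("Signup", "2013-03-06", "uid6")
start, stop = prefix_range(target)
print(range_scan(keys, start, stop))                # only the matching row is touched
print(full_scan(keys, lambda k: k == target)[1])    # every row is examined
```

With start and stop keys the scan touches only the matching range; without them every row is read and filtered afterwards, which is the full table scan described above.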
> >>
> >>
> >>
> >> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian
> >> <[EMAIL PROTECTED]> wrote:
> >>
> >> My experience with hive + hbase has been about 8x slower on average. So
> >> I went ahead with the Hive-only option.
> >>
> >> Sent from my iPhone
> >>
> >>
> >> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi,
> >>
> >>
> >>
> >> I have an hbase cluster where I have a table with a composite key. I map
> >> this table to a Hive external table, which I use to insert data into and
> >> select data from this table:
> >>
> >> CREATE EXTERNAL TABLE event(key
> >> struct<name:string,dateCreated:string,uid:string>, {more columns here})
> >>
> >> ROW FORMAT DELIMITED
> >>
> >> COLLECTION ITEMS TERMINATED BY '~'
> >>
> >> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> >>
> >> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
> >>
> >> TBLPROPERTIES ("hbase.table.name" = "event");
> >>
> >>
> >>
> >> The table has about 10 million rows. When I do a select * using all 3
> >> components of the key, essentially selecting just 1 row, the response
> >> time is almost 700 sec, which seems pretty bad.
> >>
> >>
> >>
> >> For comparison purposes, I created another table with a simple string
> >> key and the rest of the columns etc. the same. The key is a string UUID.
> >> The table has the same number of column families and the same number of
> >> rows.
> >>
> >> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
> >>
> >> TBLPROPERTIES ("hbase.table.name" = "test_event");
> >>
> >>
> >>
> >> When I select a single row from this table by doing select * where