HBase, mail # user - Miserable Performance of gets


Re: Miserable Performance of gets
lars hofhansl 2013-03-06, 07:05
Hmm... So that theory is out. Anything strange in the logs?
You have 13 region servers and 13 data nodes colocated on the same machines, I assume.

The Gets are actually sent to the involved region servers in parallel, so anything more than a few milliseconds is suspect.
How big are the rows/columns that you are retrieving?
Does this change in any way if you major_compact your table?
-- Lars
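
For reference, the major compaction Lars suggests can be triggered with   major_compact '<table>'   in the shell, or programmatically. Below is a minimal Java sketch against the 0.94-era client API; 'mytable' stands in for the real (masked) table name.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactExample {
  public static void main(String[] args) throws Exception {
    // Reads hbase-site.xml from the classpath for the cluster connection.
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Asynchronous request: the call returns before the compaction completes.
    admin.majorCompact("mytable");  // 'mytable' is a placeholder table name
    admin.close();
  }
}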

________________________________
 From: kiran <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Tuesday, March 5, 2013 10:42 PM
Subject: Re: Miserable Performance of gets
 

Describe output of the table on which I am doing batch gets

{NAME => 'XXXXXXX', FAMILIES => [{NAME => 'XXXXX', DATA_BLOCK_ENCODING => 'NONE',
 BLOOMFILTER => 'ROW', TTL => '2147483647', IN_MEMORY => 'false',
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
 MIN_VERSIONS => '1', COMPRESSION_COMPACT => 'SNAPPY', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
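
The BLOCKSIZE => '65536' above is the 64KB default, which is what rules the block-size theory out. For reference, the same schema can also be read programmatically; a minimal sketch against the 0.94-era API, keeping the masked table name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DescribeTableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("XXXXXXX"));
    for (HColumnDescriptor family : desc.getColumnFamilies()) {
      // 65536 here confirms the 64KB default block size.
      System.out.println(family.getNameAsString()
          + " BLOCKSIZE=" + family.getBlocksize());
    }
    admin.close();
  }
}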
On Wed, Mar 6, 2013 at 12:08 PM, kiran <[EMAIL PROTECTED]> wrote:

Yes, I mixed it up with the region size. The region size was set to 20GB instead of the default 10GB. Our HBase block size is the default 64KB. Our HDFS block size is 128MB. Our memstore flush size is 512MB.
>
>
>
>
>On Wed, Mar 6, 2013 at 10:59 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>Arghh... The default is 64K (kilobytes). :)
>>
>>
>>You might have mixed up the region size with the block size. If this is the actual HBase block size, this behavior is perfectly explained (scans are fast because fewer blocks are loaded; gets are slow because the entire 20GB - or probably at least an HDFS block of 128MB - has to be brought in).
>>
>>
>>If you can, please attach the output of   describe '<table>'   run in the shell in order to confirm.
>>
>>
>>
>>-- Lars
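>>

Had the block size really been set that large, the fix would be a per-column-family schema change. A minimal sketch of resetting BLOCKSIZE to the 64KB default via the 0.94-era Java API follows; 'mytable' and family 'f' are placeholders, and note that 0.94 by default requires the table to be disabled for schema changes (unless online schema updates are enabled).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SetBlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Edit the existing descriptor so the family's other attributes stay intact.
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    HColumnDescriptor family = desc.getFamily(Bytes.toBytes("f"));
    family.setBlocksize(64 * 1024);  // the 64KB default Lars refers to, in bytes
    admin.disableTable("mytable");   // take the table offline for the change
    admin.modifyColumn("mytable", family);
    admin.enableTable("mytable");
    admin.close();
  }
}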
>>
>>
>>
>>________________________________
>> From: kiran <[EMAIL PROTECTED]>
>>To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>>Sent: Tuesday, March 5, 2013 9:24 PM
>>
>>Subject: Re: Miserable Performance of gets
>>
>>Lars,
>>
>>We set the HBase block size to 20GB....
>>
>>Anoop,
>>
>>We have about 13 regionservers and in the worst case these gets may be
>>distributed across all the regionservers...
>>
>>
>>
>>On Wed, Mar 6, 2013 at 10:43 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>>> Can you tell us more about your setup?
>>> What does   describe '<your-table>'   in the shell display?
>>>
>>> If I had to make a wild guess I'd say you made the HBase block size (not
>>> the HDFS block size) too big.
>>>
>>>
>>> Thanks.
>>>
>>> -- Lars
>>>
>>>
>>>
>>> ________________________________
>>>  From: kiran <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]
>>> Sent: Tuesday, March 5, 2013 9:06 PM
>>> Subject: Re: Miserable Performance of gets
>>>
>>> Version is 0.94.1
>>>
>>> Yes, the gets are issued against the second table after scanning the first table
>>>
>>>
>>> On Wed, Mar 6, 2013 at 10:27 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>
>>> > Which HBase version are you using ?
>>> >
>>> > bq. But even for 20 gets
>>> > These were issued against the second table ?
>>> >
>>> > Thanks
>>> >
>>> > On Tue, Mar 5, 2013 at 8:36 PM, kiran <[EMAIL PROTECTED]>
>>> wrote:
>>> >
>>> > > Dear All,
>>> > >
>>> > > I have had a miserable experience with gets (batch gets) in HBase. I have
>>> > > two tables with different row keys; the columns are distributed across
>>> > > the two tables.
>>> > >
>>> > > Currently I scan over the first table and get all the row keys matching
>>> > > my filter, then issue a batch get on the second table to retrieve some
>>> > > columns. But even for 20 gets the performance is miserable (almost a
>>> > > second or two for 20 gets, which is not acceptable).

Thank you
Kiran Sarvabhotla
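
For reference, the scan-then-batch-get pattern described in this thread maps onto the 0.94-era client API roughly as below. Table names 'table1' and 'table2' are placeholders and the Scan's filter is omitted; HTable.get(List<Get>) groups the Gets by region server and dispatches them in parallel, as Lars notes above.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanThenBatchGet {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable first = new HTable(conf, "table1");   // placeholder table names
    HTable second = new HTable(conf, "table2");

    // Step 1: scan the first table (the real code would attach a filter
    // to the Scan) and collect the matching row keys.
    Scan scan = new Scan();
    List<Get> gets = new ArrayList<Get>();
    ResultScanner scanner = first.getScanner(scan);
    try {
      for (Result r : scanner) {
        gets.add(new Get(r.getRow()));
      }
    } finally {
      scanner.close();
    }

    // Step 2: a single batch call; the client groups the Gets by region
    // server and sends them in parallel.
    Result[] results = second.get(gets);
    System.out.println("fetched " + results.length + " rows");

    first.close();
    second.close();
  }
}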
