HBase, mail # user - Miserable Performance of gets


Re: Miserable Performance of gets
kiran 2013-03-06, 06:42
Here is the describe output of the table on which I am doing the batch gets:

{NAME => 'XXXXXXX', FAMILIES => [{NAME => 'XXXXX',                  true
 DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
 TTL => '2147483647', IN_MEMORY => 'false', REPLICATION_SCOPE => '0',
 VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '1',
 COMPRESSION_COMPACT => 'SNAPPY', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}

On Wed, Mar 6, 2013 at 12:08 PM, kiran <[EMAIL PROTECTED]> wrote:

> Yes, I mistook it for the region size. The region size was set to 20GB
> instead of the default 10GB. Our HBase block size is the default 64KB. Our
> HDFS block size is 128MB. Our memstore flush size is 512MB.
>
>
> On Wed, Mar 6, 2013 at 10:59 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Arghh... The default is 64K (kilobytes). :)
>>
>>
>> You might have mixed up the region size with block size. If this is the
>> actual HBase block size this behavior is perfectly explained (scans are
>> fast because fewer blocks are loaded, gets are slow because the entire 20GB
>> - or probably at least an HDFS block of 128MB - has to be brought in).
>>
>>
>> If you can, please attach the output of   describe '<table>'   run in the
>> shell in order to confirm.
>>
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>>  From: kiran <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> Sent: Tuesday, March 5, 2013 9:24 PM
>> Subject: Re: Miserable Performance of gets
>>
>> Lars,
>>
>> We set the HBase block size to 20GB....
>>
>> Anoop,
>>
>> We have about 13 regionservers and in the worst case these gets may be
>> distributed across all the regionservers...
>>
>>
>>
>> On Wed, Mar 6, 2013 at 10:43 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>> > Can you tell us more about your setup?
>> > What does   describe '<your-table>'   in the shell display?
>> >
>> > If I had to make a wild guess I'd say you made the HBase block size (not
>> > the HDFS block size) too big.
>> >
>> >
>> > Thanks.
>> >
>> > -- Lars
>> >
>> >
>> >
>> > ________________________________
>> >  From: kiran <[EMAIL PROTECTED]>
>> > To: [EMAIL PROTECTED]
>> > Sent: Tuesday, March 5, 2013 9:06 PM
>> > Subject: Re: Miserable Performance of gets
>> >
>> > Version is 0.94.1
>> >
>> > Yes, the gets are issued against the second table, after scanning the
>> > first table.
>> >
>> >
>> > On Wed, Mar 6, 2013 at 10:27 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> > > Which HBase version are you using?
>> > >
>> > > bq. But even for 20 gets
>> > > Were these issued against the second table?
>> > >
>> > > Thanks
>> > >
>> > > On Tue, Mar 5, 2013 at 8:36 PM, kiran <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Dear All,
>> > > >
>> > > > I have had a miserable experience with gets (batch gets) in HBase.
>> > > > I have two tables with different rowkeys; the columns are
>> > > > distributed across the two tables.
>> > > >
>> > > > Currently I scan the first table and collect all the rowkeys that
>> > > > match my filter, then issue a batch get on the other table to
>> > > > retrieve some columns. But even for 20 gets the performance is
>> > > > miserable (a second or two for 20 gets, which is not acceptable),
>> > > > whereas a scan over even a few thousand rows completes in
>> > > > milliseconds.
>> > > >
>> > > > My concern is that if about 20 gets take a second or two:
>> > > > How can it scale?
>> > > > Will the performance be the same even if I issue 1000 gets?
>> > > > Is it advisable in HBase to avoid gets?
>> > > >
>> > > > I could also put all the columns in a single table and do a scan,
>> > > > but before doing that I really need to understand the issue...
>> > > >
>> > > > Is scanning a better solution for scalability and performance?
>> > > >
>> > > > Is it advisable not to do joins or normalization in NoSQL databases,
Thank you
Kiran Sarvabhotla
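
For anyone reading this thread later, Lars's reasoning about block size can be made concrete with a rough calculation. The sketch below is only a back-of-envelope estimate, not a measurement: it assumes the worst case in which each of the 20 gets lands in a different block, none of those blocks is cached, and a whole block has to be read to serve one row. The function name is made up for illustration. The describe output at the top of the thread shows BLOCKSIZE => '65536', so this table falls in the small case.

```python
# Back-of-envelope estimate only; the worst-case assumptions are stated below.
KB = 1024
MB = 1024 * KB


def worst_case_bytes_read(num_gets: int, block_size: int) -> int:
    """Bytes read if every get hits a different, uncached block and the
    whole block must be loaded to serve a single row (an assumption)."""
    return num_gets * block_size


cases = [
    ("64 KB HBase block (default)", 64 * KB),
    ("128 MB (one HDFS block, Lars's lower bound)", 128 * MB),
]

for label, block_size in cases:
    total = worst_case_bytes_read(20, block_size)
    print(f"{label}: {total / MB:.2f} MB read for 20 gets")
```

Under these assumptions, 20 gets with the default block size touch about 1.25 MB, while a block as large as an HDFS block would mean reading about 2560 MB, which is why an oversized HBase block would have explained slow gets alongside fast scans.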