Sematext search-lucene.com search-hadoop.com
HBase >> mail # user >> Miserable Performance of gets


+ kiran 2013-03-06, 04:36
+ Ted Yu 2013-03-06, 04:57
+ kiran 2013-03-06, 05:06
+ lars hofhansl 2013-03-06, 05:13
+ kiran 2013-03-06, 05:24
+ lars hofhansl 2013-03-06, 05:29
+ kiran 2013-03-06, 06:38
Re: Miserable Performance of gets
Here is the describe output of the table on which I am doing batch gets:

{NAME => 'XXXXXXX', FAMILIES => [{NAME => 'XXXXX', DATA_BLOCK_ENCODING => 'NONE',
  BLOOMFILTER => 'ROW', TTL => '2147483647', IN_MEMORY => 'false',
  REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
  MIN_VERSIONS => '1', COMPRESSION_COMPACT => 'SNAPPY',
  KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
  ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
ENABLED => true
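The descriptor shows BLOCKSIZE => '65536', i.e. the 64KB default, so the family's block size was in fact fine. Had it actually been misconfigured, a rough HBase shell sketch of the fix would look like the following (table and family names are placeholders, and in 0.94 the table typically had to be disabled before an alter unless online schema change was enabled):

```
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', BLOCKSIZE => '65536'}
hbase> enable 'mytable'
```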
On Wed, Mar 6, 2013 at 12:08 PM, kiran <[EMAIL PROTECTED]> wrote:

> Yes, I mistook it for the region size. The region size was set to 20GB
> instead of the default 10GB. Our block size is the default 64KB. Our HDFS
> block size is 128MB. Our memstore flush size is 512MB.
>
>
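The settings kiran lists correspond to standard configuration knobs. As orientation (the property names below are the usual 0.94-era ones, shown as an assumption rather than quoted from this thread; the per-family BLOCKSIZE lives in the table schema, not here), a sketch:

```xml
<!-- hbase-site.xml: values kiran describes -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>21474836480</value> <!-- 20GB region split size -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>536870912</value> <!-- 512MB memstore flush -->
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.block.size</name> <!-- dfs.blocksize in Hadoop 2.x -->
  <value>134217728</value> <!-- 128MB HDFS block -->
</property>
```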
> On Wed, Mar 6, 2013 at 10:59 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Arghh... The default is 64K (kilobytes). :)
>>
>>
>> You might have mixed up the region size with block size. If this is the
>> actual HBase block size this behavior is perfectly explained (scans are
>> fast because fewer blocks are loaded, gets are slow because the entire 20GB
>> - or probably at least an HDFS block of 128MB - has to be brought in).
>>
>>
>> If you can, please attach the output of   describe '<table>'   run in the
>> shell in order to confirm.
>>
>>
>> -- Lars
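Lars's point can be put in back-of-the-envelope numbers: a point get must load at least one whole HBase block, so if the block size really were 20GB instead of the 64KB default, each random get would drag in orders of magnitude more data. A small pure-arithmetic Java sketch (no HBase dependency, purely illustrative):

```java
// Read amplification of a point get under a hypothetical 20GB HBase block
// size versus the 64KB default: per-get I/O scales with the block size.
public class BlockSizeMath {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    // How many times more data one get drags in with `actual`-sized blocks
    // than with `dflt`-sized blocks.
    static long amplification(long actual, long dflt) {
        return actual / dflt;
    }

    public static void main(String[] args) {
        long mistaken = 20 * GB;  // hypothetical misconfigured block size
        long standard = 64 * KB;  // HBase default block size
        System.out.println(amplification(mistaken, standard)); // 327680
        // Even capped at a single 128MB HDFS block, it is still 2048x:
        System.out.println(amplification(128 * MB, standard)); // 2048
    }
}
```

That factor of a few thousand is consistent with gets taking seconds while scans (which amortize block loads over many rows) stay in the millisecond range.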
>>
>>
>>
>> ________________________________
>>  From: kiran <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> Sent: Tuesday, March 5, 2013 9:24 PM
>> Subject: Re: Miserable Performance of gets
>>
>> Lars,
>>
>> We set the HBase block size to 20GB....
>>
>> Anoop,
>>
>> We have about 13 regionservers and in the worst case these gets may be
>> distributed across all the regionservers...
>>
>>
>>
>> On Wed, Mar 6, 2013 at 10:43 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>> > Can you tell us more about your setup?
>> > What does   describe '<your-table>'   in the shell display?
>> >
>> > If I had to make a wild guess I'd say you made the HBase block size (not
>> > the HDFS block size) too big.
>> >
>> >
>> > Thanks.
>> >
>> > -- Lars
>> >
>> >
>> >
>> > ________________________________
>> >  From: kiran <[EMAIL PROTECTED]>
>> > To: [EMAIL PROTECTED]
>> > Sent: Tuesday, March 5, 2013 9:06 PM
>> > Subject: Re: Miserable Performance of gets
>> >
>> > Version is 0.94.1
>> >
>> > Yes, the gets are issued against the second table after scanning the
>> > first table.
>> >
>> >
>> > On Wed, Mar 6, 2013 at 10:27 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> > > Which HBase version are you using ?
>> > >
>> > > bq. But even for 20 gets
>> > > These were issued against the second table ?
>> > >
>> > > Thanks
>> > >
>> > > On Tue, Mar 5, 2013 at 8:36 PM, kiran <[EMAIL PROTECTED]>
>> > wrote:
>> > >
>> > > > Dear All,
>> > > >
>> > > > I had some miserable experience with gets (batch gets) in HBase. I
>> > > > have two tables with different rowkeys; the columns are distributed
>> > > > across the two tables.
>> > > >
>> > > > Currently what I am doing is scan over one table and get all the
>> > > > rowkeys in the first table matching my filter, then issue a batch get
>> > > > on another table to retrieve some columns. But even for 20 gets, the
>> > > > performance is miserable (almost a second or two for 20 gets, which is
>> > > > not acceptable). Yet scanning even a few thousand rows completes in
>> > > > milliseconds.
>> > > >
>> > > > My concern is: if about 20 gets take a second or two,
>> > > > how can it scale?
>> > > > Will the performance be the same even if I issue 1000 gets?
>> > > > Is it advisable in HBase to avoid gets?
>> > > >
>> > > > I can include all columns in only one table and do a scan also, but
>> > > > before doing that I need to really understand the issue...
>> > > >
>> > > > Is scanning a better solution for scalability and performance?
>> > > >
>> > > > Is it advisable not to do joins or normalizations in NoSQL databases,
Thank you
Kiran Sarvabhotla

+ lars hofhansl 2013-03-06, 07:05
+ kiran 2013-03-06, 05:26
+ Anoop Sam John 2013-03-06, 05:12
+ Stack 2013-03-06, 06:51
+ kiran 2013-03-06, 07:00
+ Stack 2013-03-06, 07:02
+ kiran 2013-03-06, 07:01