Damien Hardy 2011-11-10, 11:11
Hello there. When I get a row by row id, the answer is very slow (sometimes even 15 secs). What is wrong with my HTable? Here are some examples to illustrate my problem:
hbase(main):030:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { COLUMN => 'body:body', VERSIONS => 1 }
COLUMN CELL
body:body timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 6.0310 seconds
hbase(main):031:0> scan 'logs', { STARTROW => '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', LIMIT => 1 }
ROW COLUMN+CELL
_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA== column=body:body, timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 2.7160 seconds
hbase(main):032:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA=='
COLUMN CELL
body:body timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 5.0640 seconds
hbase(main):033:0> describe 'logs'
DESCRIPTION ENABLED
{NAME => 'logs', FAMILIES => [{NAME => 'body', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '536870912', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0660 seconds
hbase(main):025:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { COLUMN => 'body:body', TIMERANGE => [1320919900000,1320920000000] }
COLUMN CELL
body:body timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 0.0630 seconds
scan is always faster than get, which I find strange.
I get a normal response time when I specify the timestamp range.
The table has about 200 regions distributed on 2 nodes (with the full stack on each: HDFS / HBase master + regionserver / ZooKeeper). Region size is 2 GB now.
Recently I increased the region size from the default (128 MB if I remember) to 2 GB to get fewer regions (I had 3500 regions).
I changed hbase.hregion.max.filesize to 2147483648, restarted my whole cluster, created a new table, and copied the data via Pig from the old table to the new one => fewer regions => I'm happy \o/
But on my older table the get answer was very fast, like the one with the timestamp range specified on the new table.
Does the region size affect HBase response time that much?
A get on another table that was not rebuilt after the config change (regions not merged) is still fast.
Thank you,
-- Damien
lars hofhansl 2011-11-10, 18:44
"BLOCKSIZE => '536870912'"
You set your blocksize to 512 MB? The default is 64k (65536); try setting it to something lower.
-- Lars
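To put the two numbers from this reply side by side, here is a small arithmetic sketch in plain Ruby (it should also paste into irb or the JRuby-based HBase shell). The method and variable names are illustrative and not part of HBase. Because an HFile block is the unit HBase reads from disk, an uncached get may have to load and decompress up to one full block to return a single row.

```ruby
# How many default-sized blocks fit into one block of the configured size.
# Both arguments are sizes in bytes.
def blocksize_ratio(configured_bytes, default_bytes)
  configured_bytes / default_bytes
end

default_blocksize = 65_536        # the 64k default quoted in the reply
table_blocksize   = 536_870_912   # BLOCKSIZE from the describe output

puts "configured block size in MB: #{table_blocksize / (1024 * 1024)}"
puts "multiple of the default:     #{blocksize_ratio(table_blocksize, default_blocksize)}"
```

With these inputs the configured block is 512 MB, or 8192 times the default.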
Arvind Jayaprakash 2011-11-13, 15:13
A common confusion is between MAX_FILESIZE and BLOCKSIZE. Since MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume that BLOCKSIZE represents that value.
On Sun, Nov 13, 2011 at 7:13 AM, Arvind Jayaprakash <[EMAIL PROTECTED]> wrote:
> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
> BLOCKSIZE represents that value.
We should fix that. What would you like to see, Arvind?
St.Ack
Damien Hardy 2011-11-14, 08:51
Hello,
Thank you for the answer. I have just altered my table and launched a major_compact to make the change effective.
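The thread does not show the commands that were run. A plausible HBase shell sequence is sketched below. It assumes the table and family names from the describe output earlier in the thread, and it assumes the table has to be disabled before alter, as was usual for HBase releases of that period. It is meant to be typed into `hbase shell` against a running cluster, not run as a standalone Ruby script.

```ruby
# Run inside `hbase shell`; these are shell commands, not plain Ruby.
disable 'logs'

# Set the column family block size back to the 64k default (value in bytes).
alter 'logs', {NAME => 'body', BLOCKSIZE => '65536'}

enable 'logs'

# Rewrite the existing HFiles so that they pick up the new block size.
major_compact 'logs'
```

The major compaction matters because BLOCKSIZE only applies to newly written HFiles; existing files keep their old blocks until they are rewritten.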
I thought that increasing the FILESIZE in HBase somehow implied changing the BLOCKSIZE of my tables, so to prevent unbalanced parameters I increased it too ... #FAIL.
The question is: in what kind of application should BLOCKSIZE be changed (increased or decreased)?
Thank you.
-- Damien
Doug Meil 2011-11-14, 15:32
Hi there-
re: "The question is : in what application BLOCKSIZE should be changed (increased or decreased) ?"
See..
http://hbase.apache.org/book.html#schema.creationand...
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html
Arvind Jayaprakash 2011-11-14, 18:13
On Nov 13, Stack wrote:
> We should fix that. What would you like to see, Arvind?
Looks like Santa is ahead of schedule this year ...
(1) I've always found it hard to find all configurable "per-table" properties listed in documentation. So that would be a good thing to have.
(2) Also, having all of the per-table properties listed on the HBase master page would create more awareness of at least the terms, if not of how to twiddle around with them.
The problem with the specific parameter in question has to do with how the mind runs crazy. A lot of HBase design-related documents and discussions mention the term "region size". It is very hard to imagine that MAX_FILESIZE (which is hardly mentioned anywhere) is what really refers to region size, and it is easy to miss that BLOCKSIZE, which appears so prominently on the master page (or in the output of scanning the .META. table, for the nerdier folks), is an entirely different beast.
Once we address #1 & #2, it becomes easier to yell "Didn't you RTFM" at anyone who gets confused :-)
lars hofhansl 2011-11-14, 19:24
Did it speed up your queries? As you can see from the followup discussions here, there is some general confusion around this.
Generally there are 2 sizes involved:
1. HBase Filesize
2. HBase Blocksize
#1 sets the maximum size of a region before it is split. Default used to be 512mb, it's now 1g (but usually it should be even larger)
#2 is the size of the blocks inside the HFiles. Smaller blocks mean better random access, but larger block indexes. I would only increase that if you have large cells.
-- Lars
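The "only increase it if you have large cells" advice can be illustrated with a rough count of how many cells share one block. The sketch below is plain Ruby arithmetic. The 1 KB average cell size is a hypothetical figure chosen for illustration and does not come from the thread, and the method name is not an HBase API.

```ruby
# Approximate number of cells stored in one HFile block.
# Ignores key overhead and compression, so treat the result as an order of magnitude.
def cells_per_block(blocksize_bytes, avg_cell_bytes)
  blocksize_bytes / avg_cell_bytes
end

avg_cell_bytes = 1024 # hypothetical average cell size, not from the thread

[65_536, 536_870_912].each do |blocksize|
  count = cells_per_block(blocksize, avg_cell_bytes)
  puts "#{blocksize}-byte blocks hold about #{count} cells each"
end
```

Under that assumption a 64k block holds about 64 cells, while a 512 MB block holds about 524288, so a random get has far more data to load and step through before it reaches the requested row.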
Sam Seigal 2011-11-14, 19:37
If you are not too concerned with random access time, but want more efficient scans, is increasing the block size then a good idea ?
On Mon, Nov 14, 2011 at 11:37 AM, Sam Seigal <[EMAIL PROTECTED]> wrote:
> If you are not too concerned with random access time, but want more
> efficient scans, is increasing the block size then a good idea ?
I'd say leave things as they are unless you have a problem.
For your case, where random read latency is not so important and you are only scanning, upping the block size should not change your scan latencies and it will make the hfile indices smaller (if you double the blocksize to 128k, your indices should be halved -- you can see index sizes in your regionserver UI).
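The halving claim follows from the block index holding roughly one entry per block. A minimal Ruby sketch, assuming a hypothetical 1 GiB HFile and ignoring compression and the fact that real blocks are cut near, not exactly at, the configured size:

```ruby
# Rough number of block index entries for one HFile: one entry per block.
# Uses integer ceiling division so a partial last block still counts.
def index_entries(hfile_bytes, blocksize_bytes)
  (hfile_bytes + blocksize_bytes - 1) / blocksize_bytes
end

hfile_bytes = 1024 * 1024 * 1024 # hypothetical 1 GiB HFile, not from the thread

at_64k  = index_entries(hfile_bytes, 64 * 1024)
at_128k = index_entries(hfile_bytes, 128 * 1024)

puts "index entries at 64k:  #{at_64k}"
puts "index entries at 128k: #{at_128k}"
```

For this file size the estimate drops from 16384 entries at 64k to 8192 at 128k.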
St.Ack
Damien Hardy 2011-11-15, 08:47
Hi,
It definitely sped things up :)
hbase(main):002:0> get 'logs', '_f:squid_t:20111114110759_b:squid_s:204-taDiFMcQaPzN13dDOZ99PA=='
COLUMN CELL
body:body timestamp=1321265279234, value=Nov 14 11:00:24 haproxy[15470]: ... [haproxy syslogs] ...
1 row(s) in 0.0170 seconds
Thank you again for the help and explanations.
Regards,
-- Damien