Abhijit Pol
2010-10-09, 20:15
Stack
2010-10-10, 06:12
Abhijit Pol
2010-10-10, 19:28
Sean Bigdatafun
2010-10-11, 21:28
Abhijit Pol
2010-10-13, 03:06
Stack
2010-10-13, 22:31
Abhijit Pol
2010-10-15, 18:12
Sean Bigdatafun
2010-10-15, 21:14
Sean Bigdatafun
2010-10-15, 21:15
Tatsuya Kawano
2010-10-15, 23:42
Abhijit Pol
2010-10-16, 17:58
Abhijit Pol
2010-10-16, 18:10
Matt Corgan
2010-10-16, 18:27
William Kang
2010-10-16, 19:30
Andrey Stepachev
2010-10-17, 07:59
-
HBase cluster with heterogeneous resources
Abhijit Pol 2010-10-09, 20:15
We are testing with a 4-node HBase cluster in which 3 machines are identical, with 64GB RAM and 6x1TB disks, and the 4th machine has only 16GB RAM and 2x1TB disks.

We observe (from server-side metrics) frequent latency spikes and an RS suicide roughly every 8 hours on our 4th machine.

We do have the overall heap size configured based on the total RAM available, but all other configs are the same across RSs.

Is there a way to hint the master to distribute regions based on available resources?

We are using the 0.89.20100924 branch. We have slop at the default 0.3 and a roughly equal number of regions across all RSs.

Thanks,
--Abhi
-
Re: HBase cluster with heterogeneous resources
Stack 2010-10-10, 06:12
On Sat, Oct 9, 2010 at 1:15 PM, Abhijit Pol <[EMAIL PROTECTED]> wrote:
> We are testing with 4 nodes HBase cluster out of which 3 machines are
> identical with 64GB RAM and 6x1TB disks. and 4th machine has only 16GB RAM
> and 2x1TB disks
>
> We observe (from server side metrics) frequent latency spikes and RS suicide
> ~ every 8hrs from our 4th machine.

How much heap have you given your servers? You could up your zk timeout or play with GC tunings, if full GC is the reason the RSs are committing hara-kiri.

> We do have overall heap size configured based on total RAM available but all
> other configs are same across RSs
>
> Is there a way to hint master to distribute regions based on
> available resources?

No. Not currently.

> We are using 0.89.20100924 branch. We have slop at default 0.3 and roughly
> equal number of regions across all RSs.

I'd suggest taking the odd-man-out out of your cluster or repurposing it as a master node. Usually clusters are homogeneous and much of the software assumes each node is equivalent. We've not had a chance to work on clusters made of differently spec'd machines.

St.Ack
-
Re: HBase cluster with heterogeneous resources
Abhijit Pol 2010-10-10, 19:28
Thanks Stack.

I think we have GC under control. We have CMS tuned to start early and no longer see "slept x longer y" in the logs. We also have a higher zk timeout (150 seconds); I guess we can bump that up a bit.

I was able to trace the problem to swapping on a couple of RSs. We will disable swap and see how that helps with the suicides. RSs on machines with swap disabled have been doing very well so far.

Also, as you suggested, we will take the odd man out. We don't have to have it in. Our master is already a low-key machine.

--Abhi
-
Re: HBase cluster with heterogeneous resources
Sean Bigdatafun 2010-10-11, 21:28
On Sun, Oct 10, 2010 at 12:28 PM, Abhijit Pol <[EMAIL PROTECTED]> wrote:
> I was able to point to swap on couple of RSs. Will disable the swap and see
> how that helps suicides. We observed RSs on machines with swap disabled
> doing very good so far.

Disabling swap at the OS level (i.e., resizing /swap to zero)? If we keep the sum of the JVM heap sizes under the total physical memory size, does disabling swap or not make any difference? Thanks.

For example, RS (6GB) + DN (2GB) + TaskTracker (1GB) + ... < 16GB seems quite feasible, doesn't it?

Lots of people have mentioned that we should give a lot of physical memory to a RegionServer machine. But I'd like to ask for a more detailed memory allocation breakdown, because in practice the RS runs alongside other services such as a ZooKeeper node, DataNode, TaskTracker, etc.

BTW, on your beefy machines, how much heap size have you allocated to the RegionServer?
-
Re: HBase cluster with heterogeneous resources
Abhijit Pol 2010-10-13, 03:06
We did swapoff -a and then updated fstab to turn swap off permanently.

We observed that swapping was actually happening on the RSs, and after we turned it off the RSs have been much more stable.

I can tell you what we have. I'm not sure it is optimal; in fact I'm looking for comments/suggestions from folks who have used it more:

64GB RAM ==> 85% given to HBase heap (30% memstore, 60% block cache), 512MB DN and 512MB TT

We have a 64KB HDFS block size and an 8KB HBase block size, as our load is dominated by random reads.

Any suggestions/comments?
-
Re: HBase cluster with heterogeneous resources
Stack 2010-10-13, 22:31
On Tue, Oct 12, 2010 at 11:06 PM, Abhijit Pol <[EMAIL PROTECTED]> wrote:
> we did swapoff -a and then updated fstab to permanently turn it off.

You might not want to turn it off completely. One of the lads was recently talking about the horrors that can happen when there is no swap.

But it sounds like you were doing over-eager swapping up to this point?

> we observed swap was actually happening on RSs and after we turned it off we
> have much stable RSs.
>
> i can tell what we have, not sure that is optimal, in fact looking for
> comments/suggestions from folks who have used it more:
> 64GB RAM ==> 85% given to HBASE HEAP (30% memstore, 60%block cache) , 512MB
> DN and 512MB TT

So, I'm bad at math, but that's a heap of 50+GB? How's that working out for you? Have you played with GC tuning at all? You might give more to the DN and the TT since you have plenty, and more to the OS... perhaps less to HBase?

How many disks?

> we have 64KB HDFS block size

Do you mean 64MB?

> and 8K Hbase block size as our load is random
> read dominated.

Small cells? If so, 8k can help some over having the 64k default.

You've done the other stuff -- ulimits and xceivers?

How's it running for you?

St.Ack
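The "50+GB" figure can be checked with a little arithmetic. The following is only a rough sketch: it assumes the 30% memstore and 60% block cache figures are fractions of the HBase heap (not of total RAM), and the function name is made up for illustration.

```python
def heap_breakdown(total_ram_gb, heap_fraction, memstore_fraction, block_cache_fraction):
    """Split a machine's RAM into HBase heap, memstore, block cache and the remainder."""
    heap_gb = total_ram_gb * heap_fraction
    return {
        "heap_gb": heap_gb,
        "memstore_gb": heap_gb * memstore_fraction,        # assumed: fraction of heap
        "block_cache_gb": heap_gb * block_cache_fraction,  # assumed: fraction of heap
        "left_for_os_and_daemons_gb": total_ram_gb - heap_gb,
    }

# Numbers quoted in the thread: 64GB RAM, 85% to the HBase heap.
budget = heap_breakdown(64, 0.85, 0.30, 0.60)
for name, value in budget.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the heap comes to about 54GB, which leaves under 10GB for the OS page cache, the DN, the TT and everything else on the box.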
-
Re: HBase cluster with heterogeneous resources
Abhijit Pol 2010-10-15, 18:12
> > we did swapoff -a and then updated fstab to permanently turn it off.
>
> You might not want to turn it off completely. One of the lads was
> recently talking about the horrors that can happen when no swap.
>
> But sounds like you were doing over eager swapping up to this?

http://wiki.apache.org/hadoop/PerformanceTuning recommends removing swap. We had swap off on part of the cluster, and those machines were doing well in terms of RS crashes, while the other machines were swapping a lot. So we decided to turn it off for all RS machines.

Can you give more input on what the drawbacks or risks of permanently turning swap off might be, or what the observed horror was?

> > we observed swap was actually happening on RSs and after we turned it off we
> > have much stable RSs.
> >
> > i can tell what we have, not sure that is optimal, in fact looking for
> > comments/suggestions from folks who have used it more:
> > 64GB RAM ==> 85% given to HBASE HEAP (30% memstore, 60%block cache) , 512MB
> > DN and 512MB TT
>
> So, I'm bad at math, but thats a heap of 50+GB? Hows that working out
> for you? You played with GC tuning at all? You might give more to
> the DN and the TT since you have plenty -- and more to the OS...
> perhaps less to hbase?
>
> How many disks?

We played with GC. What has worked well so far is starting CMS a little early, at 40% occupancy. We removed the 6m newgen restriction and observed that newgen does not grow beyond 18MB and that minor GC comes every second instead of every 200ms in steady state (we might cap maxnewgen if things go bad). So far all pauses are small, under a second, and no full GC has kicked in.

We have given more to HBase (and specifically to the block cache) because we want 95% of read latencies below 20ms, and our load is random-read heavy with light read-modify-writes.
The rationale was to go for small HBase blocks (8KB); an HDFS block size (64KB) that is larger than the HBase block but smaller than the HDFS default; and a large block cache (~37GB) to improve the hit rate.
We did very limited experiments with different block sizes before going with this configuration.

We have 1GB for the DN. We don't run map-reduce much on this cluster, so we have given 512MB to the TT. We have a separate Hadoop cluster for all our MR and analytics needs.

We have 6x1TB disks per machine.

> > we have 64KB HDFS block size
>
> Do you mean 64MB?

It's 64KB. Our keys are random enough that there is very little chance of exploiting block locality. So every miss in the block cache will read one or more random HDFS blocks anyway, and hence it makes sense to go for a lower HDFS block size. After getting HBASE-3006 in, things improved a lot for us.

We use large 128MB blocks for our analytics Hadoop cluster as it has more sequential reads. Do you think a smaller size like 64KB might actually be hurting us?

> You've done the other stuff -- ulimits and xceivers?

We have a 64k ulimit for all our Hadoop cluster machines, and xceivers is set to 2048 for the HBase cluster.

> Hows it running for you?

I will post some real numbers next week, once we have had it running for 7 days with the current config.

I won't say we have nailed down everything, but it is better than what we started with.

Any input will be really helpful, including anything you think we are doing that is stupid or that we are totally missing :-)
-
Re: HBase cluster with heterogeneous resources
Sean Bigdatafun 2010-10-15, 21:14
On Fri, Oct 15, 2010 at 11:12 AM, Abhijit Pol <[EMAIL PROTECTED]> wrote:
> > > we have 64KB HDFS block size
> >
> > Do you mean 64MB?
>
> Its 64KB. Our keys are random enough to have very low chance
> of exploiting block locality. So for every miss in block cache will read one
> or more random HDFS blocks anyways and hence it make sense to go for lower
> HDFS block size. After getting HBASE-3006 in things improved a lot for us.

If this is your setup, your HDFS namenode is bound to OOM soon. (The namenode's memory consumption is proportional to the number of blocks on HDFS.)

I guess you meant "hfile.min.blocksize.size" in ? That is a different parameter from the HDFS block size, IMO. (Need someone to confirm.)
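To put the namenode concern in numbers, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, 1TB of data stored as full blocks; the real block count also depends on how many files there are and how full their last blocks are, and the thread does not say how much data the cluster holds.

```python
def block_count(data_bytes, block_size_bytes):
    """Number of HDFS blocks needed to hold data_bytes, rounding up."""
    return -(-data_bytes // block_size_bytes)  # ceiling division on integers

KB, MB, TB = 1024, 1024**2, 1024**4
data = 1 * TB  # hypothetical data volume, not a figure from the thread

small = block_count(data, 64 * KB)    # the 64KB block size discussed above
default = block_count(data, 64 * MB)  # the 64MB HDFS default mentioned in the thread
print(small, default, small // default)
```

With these assumptions the 64KB setting produces 1024 times as many blocks for the namenode to track as the 64MB default does for the same data.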
-
Re: HBase cluster with heterogeneous resources
Sean Bigdatafun 2010-10-15, 21:15
On Wed, Oct 13, 2010 at 3:31 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Tue, Oct 12, 2010 at 11:06 PM, Abhijit Pol <[EMAIL PROTECTED]> wrote:
> > we did swapoff -a and then updated fstab to permanently turn it off.
>
> You might not want to turn it off completely. One of the lads was
> recently talking about the horrors that can happen when no swap.

What is the horror?
-
Re: HBase cluster with heterogeneous resources
Tatsuya Kawano 2010-10-15, 23:42
Hi Abhi,

> Can you give more inputs on what might be the drawbacks or risks of
> permanent swap off or what was the observed horror?

Turning off swap means you'll meet the Linux OOM Killer more often. The OOM Killer (Out Of Memory Killer) tends to kill processes that use a larger memory space, so the RS can be targeted. Even worse, the OOM Killer could get stuck because of the low-memory situation. It will use up CPU time (%system) and you won't be able to ssh into the machine for a while.

Instead of turning off swap, I would suggest lowering a kernel parameter called "vm.swappiness". It takes a number between 0 and 100; a higher value makes the kernel swap more often so that it can allocate more RAM for the file cache, and a lower value makes it swap less often. So you want a lower value.

It defaults to 60 on many Linux distributions. Try setting it to 0.

Thanks,
Tatsuya

--
Tatsuya Kawano
Tokyo, Japan

http://twitter.com/tasuys6502

Tatsuya Kawano (Mr.)
Tokyo, Japan

http://twitter.com/tatsuya6502
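As a rough way to check the current setting on a Linux box, the value can be read from procfs. This is a minimal sketch, not something posted in the thread; the helper names are made up, and actually changing the value is normally done as root with `sysctl -w vm.swappiness=0` (plus an entry in /etc/sysctl.conf to survive reboots) rather than from a script like this.

```python
from pathlib import Path

SWAPPINESS_FILE = Path("/proc/sys/vm/swappiness")  # standard Linux procfs location

def parse_swappiness(text):
    """Parse the contents of the swappiness file into an int."""
    return int(text.strip())

def should_lower(current, target=0):
    """True when the current setting swaps more eagerly than the target value."""
    return current > target

if SWAPPINESS_FILE.exists():
    current = parse_swappiness(SWAPPINESS_FILE.read_text())
    print(f"vm.swappiness = {current}; lower it: {should_lower(current)}")
else:
    print("no /proc/sys/vm/swappiness on this machine (not Linux?)")
```

The target of 0 simply mirrors the suggestion above; whether 0 or a small non-zero value is the better choice is a judgment call that the thread leaves open.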
-
Re: HBase cluster with heterogeneous resources
Abhijit Pol 2010-10-16, 17:58
Thanks Tatsuya. Will give "vm.swappiness" a shot.
-
Re: HBase cluster with heterogeneous resources
Abhijit Pol 2010-10-16, 18:10
> If this is your setup, your HDFS' namenode is bound to OOM soon.
> (Namenode's memory consumption is proportional to the number of blocks on HDFS)

The NN runs on the master and we have 4GB for the NN, which is good for a long time given the number of blocks we have. The DN has 1GB, the TT 512MB and the JT 1GB.

> I guess you meant "hfile.min.blocksize.size" in ? That is a different
> parameter from HDFS' block size, IMO. (need someone to confirm)

Yes, HBase and HDFS blocks are two different params. We are testing with 8KB HBase (default 64KB) and 64KB HDFS (default 64MB) block sizes. Both are much smaller than the defaults, but we have a random-read-heavy workload and smaller blocks should help, provided the smaller sizes do not expose some other bottleneck.

Smaller HBase blocks mean larger indices and better random read performance. So it makes sense to trade some RAM for the block index, as we have plenty of RAM on our machines.
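The RAM-for-index trade-off can be sized roughly as follows. This sketch assumes one block index entry per HBase data block and blocks of exactly the nominal size, and the 256MB HFile size is a made-up figure for illustration.

```python
def index_entries(hfile_bytes, hbase_block_bytes):
    """Approximate number of block index entries, assuming one entry per data block."""
    return -(-hfile_bytes // hbase_block_bytes)  # ceiling division on integers

KB, MB = 1024, 1024**2
hfile = 256 * MB  # hypothetical HFile size, not a figure from the thread

for block_kb in (64, 8):
    print(f"{block_kb}KB blocks -> {index_entries(hfile, block_kb * KB)} index entries")
```

Going from 64KB to 8KB blocks multiplies the number of index entries by 8, while each cache miss pulls in a block one eighth the size.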
-
Re: HBase cluster with heterogeneous resources
Matt Corgan 2010-10-16, 18:27
I could be wrong, but I don't think there's any performance benefit to having a small hdfs block size. If you are doing a random read fetching a 1KB cell out of an HFile, it will not pull the entire 64MB hdfs block from hdfs; it plucks only the small section of the hdfs file/block that contains the HFile index and then the appropriate 64KB hbase block. Maybe someone more knowledgeable could elaborate on the exact number and size of hdfs accesses.
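The idea of plucking only a small section out of a much larger file can be illustrated with an ordinary local file. This is only an analogy under stated assumptions: it uses plain seek-and-read on local disk, not the HDFS client API, and the sizes and names are made up.

```python
import os
import tempfile

BLOCK = 64 * 1024  # stand-in for one 64KB HBase block

def read_slice(path, offset, length):
    """Read `length` bytes starting at `offset` without reading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Build a 1MB local file made of 16 blocks, each filled with its own index byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(16):
        tmp.write(bytes([i]) * BLOCK)
    demo_path = tmp.name

chunk = read_slice(demo_path, 5 * BLOCK, BLOCK)  # fetch only block number 5
os.remove(demo_path)
print(len(chunk), chunk[0], chunk[-1])
```

Only the requested 64KB is returned, regardless of how large the enclosing file is, which is the behaviour described above for reads against a large HDFS block.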
-
Re: HBase cluster with heterogeneous resources
William Kang 2010-10-16, 19:30
HDFS blocks are streaming files, which means you cannot randomly access those HDFS blocks as quickly as on other file systems. So if your HBase block is in the middle of an HDFS block, you have to traverse the HDFS block to get to the middle. Right?

Can somebody explain how HBase manages to fetch the 64KB HBase block from the 64MB HDFS block fast?
-
Re: HBase cluster with heterogeneous resources
Andrey Stepachev 2010-10-17, 07:59
https://issues.apache.org/jira/browse/HDFS-236

How bad is HDFS random access?
- Random access in HDFS always seemed to have bad PR though hardly anyone used the interface. Claims/rumours range from "transfers a lot of excess data" (not true) to "we noticed it is 10 times slower than our non-hdfs app" (hard to see how if the app is I/O bound and/or is doing at least semi random reads).
- It was good to see HBase successfully use the interface for its speed-up. It cannot achieve competitive performance without reasonable random access performance in HDFS (for HFile).