-
Multiple regionservers on a single node - Ishan Chhabra, 2012-12-03, 21:39
Hi,

Has anybody tried to run multiple RegionServers on a single physical node? Are there deep technical issues or minor impediments that would hinder this?

We are trying to do this because we are seeing a lot of GC pauses with the large heap sizes (~70G) that we are using, which leads to a lot of timeouts in our latency-critical application. Running more processes with smaller heaps would help mitigate this issue.

Any experience or thoughts on this would help.
Thanks!

--
Ishan Chhabra | Rocket Scientist | Rocketfuel Inc. | m 650 556 6803
-
Re: Multiple regionservers on a single node - Marcos Ortiz, 2012-12-03, 22:03
Regards, Ishan.

On 12/03/2012 04:39 PM, Ishan Chhabra wrote:
> Has anybody tried to run multiple RegionServers on a single physical
> node? Are there deep technical issues or minor impediments that would
> hinder this?

Can you provide more information about your setup?
- Network
- Disk layout
- RAM

> We are trying to do this because we are facing a lot of GC pauses on the
> large heap sizes (~70G) that we are using, which leads to a lot of timeouts
> in our latency critical application. More processes with smaller heaps
> would help in mitigating this issue.

Have you read this, Ishan?
http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
This great post describes the long GC pause problem.

Another great post, on the Sematext blog, describes how the MemStore works, which is a critical resource for HBase operations:
http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/

What are the values of:
- dfs.balance.bandwidthPerSec
- dfs.blocksize
- hbase.hregion.memstore.flush.size
- dfs.datanode.max.xcievers

--
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>
-
Re: Multiple regionservers on a single node - Doug Meil, 2012-12-03, 22:13
Hi there,

I haven't tried multi-RS on a single node, but have you looked at the off-heap cache? It's part of 0.92.x. From what I understand, that feature was designed with this case in mind (i.e., doing a lot of caching without introducing GC issues in the RS).

https://issues.apache.org/jira/browse/HBASE-4027
-
Re: Multiple regionservers on a single node - Robert Dyer, 2012-12-07, 04:02
I too am interested in running multiple RS on a single node.

I have a very small cluster where all nodes are identical. However, I was just given a very powerful node to add to this cluster, which effectively doubles the total CPUs, RAM, and HDDs in the cluster.

As a result, when I run an MR job, half the jobs go to this single new node, yet most of the data is not local because HBase balances the regions across the cluster.

Does it make sense for me to run multi-RS on this node?

--
Robert Dyer
[EMAIL PROTECTED]
-
Re: Multiple regionservers on a single node - Mohammad Tariq, 2012-12-07, 06:57
Hello all,

I would like to add my 2 cents. Even with superior CPU, RAM, and disks, I/O still remains the bottleneck. The CPU, no matter how powerful, will not have much impact on overall performance unless I/O improves considerably. Also, if we run multiple RSs, the DN (DataNode) and TT (TaskTracker) on the same node may face memory issues.

What do you guys say?

Thank you.

Regards,
Mohammad Tariq
-
Re: Multiple regionservers on a single node - 谢良, 2012-12-07, 09:58
Hmm, have you tried tuning your GC in depth? Please provide the exact VM options, JDK version, and GC logs.

In our test cluster this week, I managed to reduce the longest STW (stop-the-world) pause from 22+ seconds (Xmx20G) to 1.1 s (Xmx48G) under a very heavy, long-running YCSB stress test.

It would also be better to ask for help on the hotspot-gc-use/hotspot-gc-dev mailing lists :)
And G1GC in jdk7u4+ is a potential solution for the large-heap scenario as well :)
-
Re: Re: Multiple regionservers on a single node - Adrien Mogenet, 2012-12-10, 22:21
On Fri, Dec 7, 2012 at 10:58 AM, 谢良 <[EMAIL PROTECTED]> wrote:
> Emm, have you tried to tune your GC deeply? please provide the exactly VM
> options and jdk version and GC logs..
> In our test cluster this week, i managed to reduce the longest STW from
> 22+ seconds(Xmx20G) to 1.1s(Xmx48G) under a very heavy YCSB stress
> long-term-testing.

Do you have any further explanation of your specific case? Looks interesting :-)

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
-
Re: Re: Re: Multiple regionservers on a single node - Azury, 2012-12-11, 06:47
Can you share your GC command options here?
-
Re: Re: Multiple regionservers on a single node - 谢良, 2012-12-11, 08:21
I am just an HBase and HotSpot VM newbie :)

1) Before looking into GC details, turn on the tracing flags, e.g.:
   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:xxxx
   -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics
   -XX:PrintSafepointStatisticsCount=1 -XX:+PrintHeapAtGC
   -XX:+PrintTenuringDistribution -XX:+PrintClassHistogramAfterFullGC
   -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintPromotionFailure ...
2) Dive into the GC log after each run: figure out the root cause of the longest STW pause, and collect statistics such as total GC time and total GC count. Some common safepoint causes are: GC, Revoke Biasedlock, Deoptimize, FindDeadlocks, PrintJNI, etc.
3) If ParNew costs too much, reduce Xmn and adjust SurvivorRatio/TargetSurvivorRatio/PretenureSizeThreshold...
4) If CMS initial mark and remark are expensive, look at: UseCMSCompactAtFullCollection, CMSInitiatingOccupancyFraction + UseCMSInitiatingOccupancyOnly, CMSParallelRemarkEnabled, CMSClassUnloadingEnabled, CMSMaxAbortablePrecleanTime, CMSWaitDuration, CMSScavengeBeforeRemark
5) Multi-threaded and concurrent GC is key as well when running on modern hardware, e.g.: CMSConcurrentMTEnabled/ParallelGCThreads/ConcGCThreads/...

Finally, reading the source code (RTFC) of the matching HotSpot VM version, or asking for help on the hotspot-gc mailing lists, is the best choice for GC issues.

Hope it's helpful for you,
Liang
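
Step 2 above (finding the longest stop-the-world pause and the total pause time) can be roughed out with a small script. This is a minimal Python sketch, not something posted in the thread. It assumes the log was written with -XX:+PrintGCApplicationStoppedTime and uses the JDK 6/7/8-era line format "Total time for which application threads were stopped: N seconds"; the function names stopped_times and summarize and the 1-second "slow" threshold are hypothetical choices for illustration.

```python
import re
import sys

# Line emitted by -XX:+PrintGCApplicationStoppedTime (assumed JDK 6/7/8 format).
# A decimal comma is tolerated in case the log was written under another locale.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(?P<secs>\d+(?:[.,]\d+)?) seconds"
)


def stopped_times(lines):
    """Return every stop-the-world duration (in seconds) found in the log lines."""
    pauses = []
    for line in lines:
        match = STOPPED_RE.search(line)
        if match:
            pauses.append(float(match.group("secs").replace(",", ".")))
    return pauses


def summarize(pauses, slow_threshold=1.0):
    """Summarize pauses: count, total, longest, and how many reach the threshold."""
    return {
        "count": len(pauses),
        "total": sum(pauses),
        "longest": max(pauses, default=0.0),
        "slow": sum(1 for p in pauses if p >= slow_threshold),
    }


# Hypothetical usage: python stw_summary.py regionserver_gc.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        print(summarize(stopped_times(log)))
```

This only counts safepoint pauses; attributing a long pause to ParNew, CMS remark, or a non-GC safepoint still requires reading the surrounding -XX:+PrintGCDetails and -XX:+PrintSafepointStatistics output.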
-
Re: Re: Re: Re: Multiple regionservers on a single node - 谢良, 2012-12-11, 08:42
Sure, here it is:

-Xmx49152m -Xms49152m -Xmn1024m -Xss256k -XX:MaxDirectMemorySize=1024m
-XX:MaxPermSize=512m -XX:PermSize=512m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/work/log/hbase/ggsrv-miliao/regionserver
-XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:/home/work/log/hbase/ggsrv-miliao/regionserver/regionserver_gc.log
-XX:SurvivorRatio=1 -XX:+UseCMSCompactAtFullCollection
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled -XX:+UseNUMA -XX:+CMSClassUnloadingEnabled
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution
-XX:CMSMaxAbortablePrecleanTime=10000 -XX:MaxGCPauseMillis=2000
-XX:TargetSurvivorRatio=80 -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m -XX:CMSWaitDuration=2000
-XX:+CMSScavengeBeforeRemark -XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintPromotionFailure
-XX:ConcGCThreads=8 -XX:ParallelGCThreads=8 -XX:PretenureSizeThreshold=4m
-XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent

I'm not a VM developer, so this is definitely a suboptimal setting. Please do not apply it to a production environment directly without testing it against your own data model.

Any comments are welcome :)

Liang
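
To make the sizing flags above more concrete, here is a rough Python sketch of the heap layout they imply. It is an approximation under stated assumptions, not output from the JVM: it assumes -XX:SurvivorRatio is the ratio of eden to one survivor space, that the old generation is Xmx minus Xmn, and it ignores the alignment the JVM applies to generation sizes. The function name heap_layout and its parameter names are hypothetical.

```python
def heap_layout(xmx_mb, xmn_mb, survivor_ratio, target_survivor_pct, cms_initiating_pct):
    """Approximate generation sizes (in MB) implied by the sizing flags.

    Assumes young gen = eden + 2 survivor spaces with eden/survivor == survivor_ratio,
    and old gen = total heap - young gen. Real JVM sizes are aligned and may differ.
    """
    survivor_mb = xmn_mb / (survivor_ratio + 2)
    eden_mb = survivor_ratio * survivor_mb
    old_mb = xmx_mb - xmn_mb
    return {
        "eden_mb": eden_mb,
        "survivor_mb": survivor_mb,
        # -XX:TargetSurvivorRatio: desired survivor occupancy after a young GC
        "survivor_target_mb": survivor_mb * target_survivor_pct / 100,
        "old_mb": old_mb,
        # -XX:CMSInitiatingOccupancyFraction with -XX:+UseCMSInitiatingOccupancyOnly:
        # old-gen occupancy at which a CMS cycle is started
        "cms_start_mb": old_mb * cms_initiating_pct / 100,
    }


# Values taken from the posted options: -Xmx49152m -Xmn1024m -XX:SurvivorRatio=1
# -XX:TargetSurvivorRatio=80 -XX:CMSInitiatingOccupancyFraction=75
layout = heap_layout(49152, 1024, 1, 80, 75)
for name, value in layout.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the 1 GB young generation splits into three roughly equal parts (eden and two survivor spaces), which keeps each young collection small, while CMS starts once the old generation is about three quarters full.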
-
Re: Re: Re: Re: Re: Multiple regionservers on a single node - Ishan Chhabra, 2012-12-11, 12:16
Hi Xieliang,

You have put together an interesting set of GC optimizations, similar to what I concluded after extensive GC tuning recently. For latency-critical applications running on modern servers with large RAM and multicore CPUs, the key seems to be minimizing the stop-the-world pauses caused by young GC, CMS initial-mark, and CMS remark. Your GC options seem to capture that very well.

Thanks for sharing!

--
Ishan Chhabra | Rocket Scientist | +91-9988263562 m