-
RegionServers Crashing every hour in production env
Pablo Musa 2013-03-08, 15:44
Hey guys,
as I mentioned in an email a long time ago, the RSs in my cluster did not get along and crashed 3 times a day. I tried a lot of the options we discussed in those emails, but none of them solved the problem. Since I was using an old version of hadoop, I thought that was the cause.

So, I upgraded from hadoop 0.20 - hbase 0.90 - zookeeper 3.3.5 to hadoop 2.0.0 - hbase 0.94 - zookeeper 3.4.5.

Unfortunately the RSs did not stop crashing, and worse! Now they crash every hour, and sometimes, when the RS that holds .ROOT. crashes, the whole cluster gets stuck in transition and everything stops working. In that case I need to clean the zookeeper znodes and restart the master and the RSs. To avoid this, I am running production with only ONE RS and a monitoring script that checks every minute whether the RS is ok and restarts it if not.
* This case does not get the cluster stuck.

This is driving me crazy, and I really can't find a solution for the cluster. I collected the logs from the start time (16:49) on all the interesting nodes (zoo, namenode, master, rs, dn2, dn9, dn10) and copied here what I think is useful.

There are some strange errors on DATANODE2, such as an error copying a block to itself.

The gc log points to a GC timeout. However, it is very weird that the RS spends so much time in one GC while in the other cases it takes 0.001 sec. Besides, the time is spent in sys, which makes me think the problem might be somewhere else.

I know that this is a bunch of logs, and that it is very difficult to find the problem without much context. But I REALLY need some help. If not the solution, then at least what I should read, where I should look, or which cases I should monitor.
Thank you very much,
Pablo Musa

Start RS at 16:49

####ZOOKEEPER-PSLBHNN001####
2013-03-07 16:49:04,369 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.17.2.18:33549
2013-03-07 16:49:04,372 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /172.17.2.18:33549
2013-03-07 16:49:04,373 [myid:] - INFO [SyncThread:0:ZooKeeperServer@595] - Established session 0x13d3c4bcba600a7 with negotiated timeout 150000 for client /172.17.2.18:33549
2013-03-07 16:49:04,458 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13d3c4bcba600a7 type:create cxid:0x8 zxid:0x11010ade3a txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-03-07 16:49:09,036 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.17.2.18:33675
2013-03-07 16:49:09,036 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /172.17.2.18:33675
2013-03-07 16:49:09,041 [myid:] - INFO [SyncThread:0:ZooKeeperServer@595] - Established session 0x13d3c4bcba600a8 with negotiated timeout 150000 for client /172.17.2.18:33675
2013-03-07 17:24:50,001 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x13d3c4bcba600a7, timeout of 150000ms exceeded
2013-03-07 17:24:50,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x13d3c4bcba600a7
2013-03-07 17:24:50,002 [myid:] - INFO [SyncThread:0:NIOServerCnxn@1001] - Closed socket connection for client /172.17.2.18:33549 which had sessionid 0x13d3c4bcba600a7
2013-03-07 17:24:57,889 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x13d3c4bcba600a8, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:722)
2013-03-07 17:24:57,890 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /172.17.2.18:33675 which had sessionid 0x13d3c4bcba600a8
2013-03-07 17:25:00,001 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x13d3c4bcba600a8, timeout of 150000ms exceeded
2013-03-07 17:25:00,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x13d3c4bcba600a8
2013-03-07 17:26:03,211 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.17.2.18:35983
2013-03-07 17:26:03,214 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /172.17.2.18:35983
2013-03-07 17:26:03,215 [myid:] - INFO [SyncThread:0:ZooKeeperServer@595] - Established session 0x13d3c4bcba600a9 with negotiated timeout 150000 for client /172.17.2.18:35983

####MASTER-PSLBHNN001####
2013-03-07 16:49:00,379 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1
2013-03-07 16:49:04,517 DEBUG org.apache.hadoop.hbase.master.ServerManager: STARTUP: Server PSLBHDN002,60020,1362685744154 came back up, removed it from the dead servers list
2013-03-07 16:49:04,517 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=PSLBHDN002,60020,1362685744154
2013-03-07 16:49:05,380 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1
2013-03-07 16:49:05,483 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/hdfs%3A%2F%2Fpslbhnn001.psafe.net%2Fhbase%2F.logs%2FPSLBHDN002%2C60020%2C1362683643918-splitting%2FPSLBHDN002%252C60020%252C1362683643918.1362683644662 acquired by PSLBHDN002,60020,1362685744154
2013-03-07 16:49
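To see how long each RS session survived before ZooKeeper expired it, the "Established session" and "Expiring session" lines in the ZooKeeper log above can be paired up by session id. The following is only a rough sketch, assuming the log line layout quoted in this message; the function and variable names are made up for illustration and are not part of ZooKeeper or HBase.

```python
import re
from datetime import datetime

# Both patterns assume the ZooKeeper server log layout quoted in this thread.
ESTABLISHED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Established session (0x[0-9a-f]+) with negotiated timeout (\d+)"
)
EXPIRING = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Expiring session (0x[0-9a-f]+), timeout of (\d+)ms exceeded"
)


def parse_ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS,mmm' log timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def session_lifetimes(lines):
    """Return {session_id: seconds between establishment and expiry}."""
    started = {}
    lifetimes = {}
    for line in lines:
        match = ESTABLISHED.search(line)
        if match:
            started[match.group(2)] = parse_ts(match.group(1))
            continue
        match = EXPIRING.search(line)
        if match and match.group(2) in started:
            delta = parse_ts(match.group(1)) - started[match.group(2)]
            lifetimes[match.group(2)] = delta.total_seconds()
    return lifetimes


# Two lines copied from the ZooKeeper log in this message.
SAMPLE = [
    "2013-03-07 16:49:04,373 [myid:] - INFO [SyncThread:0:ZooKeeperServer@595] - "
    "Established session 0x13d3c4bcba600a7 with negotiated timeout 150000 "
    "for client /172.17.2.18:33549",
    "2013-03-07 17:24:50,001 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - "
    "Expiring session 0x13d3c4bcba600a7, timeout of 150000ms exceeded",
]

if __name__ == "__main__":
    for session_id, seconds in session_lifetimes(SAMPLE).items():
        print(f"{session_id} lived {seconds / 60:.1f} minutes")
```

Run over a full ZooKeeper log, this kind of pairing would show whether the expiries really come about once an hour, as described above, or at irregular intervals.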
-
Re: RegionServers Crashing every hour in production env
Ted Yu 2013-03-08, 16:01
0.94 currently doesn't support hadoop 2.0

Can you deploy hadoop 1.1.1 instead?

Are you using 0.94.5?

Thanks
-
Re: RegionServers Crashing every hour in production env
ramkrishna vasudevan 2013-03-08, 16:32
I think the problem is with your GC config. What is your heap size? What kind of data are you pumping in, and how large is the block cache?

Regards
Ram
-
Re: RegionServers Crashing every hour in production env
Stack 2013-03-08, 17:11
What Ram says.

2013-03-07 17:24:57,887 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 159348ms for sessionid 0x13d3c4bcba600a7, closing socket connection and attempting reconnect

You Full GC'ing around this time?

Put up your configs in a place where we can take a look?

St.Ack
-
Re: RegionServers Crashing every hour in production env
Pablo Musa 2013-03-08, 18:58
> 0.94 currently doesn't support hadoop 2.0
> Can you deploy hadoop 1.1.1 instead ?

I am using cdh4.2.0, which uses this version as its default installation. I think it will be a problem for me to deploy 1.1.1, because I would need to "upgrade" the whole cluster with 70TB of data (back up everything, go offline, etc.).

Is there a problem with using cdh4.2.0? Should I send my email to the cdh list?

> Are you using 0.94.5 ?

I am using 0.94.2.

> I think it is with your GC config. What is your heap size? What is the
> data that you pump in and how much is the block cache size?

#JVM config:
export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=2G -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/logs/hbase/gc-hbase.log"

# heap size
export HBASE_HEAPSIZE=8192

#hbase metrics
requestsPerSecond=8, numberOfOnlineRegions=1252, numberOfStores=1272, numberOfStorefiles=1651, storefileIndexSizeMB=66, rootIndexSizeKB=68176, totalStaticIndexSizeKB=55028, totalStaticBloomSizeKB=0, memstoreSizeMB=3, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, readRequestsCount=1176287, writeRequestsCount=2165, compactionQueueSize=0, flushQueueSize=0, usedHeapMB=328, maxHeapMB=8185, blockCacheSizeMB=117.94, blockCacheFreeMB=1928.47, blockCacheCount=2083, blockCacheHitCount=34815, blockCacheMissCount=10259, blockCacheEvictedCount=17, blockCacheHitRatio=77%, blockCacheHitCachingRatio=94%, hdfsBlocksLocalityIndex=65, slowHLogAppendCount=0, fsReadLatencyHistogramMean=0, fsReadLatencyHistogramCount=0, fsReadLatencyHistogramMedian=0, fsReadLatencyHistogram75th=0, fsReadLatencyHistogram95th=0, fsReadLatencyHistogram99th=0, fsReadLatencyHistogram999th=0, fsPreadLatencyHistogramMean=0, fsPreadLatencyHistogramCount=0, fsPreadLatencyHistogramMedian=0, fsPreadLatencyHistogram75th=0, fsPreadLatencyHistogram95th=0, fsPreadLatencyHistogram99th=0, fsPreadLatencyHistogram999th=0, fsWriteLatencyHistogramMean=0, fsWriteLatencyHistogramCount=0, fsWriteLatencyHistogramMedian=0, fsWriteLatencyHistogram75th=0, fsWriteLatencyHistogram95th=0, fsWriteLatencyHistogram99th=0, fsWriteLatencyHistogram999th=0

#hbase-site.xml
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>20</value>
</property>

All the other parameters I am using are defaults, for both hbase and hadoop.

There are four tables with this same configuration:
{NAME => 'T1', FAMILIES => [{NAME => 'details', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

Rows from one table can vary from 4kb to 50kb, while rows from the other 3 usually vary from 60 bytes to 300 bytes.

> You Full GC'ing around this time?

The GC log shows one collection that took a long time. However, it does not make sense for the GC itself to be the cause, since the same amount of data was cleaned before and AFTER in just 0.01 secs!

[Times: user=0.08 sys=137.62, real=137.62 secs]

Besides, the whole time was spent in the system. That is what is bugging me.

...
1044.081: [GC 1044.081: [ParNew: 58970K->402K(59008K), 0.0040990 secs] 275097K->216577K(1152704K), 0.0041820 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000 secs] 269048K->223592K(1152704K), 0.0055930 secs] [Times: user=0.04 sys=0.01, real=0.00 secs]
1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), 137.6353620 secs] 276072K->235097K(1152704K), 137.6354700 secs] [Times: user=0.08 sys=137.62, real=137.62 secs]
1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960 secs] 287577K->230937K(1152704K), 0.0080770 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
1227.251: [GC 1227.251: [ParNew: 54377K->2379K(59008K), 0.0095650 secs] 283417K->231420K(1152704K), 0.0096340 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]

I really appreciate you guys helping me find out what is wrong.

Thanks,
Pablo
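The odd collection in the GC log above (137 seconds of wall-clock time, almost all of it in sys) can be picked out mechanically instead of by eye. Below is a minimal sketch, assuming the ParNew line format produced by the -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps flags shown in this message; the 1-second threshold, the 0.9 ratio and the helper names are illustrative assumptions, not recommended values.

```python
import re

# Assumes the ParNew log line format quoted in this thread, ending with
# "[Times: user=U sys=S, real=R secs]".
GC_LINE = re.compile(
    r"^(?P<uptime>\d+\.\d+): \[GC .*"
    r"\[Times: user=(?P<user>\d+\.\d+) sys=(?P<sys>\d+\.\d+), "
    r"real=(?P<real>\d+\.\d+) secs\]"
)


def long_pauses(lines, min_real_secs=1.0):
    """Return (uptime, user, sys, real) for collections lasting at least min_real_secs."""
    found = []
    for line in lines:
        match = GC_LINE.search(line)
        if not match:
            continue
        user, sys_time, real = (float(match.group(k)) for k in ("user", "sys", "real"))
        if real >= min_real_secs:
            found.append((float(match.group("uptime")), user, sys_time, real))
    return found


def sys_dominated(pause, ratio=0.9):
    """True when kernel (sys) time accounts for at least `ratio` of the wall-clock pause."""
    _, _, sys_time, real = pause
    return real > 0 and sys_time / real >= ratio


# Three lines copied from the GC log in this message.
SAMPLE = [
    "1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000 secs] "
    "269048K->223592K(1152704K), 0.0055930 secs] "
    "[Times: user=0.04 sys=0.01, real=0.00 secs]",
    "1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), 137.6353620 secs] "
    "276072K->235097K(1152704K), 137.6354700 secs] "
    "[Times: user=0.08 sys=137.62, real=137.62 secs]",
    "1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960 secs] "
    "287577K->230937K(1152704K), 0.0080770 secs] "
    "[Times: user=0.05 sys=0.00, real=0.01 secs]",
]

if __name__ == "__main__":
    for pause in long_pauses(SAMPLE):
        label = "sys-dominated" if sys_dominated(pause) else "user-dominated"
        print(f"uptime {pause[0]}s: real={pause[3]}s ({label})")
```

A pause flagged as sys-dominated means the JVM was waiting on the kernel rather than doing collection work in user space, which fits the suspicion above that the problem lies outside the GC itself.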
-
Re: RegionServers Crashing every hour in production env
Stack 2013-03-08, 22:02
On Fri, Mar 8, 2013 at 10:58 AM, Pablo Musa <[EMAIL PROTECTED]> wrote:

>> 0.94 currently doesn't support hadoop 2.0
>> Can you deploy hadoop 1.1.1 instead ?
>
> I am using cdh4.2.0 which uses this version as default installation.
> I think it will be a problem for me to deploy 1.1.1 because I would need to
> "upgrade" the whole cluster with 70TB of data (backup everything, go
> offline, etc.).
>
> Is there a problem to use cdh4.2.0?
> I should send my email to cdh list?

That combo should be fine.

>> You Full GC'ing around this time?
>
> The GC shows it took a long time. However it does not make any sense
> to be it, since the same amount of data was cleaned before and AFTER
> in just 0.01 secs!

If JVM is full GC'ing, the application is stopped.

> [Times: user=0.08 sys=137.62, real=137.62 secs]
>
> Besides the whole time was used by system. That is what is bugging me.

The below does not look like a full GC but that is a long pause in system time, enough to kill your zk session.

You swapping?

Hardware is good?

St.Ack

> ...
> 1044.081: [GC 1044.081: [ParNew: 58970K->402K(59008K), 0.0040990 secs] 275097K->216577K(1152704K), 0.0041820 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
> 1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000 secs] 269048K->223592K(1152704K), 0.0055930 secs] [Times: user=0.04 sys=0.01, real=0.00 secs]
> 1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), 137.6353620 secs] 276072K->235097K(1152704K), 137.6354700 secs] [Times: user=0.08 sys=137.62, real=137.62 secs]
> 1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960 secs] 287577K->230937K(1152704K), 0.0080770 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
> 1227.251: [GC 1227.251: [ParNew: 54377K->2379K(59008K), 0.0095650 secs] 283417K->231420K(1152704K), 0.0096340 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
-
Re: RegionServers Crashing every hour in production env
Pablo Musa 2013-03-10, 18:59
> That combo should be fine.

Great!!

> If JVM is full GC'ing, the application is stopped.
> The below does not look like a full GC but that is a long pause in system
> time, enough to kill your zk session.

Exactly. This pause is really making zk expire the RS, which then shuts down (logs at the end of the email). But the question is: what is causing this pause??!!

> You swapping?

I don't think so (stats below).

> Hardware is good?

Yes, it is a 16 processor machine with 74GB of RAM and plenty of disk space. Below are some metrics I have heard about. Hope it helps.

** I am having some problems with the datanodes[1], which are having trouble writing. I really think the issues are related, but I cannot solve either of them :(

Thanks again,
Pablo

[1] http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3CCAJzooYfS-F1KS+[EMAIL PROTECTED]%3E

top - 15:38:04 up 297 days, 21:03, 2 users, load average: 4.34, 2.55, 1.28
Tasks: 528 total, 1 running, 527 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 74187256k total, 29493992k used, 44693264k free, 5836576k buffers
Swap: 51609592k total, 128312k used, 51481280k free, 1353400k cached

]$ vmstat -w
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
 r  b     swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
 2  0   128312   32416928    5838288    5043560    0    0   202    53    0    0   2  1  96  1  0

]$ sar
02:20:01 PM  all  26.18  0.00  2.90  0.63  0.00  70.29
02:30:01 PM  all   1.66  0.00  1.25  1.05  0.00  96.04
02:40:01 PM  all  10.01  0.00  2.14  0.75  0.00  87.11
02:50:01 PM  all   0.76  0.00  0.80  1.03  0.00  97.40
03:00:01 PM  all   0.23  0.00  0.30  0.71  0.00  98.76
03:10:01 PM  all   0.22  0.00  0.30  0.66  0.00  98.82
03:20:01 PM  all   0.22  0.00  0.31  0.76  0.00  98.71
03:30:01 PM  all   0.24  0.00  0.31  0.64  0.00  98.81
03:40:01 PM  all   1.13  0.00  2.97  1.18  0.00  94.73
Average:     all   3.86  0.00  1.38  0.88  0.00  93.87

]$ iostat
Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002) 03/10/2013 _x86_64_ (16 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.86   0.00     0.96     0.78    0.00  96.41

Device:   tps  Blk_read/s  Blk_wrtn/s     Blk_read    Blk_wrtn
sda      1.23       20.26       23.53    521533196   605566924
sdb      6.51      921.55      241.90  23717850730  6225863488
sdc      6.22      921.83      236.41  23725181162  6084471192
sdd      6.25      925.13      237.26  23810004970  6106357880
sde      6.19      913.90      235.60  23521108818  6063722504
sdh      6.26      933.08      237.77  24014594546  6119511376
sdg      6.18      914.36      235.31  23532747378  6056257016
sdf      6.24      923.66      235.33  23772251810  6056604008

Here is some more logging, which reinforces that the RS crash is caused by the timeout. However, this time the GC log does not show a long pause.

#####RS LOG#####
2013-03-10 15:37:46,712 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 257739ms for sessionid 0x13d3c4bcba6014a, closing socket connection and attempting reconnect
2013-03-10 15:37:46,712 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 226785ms for sessionid 0x13d3c4bcba60149, closing socket connection and attempting reconnect
2013-03-10 15:37:46,712 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=61.91 MB, free=1.94 GB, max=2 GB, blocks=1254, accesses=60087, hits=58811, hitRatio=97.87%, , cachingAccesses=60069, cachingHits=58811, cachingHitsRatio=97.90%, , evictions=0, evicted=0, evictedPerRun=NaN
2013-03-10 15:37:46,712 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 225100ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-03-10 15:37:46,714 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-43236042-172.17.2.10-1362490844340:blk_-6834190810033122569_25150229
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:670)
2013-03-10 15:37:46,716 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call get([B@7caf69ed, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"details":["ALL"]},"maxVersions":1,"row":"\\x00\\x00\\x00\\x00\\x00\\x12\\x93@"}), rpc version=1, client version=29, methodsFingerPrint=1891768260 from 172.17.1.71:51294 after 0 ms, since caller disconnected

#####GC LOG#####
2716.635: [GC 2716.635: [ParNew: 57570K->746K(59008K), 0.0046530 secs] 354857K->300891K(1152704K), 0.0047370 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
2789.478: [GC 2789.478: [ParNew: 53226K->1192K(59008K), 0.0041370 secs] 353371K->301337K(1152704K), 0.0042220 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]
2868.435: [GC 2868.435: [ParNew: 53672K->740K(59008K), 0.0041570 secs] 353817K->300886K(1152704K), 0.0042440 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2920.309: [GC 2920.309: [ParNew: 53220K->6202K(59008K), 0.0058440 secs] 353366K->310443K(1152704K), 0.0059410 secs] [Times: user=0.05 sys=0.00, real=0
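The stalls reported in the RS log above can be compared directly against the ZooKeeper session timeout. The sketch below is only an illustration: it assumes the ClientCnxn and Sleeper message wording quoted in this message, and it assumes the session timeout is still the 150000 ms that ZooKeeper negotiated in the 2013-03-07 log earlier in the thread. The function name and event labels are hypothetical.

```python
import re

# Message formats assumed from the RS log lines quoted in this thread.
ZK_SILENCE = re.compile(
    r"have not heard from server in (\d+)ms for sessionid (0x[0-9a-f]+)"
)
SLEEPER = re.compile(r"We slept (\d+)ms instead of (\d+)ms")


def stalls(lines, session_timeout_ms):
    """Return (kind, milliseconds, exceeds_timeout) for each stall reported in the log."""
    events = []
    for line in lines:
        match = ZK_SILENCE.search(line)
        if match:
            silent_ms = int(match.group(1))
            events.append(("zk_silence", silent_ms, silent_ms > session_timeout_ms))
            continue
        match = SLEEPER.search(line)
        if match:
            # Overshoot is how much longer the thread slept than it asked for.
            overshoot_ms = int(match.group(1)) - int(match.group(2))
            events.append(
                ("sleeper_overshoot", overshoot_ms, overshoot_ms > session_timeout_ms)
            )
    return events


# Lines copied (shortened) from the RS log in this message.
SAMPLE = [
    "2013-03-10 15:37:46,712 INFO org.apache.zookeeper.ClientCnxn: Client session "
    "timed out, have not heard from server in 257739ms for sessionid 0x13d3c4bcba6014a",
    "2013-03-10 15:37:46,712 INFO org.apache.zookeeper.ClientCnxn: Client session "
    "timed out, have not heard from server in 226785ms for sessionid 0x13d3c4bcba60149",
    "2013-03-10 15:37:46,712 WARN org.apache.hadoop.hbase.util.Sleeper: We slept "
    "225100ms instead of 3000ms, this is likely due to a long garbage collecting pause",
]

if __name__ == "__main__":
    # 150000 ms is the negotiated timeout from the earlier ZooKeeper log (assumption).
    for kind, millis, exceeds in stalls(SAMPLE, session_timeout_ms=150000):
        print(f"{kind}: {millis} ms, exceeds session timeout: {exceeds}")
```

All three events here are longer than 150000 ms, so the whole process appears to have been frozen for more than the session timeout, whatever the GC log says about that interval.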
-
Re: RegionServers Crashing every hour in production env
Sreepathi 2013-03-10, 19:06
Hi Stack/Ted/Pablo,

Should we increase the hbase.rpc.timeout property to 5 minutes (300000 ms)?

Regards,
- Sreepathi
-
Re: RegionServers Crashing every hour in production env
Pablo Musa 2013-03-10, 22:29

Hi Sreepathi,

the book (or the site) says we could try it to see whether this is really a timeout error or whether there is something more. But it is not recommended for production environments.

I could give it a try if five minutes will tell us for sure whether the problem is the GC or somewhere else! Anyway, I find it hard to believe a GC is taking 2:30 minutes.

Regards,
Pablo

On 03/10/2013 04:06 PM, Sreepathi wrote:
> Hi Stack/Ted/Pablo,
>
> Should we increase the hbase.rpc.timeout property to 5 minutes (300000 ms)?
>
> Regards,
> - Sreepathi
-
Re: RegionServers Crashing every hour in production env
Stack 2013-03-10, 22:41

You could increase your zookeeper session timeout to 5 minutes while you are figuring out why these long pauses happen.
http://hbase.apache.org/book.html#zookeeper.session.timeout

Above, there is an outage for almost 5 minutes:

>> We slept 225100ms instead of 3000ms, this is likely due to a long

You have ganglia or tsdb running? When you see the big pause above, can you see anything going on on the machine? (swap, iowait, concurrent fat mapreduce job?)

St.Ack
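Editor's sketch of Stack's suggestion: the setting he links to lives in hbase-site.xml and is given in milliseconds. The snippet below only prints the property block for a chosen number of minutes. The helper name zk_timeout_property is made up for illustration, and nothing here is taken from Pablo's actual configuration.

```shell
#!/bin/sh
# Print an hbase-site.xml <property> block for the ZooKeeper session timeout.
# zookeeper.session.timeout is the property from the HBase book link above;
# zk_timeout_property is a hypothetical helper name.
zk_timeout_property() {
    minutes=$1
    ms=$((minutes * 60 * 1000))   # the property is expressed in milliseconds
    cat <<EOF
<property>
  <name>zookeeper.session.timeout</name>
  <value>${ms}</value>
</property>
EOF
}

# Five minutes, as suggested in the thread.
zk_timeout_property 5
```

One caveat worth checking before relying on this: a ZooKeeper server caps negotiated session timeouts at its maxSessionTimeout, which defaults to 20 times tickTime, so a larger value requested by HBase may be reduced unless that cap is raised as well.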
-
Re: RegionServers Crashing every hour in production env
Azuryy Yu 2013-03-11, 02:13

Hi Pablo,

A long minor GC like that is terrible. I don't see any swapping in your vmstat log, but I would suggest:

1) Add the following JVM options:
-XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods

2) -Xmn is too small; your total memory is 74GB, so just make it -Xmn2g.

3) What are you doing when the long GC happens? Reading or writing? If reading, what is the block cache size?
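Editor's sketch: one way the flags from Azuryy's list could be applied is through conf/hbase-env.sh, which bin/hbase sources before starting a daemon. Scoping them to the RegionServer via HBASE_REGIONSERVER_OPTS is an assumption on my part; the thread does not say where the options were set.

```shell
# conf/hbase-env.sh (fragment)
# Append the suggested flags to the RegionServer JVM only.
# Assumption: the flags are wanted for the RegionServer, not for every daemon.
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} \
  -Xmn2g \
  -XX:+DisableExplicitGC -XX:+UseCompressedOops \
  -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 \
  -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=3 \
  -XX:+UseFastAccessorMethods"

# Print one flag per line to check what the JVM will receive.
printf '%s\n' $HBASE_REGIONSERVER_OPTS
```

Changing one group of flags at a time and comparing GC logs before and after makes it easier to tell which change actually helped.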
-
Re: RegionServers Crashing every hour in production env
Azuryy Yu 2013-03-11, 02:14

Pablo,

one more thing: what's your Java version?
-
Re: RegionServers Crashing every hour in production env
Andrew Purtell 2013-03-11, 02:24

Be careful with GC tuning; throwing changes at an application without analysis of what is going on with the heap is shooting in the dark. One particularly good treatment of the subject is here: http://java.dzone.com/articles/how-tame-java-gc-pauses

If you have made custom changes to blockcache or memstore configurations, back them out until you're sure everything else is ok.

Watch carefully for swapping. Set the vm.swappiness sysctl to 0. Monitor for spikes in page scanning or any swap activity. Nothing brings on "Juliette" pauses better than a JVM partially swapped out. The Java GC starts collection by examining the oldest pages, and those are the first pages the OS swaps out...

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
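Editor's sketch of Andrew's swap advice: the snippet reads the current swappiness value and filters vmstat output for samples that show swap-in or swap-out. It assumes a Linux host and the classic 17-column vmstat layout seen earlier in the thread, where si and so are columns 7 and 8. The function name swap_activity is hypothetical.

```shell
# Read-only check of the current setting. Changing it needs root, e.g.
#   sysctl -w vm.swappiness=0
# plus a "vm.swappiness = 0" line in /etc/sysctl.conf to survive reboots.
cat /proc/sys/vm/swappiness 2>/dev/null || echo "swappiness not readable here"

# Print a line for every vmstat sample with swap-in (si) or swap-out (so) > 0.
# Header rows are skipped because their first field is not a number.
swap_activity() {
    awk '$1 ~ /^[0-9]+$/ && ($7 > 0 || $8 > 0) {
        print "swap activity: si=" $7 " so=" $8
    }'
}

# Usage: sample every 5 seconds, 12 times.
# vmstat 5 12 | swap_activity
```

Fed the single vmstat sample Pablo posted (si and so both 0), the filter prints nothing, which matches his reading that the box was not swapping at that moment. A single sample says little about what happened during the pause itself, so running it continuously is more informative.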
-
Re: RegionServers Crashing every hour in production env
Pablo Musa 2013-03-12, 15:43

Guys,

thank you very much for the help.

Yesterday I spent 14 hours trying to tune the whole cluster. The cluster is not ready yet and needs a lot of tuning, but at least it is working.

My first big problem was namenode + datanode GC. They were not using CMS and thus were taking "incremental" time to run. It started at 0.01 ms and within 20 minutes was taking 150 secs. After setting CMSGC this time is much smaller, with a maximum of 70 secs, which is VERY HIGH, but for now does not stop HBase.

With this issue solved, it was clear that the RS was doing a long-pause GC, taking up to 220 secs. Zookeeper expired the RS and it shut down. I tried a lot of different flag configurations (MORE than 20), and could not get small GCs. Eventually one would take more than 150 secs (zookeeper timeout) and the RS would shut down.

Finally I tried a config that so far, after 12 hours, is working with a maximum GC time of 90 secs. Which of course is a terrible problem since HBase is a database, but at least the cluster is stable while I tune it a little more.

In my opinion, my biggest problem is having a few "monster" machines in the cluster instead of a bunch of commodity machines. I don't know if there are a lot of companies using this kind of machine inside a hadoop cluster, but a quick search on google did not turn up much tuning advice for big-heap GCs.

I guess my next step will be to search for big-heap GC tuning.

Back to some questions ;)

> You have ganglia or tsdb running?

I use zabbix for now, and no, there is nothing going on when the big pause happens.

> When you see the big pause above, can you see anything going on on the
> machine? (swap, iowait, concurrent fat mapreduce job?)
> what are you doing during long GC happened? read or write? if reading, what
> the block cache size?

The cpu for the RS process goes to 100% and the logs "pause" until it gets out.
Ex: [NewPar

IO and SWAP are normal. There is no MR running, just normal database load, which is very low. I am probably doing reads AND writes to the database with default block cache size.

One problem at this moment might be the big number of regions (1252), since I am using only one RS to be able to track the problem.

The links and ideas were very helpful. Thank you very much guys.

I will post my future findings as I find a solution ;)

If you have more ideas or info (links, flag suggestions, etc.), please post them :)

Regards,
Pablo
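Editor's sketch: Pablo does not list the flags he used to move the namenode and datanode to CMS, so the fragment below is only a guess at what such a change could look like in conf/hadoop-env.sh. The CMS and GC-logging switches are standard HotSpot options for JDK 6/7; the log directory and the helper name gc_log_opts are hypothetical.

```shell
# conf/hadoop-env.sh (fragment)
# Assumption: these are NOT Pablo's actual flags, which the thread never shows.
CMS_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"

# GC logging, so that pause times ("real=... secs") can be inspected later.
# GC_LOG_DIR is a hypothetical path.
GC_LOG_DIR="/var/log/hadoop"
gc_log_opts() {
    echo "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${GC_LOG_DIR}/gc-$1.log"
}

export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS:-} $CMS_OPTS $(gc_log_opts namenode)"
export HADOOP_DATANODE_OPTS="${HADOOP_DATANODE_OPTS:-} $CMS_OPTS $(gc_log_opts datanode)"
```

Giving each daemon its own GC log file keeps the pause records separate, which helps when comparing them against the RegionServer's timeline.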
-
Re: RegionServers Crashing every hour in production env
Pablo Musa 2013-04-03, 18:21

Hello guys,

I stopped my research on the HBase ZK timeout for a while due to other issues I had to handle, but I am back.

A very weird behavior that I would like your comments on is that my RegionServers perform better (fewer crashes) under heavy load than under light load. That is, if I leave HBase alone with 50 requestsPerSecond for the whole day, there are more crashes than if I run a mapred job every hour.

Another weird thing is the following:

RS startTime = Mon Apr 01 13:22:35 BRT 2013

[...]$ grep slept /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log
2013-04-01 20:09:21,135 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 45491ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-04-01 22:45:59,407 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 101271ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

[...]$ egrep 'real=[1-9][0-9][0-9][0-9]*\.[0-9][0-9]' /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out
* the report below is the above command run for each time range.
0.0 - 0.1 secs GCs = 5084
0.1 - 0.5 secs GCs = 9
0.5 - 1.0 secs GCs = 3
1.0 - 010 secs GCs = 0
010 - 100 secs GCs = 0
100 - 1000 secs GCs = 0

So, my script for getting every gc time ("real=... secs") says that there is no gc that took longer than 1 second. However, the RS log says twice that the RS slept more than 40 seconds instead of 3.

"this is likely due to a long garbage collecting pause": yes, this is likely, but I don't think it is the case.

The machine is a huge machine with 70GB RAM, 32 procs, light load, no swap or iowait.

Any ideas?

Thanks,
Pablo
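Editor's sketch: Pablo's bucketing script is not shown, so the two helpers below are a reconstruction of the idea rather than his code. gc_histogram counts the "real=" wall-clock times in HotSpot GC logs by range, and long_sleeps lists the Sleeper warnings at or above a threshold so the two can be compared by timestamp. Both function names are hypothetical, and the parsing assumes the log formats quoted in the thread.

```shell
# Histogram of GC wall-clock times ("real=... secs") from HotSpot GC logs.
# Reads the files given as arguments, or stdin when none are given.
gc_histogram() {
    grep -oh 'real=[0-9.]*' "$@" | awk -F= '
        {
            t = $2 + 0
            if      (t < 0.1)  b[1]++
            else if (t < 0.5)  b[2]++
            else if (t < 1)    b[3]++
            else if (t < 10)   b[4]++
            else if (t < 100)  b[5]++
            else               b[6]++
        }
        END {
            split("0.0-0.1 0.1-0.5 0.5-1.0 1.0-10 10-100 100+", label, " ")
            for (i = 1; i <= 6; i++)
                printf "%s secs GCs = %d\n", label[i], b[i]
        }'
}

# List "We slept NNNms" warnings of at least min_ms milliseconds,
# printing the log timestamp and the observed sleep time.
long_sleeps() {
    min_ms=$1; shift
    grep -h 'We slept' "$@" | awk -v min="$min_ms" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "slept") {
                    ms = $(i + 1)
                    sub(/ms$/, "", ms)
                    if (ms + 0 >= min) print $1, $2, ms " ms"
                }
        }'
}

# Usage (paths as in the message above):
# gc_histogram /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out
# long_sleeps 10000 /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log
```

The "real=" entries only record garbage collections. If the long sleeps have no matching GC entry, the JVM may have been stopped for another reason, or the whole process may have been stalled by the OS. HotSpot's -XX:+PrintGCApplicationStoppedTime flag, which logs how long application threads were stopped, might help narrow that down, though that is a suggestion and not something tested in this thread.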
-
Re: RegionServers Crashing every hour in production env
Ted Yu 2013-04-03, 18:36

Thanks for sharing.

Have you looked at http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired , suggested below?

I think the zookeeper timeout should be watched more closely.
-
Re: RegionServers Crashing every hour in production envPablo Musa 2013-04-03, 20:24
> Have you looked at http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired , suggested below ?
Yes I did, but GC is not the issue here.

> I think zookeeper timeout should be more closely watched.

What do you mean?
My ZK timeout today is 150 secs, which is very big. However, my
problem is finding where this timeout comes from. Any ideas?

Thanks,
Pablo

On 04/03/2013 03:36 PM, Ted Yu wrote:
> Thanks for the sharing.
>
> Have you looked at
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired , suggested
> below ?
>
> I think zookeeper timeout should be more closely watched.
>
>
> On Wed, Apr 3, 2013 at 11:21 AM, Pablo Musa <[EMAIL PROTECTED]> wrote:
>
>> Hello guys,
>> I stopped my research on HBase ZK timeout for a while due to
>> other issues I had to do, but I am back.
>>
>> A very weird behavior that I would like your comments on is that my
>> RegionServers perform better (fewer crashes) under heavy load instead
>> of light load.
>> That is, if I leave HBase alone with 50 requestsPerSecond along the
>> whole day the crashes are higher than if I run a mapred Job every hour.
>>
>>
>> Another weird thing is the following:
>>
>> RS startTime = Mon Apr 01 13:22:35 BRT 2013
>>
>> [...]$ grep slept /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log
>> 2013-04-01 20:09:21,135 WARN org.apache.hadoop.hbase.util.Sleeper: We
>> slept 45491ms instead of 3000ms, this is likely due to a long garbage
>> collecting pause and it's usually bad, see
>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>> 2013-04-01 22:45:59,407 WARN org.apache.hadoop.hbase.util.Sleeper: We
>> slept 101271ms instead of 3000ms, this is likely due to a long garbage
>> collecting pause and it's usually bad, see
>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>
>> [...]$ egrep 'real=[1-9][0-9][0-9][0-9]*\.[0-9][0-9]'
>> /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out
>> * the below report is the above command for each time range.
>> 0.0 - 0.1 secs GCs = 5084
>> 0.1 - 0.5 secs GCs = 9
>> 0.5 - 1.0 secs GCs = 3
>> 1.0 - 010 secs GCs = 0
>> 010 - 100 secs GCs = 0
>> 100 - 1000 secs GCs = 0
>>
>> So, my script for getting every gc time ("real=... secs") says that
>> there is no gc that took longer than 1 second.
>> However the RS log says twice that the RS slept more than 40 seconds
>> instead of 3.
>>
>> "this is likely due to a long garbage collecting pause", yes
>> this is likely but I don't think it is the case.
>>
>> The machine is a huge machine with 70GB RAM, 32 procs, light load,
>> no swap or iowait.
>>
>> Any ideas?
>>
>> Thanks,
>> Pablo
>>
>>
>> On 03/12/2013 12:43 PM, Pablo Musa wrote:
>>
>>> Guys,
>>> thank you very much for the help.
>>>
>>> Yesterday I spent 14 hours trying to tune the whole cluster.
>>> The cluster is not ready yet and needs a lot of tuning, but at least
>>> it is working.
>>>
>>> My first big problem was namenode + datanode GC. They were not using
>>> CMS and thus were taking "incremental" time to run. It started at
>>> 0.01 ms and in 20 minutes was taking 150 secs.
>>> After setting CMSGC this time is much smaller, taking a maximum of
>>> 70 secs, which is VERY HIGH, but for now does not stop HBase.
>>>
>>> With this issue solved, it was clear that the RS was doing a long
>>> GC pause, taking up to 220 secs. Zookeeper expired the RS and it
>>> shut down.
>>> I tried a lot of different flag configurations (MORE than 20), and
>>> could not get small GCs. Eventually it would take more than 150 secs
>>> (zookeeper timeout) and shut down.
>>>
>>> Finally I tried a config that so far, 12 hours, is working with a
>>> maximum GC time of 90 secs. Which of course is a terrible problem
>>> since HBase is a database, but at least the cluster is stable while
>>> I can tune it a little more.
>>>
>>> In my opinion, my biggest problem is to have a few "monster" machines in
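The cross-check described in the quoted message (GC `real=` times on one side, `Sleeper` warnings on the other) can be sketched in one script. This is a minimal sketch, not the script Pablo used: it assumes HotSpot-style GC lines ending in `real=N.NN secs` and the `We slept Nms instead of Nms` wording shown above. The function names and the bucket list are hypothetical and only mirror the ranges in the report.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the log excerpts quoted in this thread.
GC_REAL = re.compile(r"real=(\d+\.\d+) secs")
SLEPT = re.compile(r"We slept (\d+)ms instead of (\d+)ms")

# Bucket upper bounds in seconds, mirroring the ranges of the report.
BUCKETS = [0.1, 0.5, 1.0, 10.0, 100.0, 1000.0]


def bucket_gc_times(lines):
    """Count GC pauses per bucket, keyed by the bucket's upper bound."""
    counts = Counter()
    for line in lines:
        for match in GC_REAL.finditer(line):
            secs = float(match.group(1))
            for upper in BUCKETS:
                if secs < upper:
                    counts[upper] += 1
                    break
            else:
                # Pause longer than the largest bucket.
                counts[float("inf")] += 1
    return counts


def sleeper_overshoots(lines):
    """Return (timestamp, extra_ms) for every Sleeper warning found."""
    found = []
    for line in lines:
        match = SLEPT.search(line)
        if match:
            slept, expected = int(match.group(1)), int(match.group(2))
            # The first 23 characters hold 'YYYY-MM-DD HH:MM:SS,mmm'.
            found.append((line[:23], slept - expected))
    return found
```

If every GC pause lands in the sub-second buckets while `sleeper_overshoots` still reports tens of seconds, the stall probably happened outside the collector, which is the situation described in the message.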
-
Re: RegionServers Crashing every hour in production env
Ted Yu 2013-04-03, 21:40
I went over related emails in my Inbox.
One previous case was that another task was running on the region server node and consumed a good portion of the CPU.
In that case I observed a gap in activity in the region server log.
I can send that snippet after anonymization, since it contains some IP addresses from my previous employer.

FYI

On Wed, Apr 3, 2013 at 1:24 PM, Pablo Musa <[EMAIL PROTECTED]> wrote:

>> Have you looked at
>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired , suggested below ?
>
> Yes I did, but GC is not the issue here.
>
>> I think zookeeper timeout should be more closely watched.
>
> What do you mean?
> My ZK timeout today is 150 secs, which is very big. However, my
> problem is finding where this timeout come from?! Any ideas?
>
> Thanks,
> Pablo
|
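The "gap in activity in the region server log" mentioned in the reply above can be looked for mechanically. The following is a minimal sketch, assuming every log entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the Sleeper lines quoted in this thread. The function name `find_gaps` and the 30-second default threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. '2013-04-01 20:09:21,135'.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def find_gaps(lines, threshold=timedelta(seconds=30)):
    """Return (previous, current) timestamp pairs separated by >= threshold."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = datetime.strptime(line[:23], TS_FORMAT)
        except ValueError:
            # Continuation lines such as stack traces carry no timestamp.
            continue
        if previous is not None and current - previous >= threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

A silent stretch that lines up with a Sleeper warning, but not with any long GC entry, would point toward something outside the JVM (for example another process taking the CPU, as in the case Ted describes). On a lightly loaded server, quiet periods are normal, so the threshold needs to be chosen with the usual logging rate in mind.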