HBase user mailing list: help, why do my regionservers shut themselves down?


kaveh minooie 2013-04-23, 01:25
Re: help why do my regionservers shut themselves down?
Leonid Fedotov 2013-04-23, 15:59
This could also be a cause:
2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
Make sure your cluster is healthy, in particular that enough DataNodes are live.
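To illustrate why that warning points at DataNode health, here is a simplified, hypothetical model of the logic it describes: when a write-ahead-log sync reports fewer replicas than tolerated, a log roll is requested in the hope that HDFS builds a new pipeline on more DataNodes; if rolling keeps failing to help, the server gives up and warns. The class `LowReplicationRollSketch`, the `Outcome` enum, and the `rollLimit` value are assumptions for illustration, not HBase's actual code or configuration.

```java
/**
 * Simplified, hypothetical model of the reasoning behind the
 * "Too many consecutive RollWriter requests" warning.
 * NOT HBase's implementation; names and limits are illustrative assumptions.
 */
public final class LowReplicationRollSketch {

    /** What the sketch does after a WAL sync reports the pipeline's replica count. */
    public enum Outcome { OK, ROLL_REQUESTED, WARN_TOO_MANY_ROLLS, IGNORED }

    private final int tolerableReplicas; // assumption: typically the HDFS replication factor
    private final int rollLimit;         // hypothetical cap on consecutive roll requests
    private int consecutiveRolls = 0;
    private boolean rollingEnabled = true;

    public LowReplicationRollSketch(int tolerableReplicas, int rollLimit) {
        this.tolerableReplicas = tolerableReplicas;
        this.rollLimit = rollLimit;
    }

    public Outcome onSync(int currentReplicas) {
        if (currentReplicas >= tolerableReplicas) {
            // Healthy pipeline: reset the counter and re-enable low-replication rolls.
            consecutiveRolls = 0;
            rollingEnabled = true;
            return Outcome.OK;
        }
        if (!rollingEnabled) {
            // Already gave up; keep writing with reduced replication.
            return Outcome.IGNORED;
        }
        if (consecutiveRolls < rollLimit) {
            // Request a new WAL file, hoping the new pipeline reaches more DataNodes.
            consecutiveRolls++;
            return Outcome.ROLL_REQUESTED;
        }
        // Rolling keeps producing short pipelines: likely too few live DataNodes.
        rollingEnabled = false;
        return Outcome.WARN_TOO_MANY_ROLLS;
    }

    public static void main(String[] args) {
        // Assumed scenario: 3 replicas expected, but only 1 DataNode reachable.
        LowReplicationRollSketch wal = new LowReplicationRollSketch(3, 2);
        for (int i = 0; i < 4; i++) {
            System.out.println(wal.onSync(1));
        }
        System.out.println(wal.onSync(3)); // DataNodes back: pipeline healthy again
    }
}
```

Running `main` prints `ROLL_REQUESTED`, `ROLL_REQUESTED`, `WARN_TOO_MANY_ROLLS`, `IGNORED`, then `OK`: rolling cannot fix a pipeline when there are simply not enough live DataNodes, which is why the warning is a cluster-health signal rather than an HBase bug.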
Thank you!

Sincerely,
Leonid Fedotov
On Apr 22, 2013, at 6:25 PM, kaveh minooie wrote:

>
> Hi
>
> After a few MapReduce jobs, my regionservers shut themselves down. Here is the most recent occurrence:
>
> 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, trying to reconnect.
> 2013-04-22 16:47:21,843 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392, regions=196, usedHeap=1063, maxHeap=3966): regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
>        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
> 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Trying to reconnect to zookeeper.
> 2013-04-22 16:47:21,844 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=1794, regions=196, stores=1561, storefiles=1585, storefileIndexSize=104, memstoreSize=306, compactionQueueSize=10, flushQueueSize=0, usedHeap=1073, maxHeap=3966, blockCacheSize=661986032, blockCacheFree=169901776, blockCacheCount=7242, blockCacheHitCount=910925, blockCacheMissCount=1558134, blockCacheEvictedCount=1344753, blockCacheHitRatio=36, blockCacheHitCachingRatio=40
> 2013-04-22 16:47:21,844 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from ZooKeeper, aborting
> 2013-04-22 16:47:21,844 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
> 2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=180000 watcher=hconnection
> 2013-04-22 16:47:22,357 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 1 regions to close
> 2013-04-22 16:47:22,394 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server d1r2n2.prod.plutoz.com/10.0.0.66:2181. Will not attempt to authenticate using SASL (unknown error)
> 2013-04-22 16:47:22,395 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to d1r2n2.prod.plutoz.com/10.0.0.66:2181, initiating session
> 2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181, sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
> 2013-04-22 16:47:22,400 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Reconnected successfully. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
> 2013-04-22 16:47:22,400 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction interrupted by user:
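The log above shows the client requesting `sessionTimeout=180000` but the server granting `negotiated timeout = 40000`. A ZooKeeper server clamps the requested timeout into the range [minSessionTimeout, maxSessionTimeout], which by default is 2 x tickTime to 20 x tickTime. The sketch below assumes tickTime=2000 ms (it is not shown in the logs, but it is consistent with the 40000 ms result) and assumes the regionserver's own session was capped the same way as this reconnecting `hconnection` session. The class and method names are hypothetical.

```java
/**
 * Sketch of how a ZooKeeper server clamps a client's requested session timeout,
 * assuming the default bounds (2 x tickTime .. 20 x tickTime).
 * Class and method names are hypothetical, for illustration only.
 */
public final class ZkSessionTimeoutSketch {

    private ZkSessionTimeoutSketch() {}

    /** Clamp the requested timeout into [2 * tickTime, 20 * tickTime]. */
    public static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;  // default minSessionTimeout
        int max = 20 * tickTimeMs; // default maxSessionTimeout
        return Math.max(min, Math.min(requestedMs, max));
    }

    /** Rough check: a stall longer than the negotiated timeout risks session expiry. */
    public static boolean pauseRisksExpiry(long pauseMs, int negotiatedMs) {
        return pauseMs > negotiatedMs;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000;    // assumption: not visible in the logs
        int requestedMs = 180000; // sessionTimeout=180000 from the log
        int negotiated = negotiatedTimeoutMs(requestedMs, tickTimeMs);
        System.out.println("requested=" + requestedMs + " negotiated=" + negotiated);
        System.out.println("45s pause risks expiry: " + pauseRisksExpiry(45000, negotiated));
    }
}
```

Under these assumptions, a GC pause or network stall of roughly 40 s is enough to expire the session even though 180 s was configured on the client side; the effective ceiling is set by the ZooKeeper server's maxSessionTimeout, not by the client request alone.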
Jean-Marc Spaggiari 2013-04-23, 01:46
Ted Yu 2013-04-23, 02:35
kaveh minooie 2013-04-23, 04:47
Kevin Odell 2013-04-23, 10:15