HBase, mail # user - help why do my regionservers shut themselves down?


+
kaveh minooie 2013-04-23, 01:25
+
Leonid Fedotov 2013-04-23, 15:59
Re: help why do my regionservers shut themselves down?
Jean-Marc Spaggiari 2013-04-23, 01:46
Hi Kaveh,

The answer may already be in the logs you sent ;)

"This disconnect could have been caused by a network partition or a
long-running GC pause, either way it's recommended that you verify
your environment."

Do you have GC logs? Have you tried anything to solve that?
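
One more thing worth checking in the quoted log below: the client asks for sessionTimeout=180000, but the server reports a negotiated timeout of 40000. A minimal Python sketch for pulling those two values out of a regionserver log follows. The function name `session_timeouts` and the `sample` text are only illustrative, and the patterns assume the ZooKeeper client lines look exactly like the ones quoted in this thread.

```python
import re

def session_timeouts(log_text):
    """Return (requested_ms, negotiated_ms) pairs found in log_text, in order."""
    # Mail clients re-wrap long log lines, so strip leading "> " quote
    # markers and collapse all whitespace before matching.
    unquoted = re.sub(r"^>\s?", "", log_text, flags=re.M)
    flat = re.sub(r"\s+", " ", unquoted)

    pairs = []
    requested = None
    pattern = r"sessionTimeout=(\d+)|negotiated timeout = (\d+)"
    for match in re.finditer(pattern, flat):
        if match.group(1):
            # Timeout the client asked for when opening the connection.
            requested = int(match.group(1))
        elif requested is not None:
            # Timeout the ZooKeeper server actually granted.
            pairs.append((requested, int(match.group(2))))
            requested = None
    return pairs

# Two entries copied from the log quoted in this thread.
sample = """\
> 2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=zk1:2181 sessionTimeout=180000
> watcher=hconnection
> 2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181,
> sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
"""

for requested, negotiated in session_timeouts(sample):
    if negotiated < requested:
        print(f"requested {requested} ms, server granted {negotiated} ms")
```

A granted value of 40000 would be consistent with ZooKeeper's default cap of 20 × tickTime when tickTime is 2000 ms, though that is a guess about this cluster's zoo.cfg rather than something the log confirms. If that is the cause, a GC pause of about 40 seconds, not 180, would be enough to expire the session, and raising maxSessionTimeout on the ZooKeeper side would be the setting to look at.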

JM

2013/4/22 kaveh minooie <[EMAIL PROTECTED]>:
>
> Hi
>
> After a few MapReduce jobs, my regionservers shut themselves down. This is
> the most recent occurrence:
>
> 2013-04-22 16:47:21,843 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> This client just lost it's session with ZooKeeper, trying to reconnect.
> 2013-04-22 16:47:21,843 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392,
> regions=196, usedHeap=1063, maxHeap=3966):
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from
> ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>         at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>         at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>         at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
>         at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
> 2013-04-22 16:47:21,843 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Trying to reconnect to zookeeper.
> 2013-04-22 16:47:21,844 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requests=1794, regions=196, stores=1561, storefiles=1585,
> storefileIndexSize=104, memstoreSize=306, compactionQueueSize=10,
> flushQueueSize=0, usedHeap=1073, maxHeap=3966, blockCacheSize=661986032,
> blockCacheFree=169901776, blockCacheCount=7242, blockCacheHitCount=910925,
> blockCacheMissCount=1558134, blockCacheEvictedCount=1344753,
> blockCacheHitRatio=36, blockCacheHitCachingRatio=40
> 2013-04-22 16:47:21,844 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED:
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from
> ZooKeeper, aborting
> 2013-04-22 16:47:21,844 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> Too many consecutive RollWriter requests, it's a sign of the total number of
> live datanodes is lower than the tolerable replicas.
> 2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=zk1:2181 sessionTimeout=180000
> watcher=hconnection
> 2013-04-22 16:47:22,357 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 1 regions to
> close
> 2013-04-22 16:47:22,394 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server d1r2n2.prod.plutoz.com/10.0.0.66:2181. Will not attempt
> to authenticate using SASL (unknown error)
> 2013-04-22 16:47:22,395 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to d1r2n2.prod.plutoz.com/10.0.0.66:2181, initiating
> session
> 2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181,
> sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
> 2013-04-22 16:47:22,400 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Reconnected successfully. This disconnect could have been caused by a
> network partition or a long-running GC pause, either way it's recommended
> that you verify your environment.
> 2013-04-22 16:47:22,400 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion:
+
Ted Yu 2013-04-23, 02:35
+
kaveh minooie 2013-04-23, 04:47
+
Kevin Odell 2013-04-23, 10:15