HBase >> mail # user >> help why do my regionservers shut themselves down?


Re: help why do my regionservers shut themselves down?
Hi Kaveh,

The answer may already be in the logs you sent ;)

"This disconnect could have been caused by a network partition or a
long-running GC pause, either way it's recommended that you verify
your environment."
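
One more thing that stands out in the log quoted below: the client asks for sessionTimeout=180000, but the server answers with negotiated timeout = 40000. ZooKeeper servers cap the session timeout at maxSessionTimeout, which defaults to 20 times tickTime, so 40000 would fit a tickTime of 2000. That is only a guess about your zoo.cfg, and those two lines belong to the hconnection client session, so treat this as a hint rather than a diagnosis. A minimal Python sketch for pulling both values out of a regionserver log (the function name is made up for illustration):

```python
import re

# Timeout the client requested when opening the ZooKeeper connection.
REQUESTED = re.compile(r"sessionTimeout=(\d+)")
# Timeout the ZooKeeper server actually granted for the session.
NEGOTIATED = re.compile(r"negotiated timeout\s*=\s*(\d+)")

def session_timeouts(log_text):
    """Return (requested_ms, negotiated_ms) as two lists of ints found in log_text."""
    requested = [int(value) for value in REQUESTED.findall(log_text)]
    negotiated = [int(value) for value in NEGOTIATED.findall(log_text)]
    return requested, negotiated
```

If the negotiated values are consistently lower than the requested ones, a pause only has to exceed the lower number for the session to expire.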

Do you have GC logs? Have you tried anything to solve that?
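
If you do have them, here is a rough Python sketch for spotting long pauses. It assumes HotSpot-style GC logs written with -XX:+PrintGCDetails, where each collection ends with a "[Times: user=... sys=..., real=... secs]" entry; the function name and the threshold are only for illustration, and other collectors or JVM versions may log in a different format.

```python
import re

# Wall-clock ("real") duration from a HotSpot "[Times: ...]" entry.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")

def long_pauses(lines, threshold_secs):
    """Return (line_number, seconds) for each GC entry at or above threshold_secs."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REAL_TIME.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_secs:
                hits.append((number, seconds))
    return hits
```

You could call it as long_pauses(open("gc.log"), 40) to list every pause that reaches the 40000 ms negotiated timeout shown in your log, then compare the timestamps around those lines with the 16:47:21 abort.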

JM

2013/4/22 kaveh minooie <[EMAIL PROTECTED]>:
>
> Hi
>
> after a few mapreduce jobs my regionservers shut themselves down. this is
> the latest time that this has happened:
>
> 2013-04-22 16:47:21,843 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> This client just lost it's session with ZooKeeper, trying to reconnect.
> 2013-04-22 16:47:21,843 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392,
> regions=196, usedHeap=1063, maxHeap=3966):
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from
> ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>         at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>         at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>         at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
>         at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
> 2013-04-22 16:47:21,843 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Trying to reconnect to zookeeper.
> 2013-04-22 16:47:21,844 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requests=1794, regions=196, stores=1561, storefiles=1585,
> storefileIndexSize=104, memstoreSize=306, compactionQueueSize=10,
> flushQueueSize=0, usedHeap=1073, maxHeap=3966, blockCacheSize=661986032,
> blockCacheFree=169901776, blockCacheCount=7242, blockCacheHitCount=910925,
> blockCacheMissCount=1558134, blockCacheEvictedCount=1344753,
> blockCacheHitRatio=36, blockCacheHitCachingRatio=40
> 2013-04-22 16:47:21,844 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED:
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from
> ZooKeeper, aborting
> 2013-04-22 16:47:21,844 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> Too many consecutive RollWriter requests, it's a sign of the total number of
> live datanodes is lower than the tolerable replicas.
> 2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=zk1:2181 sessionTimeout=180000
> watcher=hconnection
> 2013-04-22 16:47:22,357 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 1 regions to
> close
> 2013-04-22 16:47:22,394 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server d1r2n2.prod.plutoz.com/10.0.0.66:2181. Will not attempt
> to authenticate using SASL (unknown error)
> 2013-04-22 16:47:22,395 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to d1r2n2.prod.plutoz.com/10.0.0.66:2181, initiating
> session
> 2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181,
> sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
> 2013-04-22 16:47:22,400 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Reconnected successfully. This disconnect could have been caused by a
> network partition or a long-running GC pause, either way it's recommended
> that you verify your environment.
> 2013-04-22 16:47:22,400 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: