HBase >> mail # user >> Occasional regionserver crashes following socket errors writing to HDFS


Re: Occasional regionserver crashes following socket errors writing to HDFS
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A8
<property>
  <name>zookeeper.session.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>
The default is 60 seconds, which you reduced to 20 (assuming this is the right parameter).

As you said, you were doing a major compaction at the time.
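
One thing worth keeping in mind when changing these two values together: a ZooKeeper server does not simply accept whatever session timeout the client asks for. Unless minSessionTimeout and maxSessionTimeout are set explicitly on the server, it bounds the negotiated timeout to between 2x and 20x tickTime. A rough sketch of that arithmetic follows; the function name and the 2000 ms tickTime in the second call are illustrative assumptions, not values taken from this cluster:

```python
def negotiated_session_timeout(requested_ms, tick_time_ms, min_ms=None, max_ms=None):
    """Approximate the session timeout a ZooKeeper server would grant.

    Assumes the server-side defaults: minSessionTimeout = 2 * tickTime and
    maxSessionTimeout = 20 * tickTime, unless overridden via min_ms / max_ms.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    # Clamp the requested value into the [lower, upper] range.
    return max(lower, min(requested_ms, upper))


# Values from the config snippet above: 1200000 ms requested, tickTime 6000 ms.
print(negotiated_session_timeout(1200000, 6000))

# A 20 second request against an assumed (hypothetical) tickTime of 2000 ms.
print(negotiated_session_timeout(20000, 2000))
```

So with tickTime at 6000, a request of 1200000 ms would be capped at 120000 ms under those defaults. The session timeout the client actually negotiated is the number to compare against the "have not heard from server" figures in the log.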
On May 24, 2012, at 6:15 AM, Eran Kutner wrote:

> Thanks Stack for noticing the ZooKeeper timeout, don't know how could I
> have missed that.
>
> After analyzing this for a while it is definitely unrelated to GC. In fact
> during the last 4 days no GC operation took more than 2 seconds, and those
> that got close were all concurrent mark sweeps, so they should not be
> stopping other threads.
>
> These are the interesting log lines:
> 2012-05-22 01:25:11,502 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 23706ms for sessionid
> 0x1372aa57bee0308, closing socket connection and attempting reconnect
> 2012-05-22 01:25:11,502 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 24638ms for sessionid
> 0x3372bf3891304bf, closing socket connection and attempting reconnect
> 2012-05-22 01:25:12,047 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hadoop1-zk1/10.1.104.201:2181
> 2012-05-22 01:25:12,048 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hadoop1-zk1/10.1.104.201:2181, initiating session
> 2012-05-22 01:25:12,080 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x3372bf3891304bf has expired,
> closing socket connection
> 2012-05-22 01:25:12,081 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> serverName=hadoop1-s05.farm-ny.gigya.com,60020,1336990798475,
> load=(requests=4015, regions=708, usedHeap=2342, maxHeap=7983):
> regionserver:60020-0x3372bf3891304bf regionserver:60020-0x3372bf3891304bf
> received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>        at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>        at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>        at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
>        at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
>
> This is what the zookeeper logs show at the same time:
> 2012-05-22 01:24:46,014 - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to
> read additional data from client sessionid 0x1372aa57bef6611, likely client
> has closed socket
> 2012-05-22 01:24:46,014 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for
> client /10.1.104.4:57598 which had sessionid 0x1372aa57bef6611
> 2012-05-22 01:25:08,010 - ERROR [CommitProcessor:1:NIOServerCnxn@445] -
> Unexpected Exception:
> 2012-05-22 01:25:08,016 - INFO  [CommitProcessor:1:NIOServerCnxn@1435] -
> Closed socket connection for client /10.1.104.5:33945 which had sessionid
> 0x1372aa57bee0308
> 2012-05-22 01:25:12,046 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket
> connection from /10.1.104.5:43070
> 2012-05-22 01:25:12,076 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew
> session 0x3372bf3891304bf at /10.1.104.5:43070
> 2012-05-22 01:25:12,076 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:Learner@103] - Revalidating client: 231702230809642175
> 2012-05-22 01:25:12,077 - INFO
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1573] - Invalid session
> 0x3372bf3891304bf for client /10.1.104.5:43070, probably expired
> 2012-05-22 01:25:12,078 - INFO  [NIOServerCxn.Factory: