Re: hbase region server shutdown after datanode connection exception
You are looking at it the wrong way. Per
http://hbase.apache.org/book.html#trouble.general, always walk up the
log to the first exception. In this case it's a session timeout.
Whatever happens next is most probably a side effect of that.
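
(You can see it in your own region server log: the session times out at
17:00:16, three minutes before the 17:03:31 DFSClient error.) As for
preventing it: an expiry at 120000 ms usually means the region server JVM
was paused that long (long GC, swapping) or was cut off from ZooKeeper, so
the fix is mostly GC tuning, and possibly a larger timeout. Note that
raising zookeeper.session.timeout in hbase-site.xml only helps if the
ZooKeeper servers allow it; their maxSessionTimeout caps what is actually
granted. A minimal sketch to print what your configuration requests,
assuming the HBase jars and your hbase-site.xml are on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PrintZkSessionTimeout {
    public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // 180000 ms is the era's shipped default; your logs show 120000 ms in effect.
        System.out.println("zookeeper.session.timeout = "
                + conf.getInt("zookeeper.session.timeout", 180000) + " ms");
    }
}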

To help debug your issue, I would suggest reading this section of the
reference guide: http://hbase.apache.org/book.html#trouble.rs.runtime
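
For your second question, you don't need a full monitoring stack just to
check liveness: the master already tracks which region servers are live or
dead, and you can read that from the Java client (the HBase shell's
"status" command shows the same thing). A minimal sketch, assuming a
0.94-era client API; method names vary a bit across versions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RegionServerLiveness {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // The master's view of the cluster: live and dead region servers.
            ClusterStatus status = admin.getClusterStatus();
            System.out.println("Live region servers:");
            for (ServerName sn : status.getServers()) {
                System.out.println("  " + sn.getServerName());
            }
            System.out.println("Dead region servers:");
            for (ServerName sn : status.getDeadServerNames()) {
                System.out.println("  " + sn.getServerName());
            }
        } finally {
            admin.close();
        }
    }
}

Run that from cron and you have your dfsadmin-style report.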

J-D

On Tue, May 21, 2013 at 7:17 PM, Cheng Su <[EMAIL PROTECTED]> wrote:
> Hi all.
>
>
>
>          I have a small HBase cluster with 3 physical machines.
>
>          On 192.168.1.80 there are the HMaster and a region server; 81 and
> 82 each run a region server.
>
>          The region server on 80 couldn't sync its HLog after a datanode
> access exception and started to shut down.
>
>          The datanode itself was not shut down and responded to other
> requests normally. I'll paste the logs below.
>
>          My questions are:
>
>          1. Why does this exception cause a region server shutdown? Can I
> prevent it?
>
>          2. Is there any tool (a shell command is best, like hadoop dfsadmin
> -report) that can monitor an HBase region server, i.e. check whether it is
> alive or dead?
>
>            I have done some research and found that Nagios/Ganglia can do
> such things. But actually I just want to know whether the region server is
> alive or dead, so they are a little overqualified.
>
>            And I'm not using CDH, so I don't think I can use Cloudera Manager.
>
>
>
>          Here are the logs.
>
>
>
>          HBase master:
>
> 2013-05-21 17:03:32,675 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hadoop01,60020,1368774173179 reported a fatal error:
> ABORTING region server hadoop01,60020,1368774173179: regionserver:60020-0x3eb14c67540002 regionserver:60020-0x3eb14c67540002 received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:369)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:266)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
>
>
>
>          Region Server:
>
> 2013-05-21 17:00:16,895 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 120000ms for sessionid 0x3eb14c67540002, closing socket connection and attempting reconnect
> 2013-05-21 17:00:35,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 120000ms for sessionid 0x13eb14ca4bb0000, closing socket connection and attempting reconnect
> 2013-05-21 17:03:31,498 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_9188414668950016309_4925046 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.80:57020 remote=/192.168.1.82:50010]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2784)
>
> 2013-05-21 17:03:31,520 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_9188414668950016309_4925046 bad datanode[0] 192.168.1.82:50010