HBase, mail # user - hbase region server shutdown after datanode connection exception
Cheng Su 2013-05-22, 02:17
Hi all.

         I have a small HBase cluster with 3 physical machines.

         On 192.168.1.80 there are an HMaster and a region server; on 81 and 82 there is one region server each.

         The region server on 80 could not sync its HLog after a datanode access exception, and then started to shut down.

         The datanode itself was not shut down and responds to other requests normally. I'll paste the logs below.

         My question is:

         1. Why does this exception cause the region server to shut down? Can I prevent it? (See the hbase-site.xml sketch below.)

         2. Is there any tool (a shell command is best, something like hadoop dfsadmin -report) that can monitor an HBase region server, i.e. check whether it is alive or dead?

           I have done some research and found that Nagios/Ganglia can do such things, but I really only want to know whether a region server is alive or dead, so they are a little overqualified for this. (A small client-side sketch of what I have in mind follows below.)

           And since I'm not using CDH, I don't think I can use Cloudera Manager.
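           For question 1, judging from the logs the abort seems to come from the expired ZooKeeper session rather than from the HDFS error itself, so I wonder whether giving the region server a longer session timeout would help. A rough sketch of what I mean in hbase-site.xml (the value is only illustrative, not my actual config, and I understand the ZooKeeper quorum's own min/max session timeout bounds still apply):

           <property>
             <name>zookeeper.session.timeout</name>
             <value>180000</value> <!-- milliseconds, illustrative only -->
           </property>
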
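           For question 2, if there is no ready-made shell command, I was thinking of a small client program along these lines (an untested sketch against the 0.92/0.94 client API; the class name RegionServerCheck is just my own placeholder):

import java.util.Collection;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RegionServerCheck {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Ask the master which region servers it currently considers live or dead.
            ClusterStatus status = admin.getClusterStatus();
            Collection<ServerName> live = status.getServers();
            Collection<ServerName> dead = status.getDeadServerNames();
            System.out.println("Live region servers: " + live.size());
            for (ServerName sn : live) {
                System.out.println("  LIVE " + sn);
            }
            System.out.println("Dead region servers: " + dead.size());
            for (ServerName sn : dead) {
                System.out.println("  DEAD " + sn);
            }
        } finally {
            admin.close();
        }
    }
}

           Or would something as simple as echo "status 'simple'" | hbase shell run from a cron job be good enough for this?
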

 

         Here are the logs.

         HBase master:
2013-05-21 17:03:32,675 ERROR org.apache.hadoop.hbase.master.HMaster: Region server ^@^@hadoop01,60020,1368774173179 reported a fatal error:
ABORTING region server hadoop01,60020,1368774173179: regionserver:60020-0x3eb14c67540002 regionserver:60020-0x3eb14c67540002 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:369)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:266)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 

         Region Server:

2013-05-21 17:00:16,895 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 120000ms for sessionid 0x3eb14c67540002, closing socket connection and attempting reconnect
2013-05-21 17:00:35,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 120000ms for sessionid 0x13eb14ca4bb0000, closing socket connection and attempting reconnect
2013-05-21 17:03:31,498 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_9188414668950016309_4925046java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.80:57020 remote=/192.168.1.82:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2784)
2013-05-21 17:03:31,520 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_9188414668950016309_4925046 bad datanode[0] 192.168.1.82:50010
2013-05-21 17:03:32,315 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /192.168.1.82:2100
2013-05-21 17:03:32,316 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop03/192.168.1.82:2100, initiating session
2013-05-21 17:03:32,317 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop03/192.168.1.82:2100, sessionid 0x13eb14ca4bb0000, negotiated timeout = 180000
2013-05-21 17:03:32,497 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1091)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
        at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
        ... 4 more
Caused by: java.io.IOException: DFSOutputStream is closed
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3483)
        at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
        at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
        ... 8 more
2013-05-21 17:03:32,497 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
2013-05-21 17:03:32,498 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: new createWriter -- HADOOP-6840 -- not available
2013-05-21 17:03:32,548 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop03/192.168.1.82:2100
2013-05-21 17:03:32,548 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop03/192.168.1.82:2100, initiating session
2013-05-21 17:03:32,549 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3eb14c67540002 has expired, closing socket connection
2013-05-21 17:03:32,549 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hadoop01,60020,1368774173179: regionserver:60020-0x3eb1