hbase region server shutdown after datanode connection exception
Hi all.

 

         I have a small HBase cluster of 3 physical machines.

         192.168.1.80 runs the HMaster and a region server; .81 and .82 each
run a region server.

         The region server on .80 could not sync its HLog after a DataNode
access exception, and then began to shut down.

         The DataNode itself did not shut down and kept responding to other
requests normally. I'll paste the logs below.

         My questions are:

         1. Why does this exception cause the region server to shut down? Can
I prevent it?

         2. Are there any tools (a shell command would be best, like hadoop
dfsadmin -report) to monitor an HBase region server and check whether it is
alive or dead?

           I have done some research and found that Nagios/Ganglia can do such
      things. But I really just want to know whether the region server is
alive or dead, so they are a little over-qualified for this.

           And since I'm not using CDH, I don't think I can use Cloudera
Manager.
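           From my reading so far, the session timeout behind question 1 is
configured in hbase-site.xml roughly like this (illustrative values only; my
logs below show a negotiated timeout of 180000 ms already, and whatever is
requested is capped by the ZooKeeper server's own session-timeout limits in
zoo.cfg):

```xml
<!-- hbase-site.xml: ZooKeeper session timeout requested by the region server.
     Illustrative value; the timeout actually granted is bounded by the
     ZooKeeper server's minSessionTimeout/maxSessionTimeout in zoo.cfg. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>180000</value>
</property>
```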
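           To illustrate what I mean by a simple aliveness check for question
2: something as crude as probing the region server RPC port over TCP would be
enough for me. A rough sketch (not an HBase API; the hosts and port 60020 are
my cluster's values):

```python
import socket

# Hosts/ports below match the cluster described above (assumptions, adjust as needed).
REGION_SERVERS = [("192.168.1.80", 60020),
                  ("192.168.1.81", 60020),
                  ("192.168.1.82", 60020)]

def is_alive(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host, port in REGION_SERVERS:
        state = "alive" if is_alive(host, port) else "DEAD"
        print(f"{host}:{port} {state}")
```

The HBase shell's status command (echo "status" | hbase shell) would also
give the master's view of live/dead servers, which may be closer to the
hadoop dfsadmin -report style I'm after.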

 

         Here are the logs.

        

         HBase master:
2013-05-21 17:03:32,675 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hadoop01,60020,1368774173179 reported a fatal error:
ABORTING region server hadoop01,60020,1368774173179: regionserver:60020-0x3eb14c67540002 regionserver:60020-0x3eb14c67540002 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:369)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:266)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)

 

         Region Server:

2013-05-21 17:00:16,895 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 120000ms for sessionid 0x3eb14c67540002, closing socket connection and attempting reconnect
2013-05-21 17:00:35,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 120000ms for sessionid 0x13eb14ca4bb0000, closing socket connection and attempting reconnect
2013-05-21 17:03:31,498 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_9188414668950016309_4925046 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.80:57020 remote=/192.168.1.82:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2784)

2013-05-21 17:03:31,520 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_9188414668950016309_4925046 bad datanode[0] 192.168.1.82:50010
2013-05-21 17:03:32,315 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /192.168.1.82:2100
2013-05-21 17:03:32,316 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop03/192.168.1.82:2100, initiating session
2013-05-21 17:03:32,317 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop03/192.168.1.82:2100, sessionid 0x13eb14ca4bb0000, negotiated timeout = 180000
2013-05-21 17:03:32,497 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1091)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
        at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
        ... 4 more
Caused by: java.io.IOException: DFSOutputStream is closed
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3483)
        at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
        at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
        ... 8 more
2013-05-21 17:03:32,497 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
2013-05-21 17:03:32,498 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: new createWriter -- HADOOP-6840 -- not available
2013-05-21 17:03:32,548 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop03/192.168.1.82:2100
2013-05-21 17:03:32,548 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop03/192.168.1.82:2100, initiating session
2013-05-21 17:03:32,549 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3eb14c67540002 has expired, closing socket connection
2013-05-21 17:03:32,549 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hadoop01,60020,1368774173179: regionserver:60020-0x3eb1
Jean-Daniel Cryans 2013-05-23, 16:52