HBase, mail # user - Region server crashes


Region server crashes
Lior Schachter 2012-03-25, 08:23
Hi all,
We use hbase 0.9.2. We recently started to see region servers crash
under heavy load (2-3 different servers crash on each load).
It seems that a missing block in HDFS causes a full GC, and regions are
then closed.

Below are log samples from the region server (gc log, region server log)
and from the data node.

gc.log:

82619.081: [Full GC 82619.081: [CMS: 7973415K->7973415K(8005248K),
8.7188750 secs] 8304143K->8304130K(8350272K), [CMS Perm :
20159K->20153K(33976K)] icms_dc=100 , 8.7189890 secs] [Times: user=8.71
sys=0.00, real=8.72 secs]
82627.801: [Full GC 82627.801: [CMS: 7973415K->7973414K(8005248K),
12.2467710 secs] 8305494K->8304129K(8350272K), [CMS Perm :
20153K->20153K(33976K)] icms_dc=100 , 12.2468820 secs] [Times: user=12.24
sys=0.00, real=12.25 secs]
82640.048: [Full GC 82640.048: [CMS: 7973414K->7973414K(8005248K),
8.3197090 secs] 8304129K->8304129K(8350272K), [CMS Perm :
20153K->20153K(33976K)] icms_dc=100 , 8.3198190 secs] [Times: user=8.32
sys=0.00, real=8.32 secs]
82648.369: [Full GC 82648.369: [CMS: 7973414K->7973414K(8005248K),
8.2264360 secs] 8304237K->8304130K(8350272K), [CMS Perm :
20153K->20153K(33976K)] icms_dc=100 , 8.2265410 secs] [Times: user=8.22
sys=0.01, real=8.22 secs]
82656.596: [Full GC 82656.596: [CMS: 7973414K->7973414K(8005248K),
8.4928260 secs] 8304130K->8304130K(8350272K), [CMS Perm :
20153K->20153K(33976K)] icms_dc=100 , 8.4929270 secs] [Times: user=8.48
sys=0.01, real=8.49 secs]
82665.089: [Full GC 82665.089: [CMS: 7973414K->7973414K(8005248K),
8.3110610 secs] 8304132K->8304130K(8350272K), [CMS Perm :
20153K->20153K(33976K)] icms_dc=100 , 8.3111690 secs] [Times: user=8.29
sys=0.02, real=8.31 secs]
82673.400: [Full GC 82673.400: [CMS: 7973414K->7973414K(8005248K),
8.5270500 secs] 8304130K->8304130K(8350272K), [CMS Perm :
20153K->20153K(33976K)] icms_dc=100 , 8.5271560 secs] [Times: user=8.52
sys=0.00, real=8.53 secs]
82681.929: [Full GC 82681.929: [CMS (concurrent mode failure):
7973414K->7973414K(8005248K), 12.0992320 secs]
8305391K->8304128K(8350272K), [CMS Perm : 20153K->20153K(33976K)]
icms_dc=100 , 12.0993380 secs] [Times: user=12.08 sys=0.03, real=12.10 secs]
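
To make the numbers in the GC log easier to read, here is a minimal Python sketch that pulls the CMS old-generation figures out of each Full GC entry and reports how much each collection freed. The function name `summarize_full_gcs`, the regular expression, and the `SAMPLE` constant are illustrative assumptions, not part of HBase or the JVM; the sample text is copied from the first and last entries above.

```python
import re

# Matches the CMS old-generation part of a Full GC entry, e.g.
# "[CMS: 7973415K->7973415K(8005248K), 8.7188750 secs]".
# The "(concurrent mode failure)" variant is accepted too.
# "[CMS Perm : ...]" is deliberately not matched.
CMS_RE = re.compile(
    r"\[CMS(?: \(concurrent mode failure\))?:\s*"
    r"(\d+)K->(\d+)K\((\d+)K\),\s*([\d.]+) secs\]"
)

# Two entries copied from the gc.log above, with the mail line wrapping intact.
SAMPLE = """\
82619.081: [Full GC 82619.081: [CMS: 7973415K->7973415K(8005248K),
8.7188750 secs] 8304143K->8304130K(8350272K), [CMS Perm :
20159K->20153K(33976K)] icms_dc=100 , 8.7189890 secs] [Times: user=8.71
sys=0.00, real=8.72 secs]
82681.929: [Full GC 82681.929: [CMS (concurrent mode failure):
7973414K->7973414K(8005248K), 12.0992320 secs]
8305391K->8304128K(8350272K), [CMS Perm : 20153K->20153K(33976K)]
icms_dc=100 , 12.0993380 secs] [Times: user=12.08 sys=0.03, real=12.10 secs]
"""


def summarize_full_gcs(log_text):
    """Return (before_kb, after_kb, capacity_kb, pause_secs) per Full GC entry."""
    # Mail wrapping splits entries across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [
        (int(before), int(after), int(capacity), float(secs))
        for before, after, capacity, secs in CMS_RE.findall(flat)
    ]


if __name__ == "__main__":
    for before, after, capacity, secs in summarize_full_gcs(SAMPLE):
        print(
            f"old gen {after / capacity:.1%} full after GC, "
            f"freed {before - after} KB, pause {secs:.2f} s"
        )
```

On the entries above, the old generation stays at roughly 99.6% of its 8005248K capacity before and after every collection, so each 8 to 12 second Full GC frees almost nothing. That pattern suggests the heap is full of live data rather than collectable garbage, although the log alone does not show what is holding the memory.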

region server log:

2012-03-25 07:25:28,729 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 8041 caught: java.nio.channels.ClosedChannelException
        at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at
org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
        at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)

2012-03-25 07:31:36,931 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception  for block
blk_9110507610649672616_2064873java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)

org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
blk_-6219291421501721811_2065530 is already commited, storedBlock == null.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy4.nextGenerationStamp(Unknown Source)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy9.recoverBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFS
Jean-Daniel Cryans 2012-03-26, 17:43
Lior Schachter 2012-03-27, 09:54