HBase, mail # user - HBase issues since upgrade from 0.92.4 to 0.94.6


HBase issues since upgrade from 0.92.4 to 0.94.6
David Koch 2013-07-12, 10:09
Hello,

NOTE: I posted the same message in the Cloudera group.

Since upgrading from CDH 4.0.1 (HBase 0.92.4) to CDH 4.3.0 (HBase 0.94.6), we
systematically experience region servers crashing silently under workloads
that used to run without problems. More specifically, we run about 30 mapper
jobs in parallel which read from HDFS and insert into HBase.

Region server log
NOTE: no trace of the crash in the log, but the server is down and shows up
as such in Cloudera Manager.

2013-07-12 10:22:12,050 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286 might be still open, length is 0
2013-07-12 10:22:12,051 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXXt%2C60020%2C1373616547696.1373617004286
2013-07-12 10:22:13,064 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recover attempt for hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
2013-07-12 10:22:14,819 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2013-07-12 10:22:14,824 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
...
2013-07-12 10:22:14,850 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2013-07-12 10:22:15,530 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
< -- last log entry, region server is down here -- >
Datanode log, same machine

2013-07-12 10:22:04,811 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XXXXXXX:50010:DataXceiver error processing WRITE_BLOCK operation  src: /YYY.YY.YYY.YY:36024 dest: /XXX.XX.XXX.XX:50010
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:724)
< -- many repetitions of this -- >
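For reference, these are the kinds of quick checks we can run on the affected
datanode host to see whether descriptor limits are in play; the log path below
is a guess at a typical CDH location, not taken from our actual setup.

```shell
# Per-process open-file limit for the current user; it should comfortably
# exceed the configured xceiver count (32k in our case) plus the datanode's
# other descriptors.
ulimit -n

# Gauge how widespread the write failures are (log path is hypothetical;
# adjust for your install):
# grep -c 'Premature EOF from inputStream' /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log
```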

What could have caused this difference in stability?

We did not change any configuration settings with respect to the previous
CDH 4.0.1 setup. In particular, we left ulimit and
dfs.datanode.max.xcievers at 32k. If need be, I can provide more complete
log/configuration information.
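For completeness, this is the shape of the xceiver setting as we carried it
over in hdfs-site.xml. If I understand correctly, in Hadoop 2.x (which CDH 4
is based on) the canonical key is dfs.datanode.max.transfer.threads, with the
older misspelled dfs.datanode.max.xcievers kept as a deprecated alias, so it
may be worth verifying that the value is actually picked up after the upgrade.

```xml
<!-- hdfs-site.xml: sketch of the setting we left at 32k.
     dfs.datanode.max.xcievers is the legacy spelling; Hadoop 2.x
     prefers dfs.datanode.max.transfer.threads. -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>32768</value>
</property>
```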

Thank you,

/David