RegionServer crashes silently under heavy RowKey scans
Hi,

We are testing HBase for some read-heavy batch operations and are encountering frequent, silent RegionServer crashes. The application does many thousands of very selective row scans, via Thrift, on a dataset containing several hundred million rows (less than 200GB overall).
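For reference, each scan is tightly bounded by row key, so only a small slice of the table is touched per call. Roughly the sketch below, shown with the native Java client for brevity (our real calls go through the Thrift gateway, and the table name and row keys here are just placeholders):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SelectiveScanSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "our_table");           // placeholder table name

      // Very selective scan: narrow start/stop row keys, so each scan reads
      // only a handful of the several hundred million rows in the table.
      Scan scan = new Scan(Bytes.toBytes("rowkey-prefix"),     // placeholder start key
                           Bytes.toBytes("rowkey-prefix~"));   // placeholder stop key
      scan.setCaching(1000);  // rows fetched per next() RPC; appears to match the "next(..., 1000)" in the logs below

      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result row : scanner) {
          // process each row
        }
      } finally {
        scanner.close();
        table.close();
      }
    }
  }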

We have encountered this behavior both on a physical cluster with CDH4 installed and in an Amazon EMR HBase cluster. HBase, Thrift, HDFS, etc. are all running with very minimal customization of the default configurations (in EMR's case, effectively no customization is done at all).

The logs contain some clues, but nothing that straightforwardly identifies the root cause of the failure. As such it is somewhat scary, and we'd really appreciate any advice or suggestions on cause, prevention, etc.
The post-mortem pattern goes like this:
1) RegionServer logs a WARN to the effect of (see also the note after the log excerpts below):

 org.apache.hadoop.ipc.HBaseServer
 (responseTooLarge): {"processingtimems":2997,"call":"next(-6754667650407290823, 1000), rpc version=1, client version=29, methodsFingerPrint=1891768260","client":"somehost:55253","starttimems":1359399938999,"queuetimems":0,"class":"HRegionServer","responsesize":231379184,"method":"next"}

2) RegionServer logs one or a few of these at DEBUG, in the second or two before it goes silent:

 org.apache.hadoop.hdfs.DFSClient
 
 Error making BlockReader. Closing stale Socket[addr=/thishost,port=50010,localport=51345]
 java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407)
at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:105)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:937)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:454)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1062)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:266)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:209)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:519)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:534)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:178)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:111)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:83)
at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1779)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:2915)
at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1453)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1445)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1421)
at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2079)
at sun.reflect.GeneratedMethodAccessor210.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
3) HDFS logs this at INFO:

 org.apache.hadoop.hdfs.server.datanode.DataNode
 Exception in receiveBlock for BP-1555962259-thisIP-1358284735183:blk_-5783891561030721738_125644
 java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
at java.lang.Thread.run(Thread.java:662)

4) HDFS logs this at ERROR:

 org.apache.hadoop.hdfs.server.datanode.DataNode

 thishost:50010:DataXceiver error processing WRITE_BLOCK operation  src: /thishost:50620 dest: /thishost:50010
 java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
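(Regarding the responseTooLarge WARN in step 1: 231379184 bytes returned for a next(..., 1000) call works out to roughly 230 KB per row averaged over the 1000-row batch, so each next() RPC is carrying a couple hundred megabytes. Below is a minimal, purely illustrative sketch of shrinking that payload by lowering scanner caching; the value 50 is a guess, not a tuned recommendation, and the same knob is exposed to Thrift clients through TScan's caching field and the nbRows argument of scannerGetList.)

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  // Same shape of scan as the sketch above, but with a much smaller batch:
  // ~230 KB/row * 50 rows keeps each next() response around 11-12 MB instead of ~231 MB.
  Scan smallerBatches = new Scan(Bytes.toBytes("rowkey-prefix"),   // placeholder start key
                                 Bytes.toBytes("rowkey-prefix~")); // placeholder stop key
  smallerBatches.setCaching(50);  // rows per next() RPC; illustrative value only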
Replies:
Stack 2013-01-29, 01:48
Jim Abramson 2013-01-30, 20:04