|
|
-
RegionServer crashes silently under heavy RowKey scansJim Abramson 2013-01-28, 20:14
Hi,
We are testing HBase for some read-heavy batch operations, and encountering frequent, silent RegionServer crashes. The application does many thousands of very selective row scans on a dataset containing several hundred million rows (less than 200GB overall), via thrift. We have encountered this behavior both on a physical cluster with CDH4 installed, and in an Amazon EMR HBase cluster. HBase, Thrift, HDFS etc are all running with very minimal customization to the default configurations (in EMR's case, effectively no customization is done at all). The logs contain some clues, but nothing straightforwardly acknowledging the root cause of failure. As such it is somewhat scary and we'd really appreciate any advice or suggestions on cause, prevention, etc. The post-mortem pattern goes like this: 1) RegionServer logs a WARN to the effect of: org.apache.hadoop.ipc.HBaseServer (responseTooLarge): {"processingtimems":2997,"call":"next(-6754667650407290823, 1000), rpc version=1, client version=29, methodsFingerPrint=1891768260","client":"somehost:55253","starttimems":1359399938999,"queuetimems":0,"class":"HRegionServer","responsesize":231379184,"method":"next"} 2) RegionServer logs one or a few of these at DEBUG, in the second or two before it goes silent: org.apache.hadoop.hdfs.DFSClient Error making BlockReader. Closing stale Socket[addr=/thishost,port=50010,localport=51345] java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162) at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407) at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:105) at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:937) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689) at java.io.DataInputStream.read(DataInputStream.java:132) at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:454) at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1062) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:266) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:209) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:519) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:534) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:178) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:111) at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:83) at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1779) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:2915) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1453) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1445) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1421) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2079) at sun.reflect.GeneratedMethodAccessor210.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345) 3) HDFS logs this at INFO: org.apache.hadoop.hdfs.server.datanode.DataNode Exception in receiveBlock for BP-1555962259-thisIP-1358284735183:blk_-5783891561030721738_125644 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) at java.lang.Thread.run(Thread.java:662) 4) HDFS logs this as ERROR: org.apache.hadoop.hdfs.server.datanode.DataNode thishost:50010:DataXceiver error processing WRITE_BLOCK operation src: /thishost:50620 dest: /thishost:50010 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506 |