HBase, mail # user - RegionServer stuck in internalObtainRowLock forever - HBase 0.94.7


Re: RegionServer stuck in internalObtainRowLock forever - HBase 0.94.7
Asaf Mesika 2014-02-17, 10:00
Hi,

Apparently this just happened on a staging machine as well. The common ground
between the two incidents is a failed disk (1 out of 8).

It seems like a bug if HBase can't recover from a single failed disk. Could it
be that short-circuit reads are causing it?
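
One way we could test that theory (a minimal sketch, assuming the stock HDFS
2.x property names; in practice these live in hbase-site.xml / hdfs-site.xml,
the code below just shows the keys involved) is to disable short-circuit on
the RegionServer side and see whether the lockup still reproduces:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShortCircuitToggle {
    public static void main(String[] args) {
        // Standard HDFS 2.x short-circuit read key (assumption: CDH4-era
        // naming, matching the DomainSocket trace in 3 below). Turning the
        // feature off routes reads back through the DataNode TCP path.
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("dfs.client.read.shortcircuit", false);
        // dfs.domain.socket.path (/var/run/hdfs-sockets/dn here) is only
        // consulted when short-circuit is on, so it can stay in place.
        System.out.println("short-circuit reads enabled: "
            + conf.getBoolean("dfs.client.read.shortcircuit", true));
    }
}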

A couple of interesting exceptions:

1. The following repeated several times:

2014-02-16 21:20:02,850 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1188452996-10.223.226.16-1391605188016:blk_-2678974508557130432_1358017
java.io.IOException: Bad response ERROR for block BP-1188452996-10.223.226.16-1391605188016:blk_-2678974508557130432_1358017 from datanode 10.223.226.66:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:705)
2014-02-16 21:21:09,640 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.223.226.91:39217 remote=/10.223.226.91:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1105)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1039)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
2014-02-16 21:21:09,641 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning BP-1188452996-10.223.226.16-1391605188016:blk_-4136849541182435905_1359225
2014-02-16 21:21:09,658 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.223.226.91:50010

2. Then we started getting tons of the following exception, intertwined with
the exception listed above in 1. Note the ~75 s processing time, which matches
the 75000 ms socket timeout:

2014-02-16 21:23:45,789 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":75164,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2588f0a), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.223.226.165:30412","starttimems":1392585750594,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}

3. Then we got this:

2014-02-16 22:03:47,626 WARN org.apache.hadoop.hdfs.DFSClient: failed to connect to DomainSocket(fd=1463,path=/var/run/hdfs-sockets/dn)
java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable
    at org.apache.hadoop.net.unix.DomainSocket.readArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$200(DomainSocket.java:47)
    at org.apache.hadoop.net.unix.DomainSocket$DomainInputStream.read(DomainSocket.java:530)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newShortCircuitBlockReader(BlockReaderFactory.java:187)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:104)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1060)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:584)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1414)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1839)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1703)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:593)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:691)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:383)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
    at org.apache.hadoop.hbase.regionserver.Compactor.compact(Compactor.java:184)
    at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:1081)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1335)
    at org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest.run(CompactionRequest.java:303)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
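
This one is a compaction thread failing a short-circuit read over the domain
socket at /var/run/hdfs-sockets/dn. The "Resource temporarily unavailable"
(EAGAIN) suggests the DataNode side stopped answering within the receive
timeout rather than the path being wrong, but a stale or missing socket file
is cheap to rule out first. A hypothetical sanity check from the RegionServer
host, plain JDK only:

import java.io.File;

public class CheckDomainSocketPath {
    public static void main(String[] args) {
        // Path taken from the stack trace above; substitute whatever
        // dfs.domain.socket.path is set to in hdfs-site.xml.
        File sock = new File("/var/run/hdfs-sockets/dn");
        System.out.println("exists:   " + sock.exists());
        System.out.println("readable: " + sock.canRead());
        System.out.println("writable: " + sock.canWrite());
    }
}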

4. Then tons of this (region key and table name changed for privacy reasons):

2014-02-16 22:04:32,681 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 942 on 60020' on region ABC,\x0A\A,1392584906041.3ee371c1d20deaa9fc945b77d8c75f9a.: memstore size 512.0 M is >= than blocking 512 M size
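
The 512 M blocking threshold is just hbase.hregion.memstore.flush.size times
hbase.hregion.memstore.block.multiplier. A rough sketch of the arithmetic,
with assumed values that reproduce our 512 M (the 0.94 defaults are a 128 MB
flush size and a multiplier of 2):

public class MemstoreBlockingThreshold {
    public static void main(String[] args) {
        long flushSize = 256L * 1024 * 1024; // hbase.hregion.memstore.flush.size (assumed)
        long blockMultiplier = 2;            // hbase.hregion.memstore.block.multiplier (assumed)
        long blockingAt = flushSize * blockMultiplier;
        // Once a region's memstore reaches this size, HRegion blocks writers
        // until a flush succeeds; with HDFS writes failing as in 1-3 above,
        // it never unblocks.
        System.out.println("updates block at " + (blockingAt >> 20) + " MB per region");
    }
}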

5. A couple of these:

2014-02-16 22:04:47,687 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.223.226.91:50010 for block, add to deadNodes and continue.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.223.226.91:41109 remot