HBase >> mail # user >> RegionServer stuck in internalObtainRowLock forever - HBase 0.94.7


Re: RegionServer stuck in internalObtainRowLock forever - HBase 0.94.7
Hi,

Apparently this just happened on a staging machine as well. The common ground
between the two incidents is a failed disk (1 out of 8).

It seems like a bug if HBase can't recover from a single failed disk. Could it
be that short-circuit reads are causing it?
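One way to test that hypothesis (not something the thread confirms, just a sketch) would be to disable short-circuit local reads on the affected RegionServer and see whether the lockup recurs; the `DomainSocket(...path=/var/run/hdfs-sockets/dn)` in the trace below suggests the HDFS 2.x-style short-circuit implementation, whose client-side switch is:

```xml
<!-- hdfs-site.xml on the RegionServer host (client side).
     Sketch only: turns off HDFS short-circuit local reads so
     they can be ruled out as the trigger. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
</property>
```

After restarting the RegionServer, reads would go through the DataNode's normal socket path instead of the Unix domain socket.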

A couple of interesting exceptions:

1. The following repeated several times:

2014-02-16 21:20:02,850 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1188452996-10.223.226.16-1391605188016:blk_-2678974508557130432_1358017
java.io.IOException: Bad response ERROR for block BP-1188452996-10.223.226.16-1391605188016:blk_-2678974508557130432_1358017 from datanode 10.223.226.66:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:705)
2014-02-16 21:21:09,640 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.223.226.91:39217 remote=/10.223.226.91:50010]
  at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117)
  at java.io.FilterInputStream.read(FilterInputStream.java:66)
  at java.io.FilterInputStream.read(FilterInputStream.java:66)
  at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1105)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1039)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
2014-02-16 21:21:09,641 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning BP-1188452996-10.223.226.16-1391605188016:blk_-4136849541182435905_1359225
2014-02-16 21:21:09,658 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.223.226.91:50010
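As an aside, the 75000 ms figure is what one would expect from the HDFS client's write-pipeline timeout: the base socket read timeout (60 s by default) plus a 5 s extension per datanode in a 3-replica pipeline (60000 + 3 × 5000 = 75000). If a dying disk is making a datanode slow rather than dead, the timeout only bounds how long each stall lasts; for reference, the base value on this Hadoop 2.0-era client is controlled by:

```xml
<!-- hdfs-site.xml, client side. Base socket read timeout in ms
     (shown at its default). The write-pipeline timeout adds
     5000 ms per datanode on top of this, which matches the
     75000 ms seen in the trace above for a 3-node pipeline. -->
<property>
  <name>dfs.socket.timeout</name>
  <value>60000</value>
</property>
```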

2. Then we started getting tons of the following exception, intertwined with
the exception listed above in 1:

2014-02-16 21:23:45,789 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":75164,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2588f0a), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.223.226.165:30412","starttimems":1392585750594,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}

3. Then we got this:

2014-02-16 22:03:47,626 WARN org.apache.hadoop.hdfs.DFSClient: failed to connect to DomainSocket(fd=1463,path=/var/run/hdfs-sockets/dn)
java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable
  at org.apache.hadoop.net.unix.DomainSocket.readArray0(Native Method)
  at org.apache.hadoop.net.unix.DomainSocket.access$200(DomainSocket.java:47)
  at org.apache.hadoop.net.unix.DomainSocket$DomainInputStream.read(DomainSocket.java:530)
  at java.io.FilterInputStream.read(FilterInputStream.java:66)
  at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
  at org.apache.hadoop.hdfs.BlockReaderFactory.newShortCircuitBlockReader(BlockReaderFactory.java:187)
  at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:104)
  at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1060)
  at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
  at java.io.DataInputStream.read(DataInputStream.java:132)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:584)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1414)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1839)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1703)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:593)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:691)
  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130)
  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:383)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
  at org.apache.hadoop.hbase.regionserver.Compactor.compact(Compactor.java:184)
  at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:1081)
  at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1335)
  at org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest.run(CompactionRequest.java:303)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)

4. Then tons of this (region key and table name changed for privacy reasons):

2014-02-16 22:04:32,681 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 942 on 60020' on region ABC,\x0A\A,1392584906041.3ee371c1d20deaa9fc945b77d8c75f9a.: memstore size 512.0 M is >= than blocking 512 M size
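This message is the expected downstream effect rather than a separate fault: with flushes stalled behind the stuck HDFS writes, the memstore grows to its blocking cap and the handlers stop accepting updates. The 512 M cap is the product of two settings, sketched here with illustrative values that would yield the figure in the log line (128 MB × 4 = 512 MB); the actual values on this cluster are not known from the thread:

```xml
<!-- hbase-site.xml. Update-blocking threshold per region =
     flush.size * block.multiplier. Values below are illustrative:
     134217728 bytes (128 MB) * 4 = 512 MB, matching the log. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
```

Raising the multiplier only buys time while flushes are blocked; the root cause here is still the flush path stalling on HDFS.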

5. A couple of these:

2014-02-16 22:04:47,687 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.223.226.91:50010 for block, add to deadNodes and continue.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.223.226.91:41109remot