HBase >> mail # user >> Slow region server recoveries


Re: Slow region server recoveries
Hi Ted, Nicholas,

Thanks for the comments. We found some issues with lease recovery, and I
patched HBASE-8354 to ensure we don't see data loss. Could you please look
at HDFS-4721 and HBASE-8389?

Thanks
Varun
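
For readers following the quoted messages below: the "stale node" behaviour
under discussion is the expectation that a datanode which has stopped
heartbeating gets moved to the end of a block's location list, so that clients
try live replicas first. The following is only a minimal sketch of that
ordering idea, not the HDFS implementation. The class name, the method name,
the 30-second threshold, and the heartbeat timestamps are all assumptions made
for illustration; the three addresses are the ones in the locs list of the
quoted exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not a class from HDFS or HBase.
public class StaleNodeOrdering {
    // Assumed staleness threshold in milliseconds (illustrative value).
    static final long STALE_INTERVAL_MS = 30_000L;

    /**
     * Returns the replica locations with stale nodes moved to the end.
     * The relative order inside the fresh and stale groups is preserved.
     */
    static List<String> orderReplicas(List<String> locs,
                                      Map<String, Long> lastHeartbeatMs,
                                      long nowMs) {
        List<String> fresh = new ArrayList<>();
        List<String> stale = new ArrayList<>();
        for (String node : locs) {
            Long last = lastHeartbeatMs.get(node);
            // A node with no recorded heartbeat is treated as stale.
            boolean isStale = last == null || nowMs - last > STALE_INTERVAL_MS;
            (isStale ? stale : fresh).add(node);
        }
        fresh.addAll(stale);
        return fresh;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        // Location order as it appears in the quoted exception.
        List<String> locs = Arrays.asList(
                "10.156.194.94:50010",
                "10.156.192.106:50010",
                "10.156.195.38:50010");
        // Made-up heartbeat times: the first node was last heard from 90 s ago.
        Map<String, Long> heartbeats = new HashMap<>();
        heartbeats.put("10.156.194.94:50010", 10_000L);
        heartbeats.put("10.156.192.106:50010", 98_000L);
        heartbeats.put("10.156.195.38:50010", 99_000L);

        System.out.println(orderReplicas(locs, heartbeats, now));
    }
}
```

With these made-up inputs the dead node ends up last in the list. The quoted
log shows the opposite for the block under recovery: the stale datanode is
still first in locs.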
On Sat, Apr 20, 2013 at 10:52 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> The important thing to note is that the block for this rogue WAL is in
> UNDER_RECOVERY state. I have repeatedly asked HDFS dev whether the stale node
> thing kicks in correctly for UNDER_RECOVERY blocks, but without success.
>
>
> On Sat, Apr 20, 2013 at 10:47 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
>> Hi Nicholas,
>>
>> Regarding the following, I think this is not a recovery: the file below
>> is an HFile and is being accessed on a get request. On this cluster, I
>> don't have block locality. I see these exceptions for a while and then they
>> are gone, which means the stale node thing kicks in.
>>
>> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>> connect to /10.156.194.94:50010 for file
>> /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0
>> for block
>>
>> This is the real bummer. The stale datanode is still listed first even
>> 90 seconds afterwards.
>>
>> *2013-04-19 00:28:35*,777 WARN
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
>> hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 failed, returning error
>> java.io.IOException: Cannot obtain block length for
>> LocatedBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056;
>> getBlockSize()=0; corrupt=false; offset=0; locs=*[10.156.194.94:50010,
>> 10.156.192.106:50010, 10.156.195.38:50010]}*
>>     at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:238)
>>     at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:182)
>>     at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
>>     at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
>>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1080)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:78)
>>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:717)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:821)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:734)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:348)
>>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
>>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
>>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
>>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>>
>>
>> On Sat, Apr 20, 2013 at 1:16 AM, Nicolas Liochon <[EMAIL PROTECTED]> wrote: