Re: Slow region server recoveries
Hi Ted, Nicolas,

Thanks for the comments. We found some issues with lease recovery, and I
patched HBASE-8354 to ensure we don't see data loss. Could you please look
at HDFS-4721 and HBASE-8389?

Thanks
Varun
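
(Aside, not part of the original message: HBASE-8354 concerns making WAL
splitting wait until HDFS lease recovery has actually completed before the
log is read. A minimal sketch of that pattern, built on the public
DistributedFileSystem.recoverLease() API; the class name and the one-second
retry interval are illustrative assumptions, not the actual patch.)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public final class LeaseRecoveryWait {
  // recoverLease() returns true once the file is closed and its length is
  // finalized; until then the WAL's last block can sit in UNDER_RECOVERY.
  public static void waitForLeaseRecovery(DistributedFileSystem dfs, Path wal)
      throws java.io.IOException, InterruptedException {
    while (!dfs.recoverLease(wal)) {
      Thread.sleep(1000L); // illustrative back-off between retries
    }
  }
}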
On Sat, Apr 20, 2013 at 10:52 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> The important thing to note is that the block for this rogue WAL is in the
> UNDER_RECOVERY state. I have repeatedly asked HDFS dev whether stale-node
> detection kicks in correctly for UNDER_RECOVERY blocks, but without success.
>
>
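
(Aside, not part of the original message: "stale-node detection" here is the
HDFS feature where the namenode moves datanodes that have missed heartbeats
beyond an interval to the end of the replica list it hands to clients. A
hedged sketch of the relevant settings; the property names are real HDFS
keys, the booleans are what one would enable, and the interval shown is the
usual default.)

import org.apache.hadoop.conf.Configuration;

public final class StaleNodeSettings {
  public static Configuration withStaleNodeAvoidance() {
    Configuration conf = new Configuration();
    // List stale datanodes last (not first) in read locations.
    conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
    // Also avoid stale datanodes when choosing targets for new writes.
    conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    // A datanode is considered stale after missing heartbeats this long.
    conf.setLong("dfs.namenode.stale.datanode.interval", 30000L); // 30s
    return conf;
  }
}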
> On Sat, Apr 20, 2013 at 10:47 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
>> Hi Nicholas,
>>
>> Regarding the following, I think this is not a recovery - the file below
>> is an HFile being accessed on a get request. On this cluster, I don't have
>> block locality. I see these exceptions for a while and then they are gone,
>> which means stale-node detection kicks in.
>>
>> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.94:50010 for file /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0 for block
>>
>> This is the real bummer: the stale datanode is still listed first even 90
>> seconds afterwards.
>>
>> *2013-04-19 00:28:35*,777 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 failed, returning error
>> java.io.IOException: Cannot obtain block length for LocatedBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056; getBlockSize()=0; corrupt=false; offset=0; locs=*[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010]}*
>>   at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:238)
>>   at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:182)
>>   at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
>>   at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
>>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1080)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:78)
>>   at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
>>   at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:717)
>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:821)
>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:734)
>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:348)
>>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
>>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
>>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
>>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
>>   at java.lang.Thread.run(Thread.java:662)
>>
>>
>>
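
(Aside, not part of the original message: the "Cannot obtain block length"
above arises because the WAL's last block is still unfinalized, as the
getBlockSize()=0 shows, so DFSInputStream must ask one of the datanodes in
locs for the visible length, and the dead node is listed first. A hedged
caller-side sketch of a retry that waits until locations are refreshed or
the block is finalized; the helper name and timing are assumptions, not an
HBase API.)

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class OpenWithRetry {
  // Each attempt re-fetches block locations from the namenode, so once the
  // stale datanode is demoted (or recovery finalizes the block) the open
  // can succeed against a live replica.
  public static FSDataInputStream open(FileSystem fs, Path path, int attempts)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return fs.open(path);
      } catch (IOException e) {
        last = e; // e.g. "Cannot obtain block length for LocatedBlock{...}"
        Thread.sleep(3000L); // illustrative back-off
      }
    }
    throw last != null ? last : new IOException("attempts must be positive");
  }
}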
>> On Sat, Apr 20, 2013 at 1:16 AM, Nicolas Liochon <[EMAIL PROTECTED]> wrote: