HBase, mail # user - Slow region server recoveries


Varun Sharma 2013-04-19, 01:01
Varun Sharma 2013-04-19, 01:37
Ted Yu 2013-04-19, 04:37
Nicolas Liochon 2013-04-19, 07:38
Varun Sharma 2013-04-19, 10:46
Nicolas Liochon 2013-04-19, 11:00
Varun Sharma 2013-04-19, 17:28
Ted Yu 2013-04-19, 17:40
Varun Sharma 2013-04-19, 17:53
Varun Sharma 2013-04-19, 20:09
Varun Sharma 2013-04-19, 20:10
Nicolas Liochon 2013-04-20, 08:16
Varun Sharma 2013-04-20, 17:47
Varun Sharma 2013-04-20, 17:52
Varun Sharma 2013-04-21, 17:38
Nicolas Liochon 2013-04-22, 07:51
Re: Slow region server recoveries
Ted Yu 2013-04-21, 17:57
Varun:
Thanks for trying out HBASE-8354.

Can you move the text in the Environment section of HBASE-8389 to the Description?

If you have a patch for HBASE-8389, can you upload it?

Cheers

On Sun, Apr 21, 2013 at 10:38 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Hi Ted, Nicolas,
>
> Thanks for the comments. We found some issues with lease recovery and I
> patched HBASE-8354 to ensure we don't see data loss. Could you please look
> at HDFS-4721 and HBASE-8389?
>
> Thanks
> Varun
>
>
> On Sat, Apr 20, 2013 at 10:52 AM, Varun Sharma <[EMAIL PROTECTED]>
> wrote:
>
> > The important thing to note is that the block for this rogue WAL is in the
> > UNDER_RECOVERY state. I have repeatedly asked HDFS dev whether the stale node
> > logic kicks in correctly for UNDER_RECOVERY blocks, but without success.
> >
> >
> > On Sat, Apr 20, 2013 at 10:47 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Nicolas,
> >>
> >> Regarding the following, I think this is not a recovery - the file below
> >> is an HFile and is being accessed on a get request. On this cluster, I
> >> don't have block locality. I see these exceptions for a while and then
> >> they are gone, which means the stale node thing kicks in.
> >>
> >> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> >> connect to /10.156.194.94:50010 for file
> >> /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0
> >> for block
> >> This is the real bummer. The stale datanode is still listed first even
> >> 90 seconds afterwards.
> >>
> >> *2013-04-19 00:28:35*,777 WARN
> >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> >> hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> >> failed, returning error
> >> java.io.IOException: Cannot obtain block length for
> >> LocatedBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056;
> >> getBlockSize()=0; corrupt=false; offset=0; locs=*[10.156.194.94:50010,
> >> 10.156.192.106:50010, 10.156.195.38:50010]}*
> >>     at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:238)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:182)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
> >>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1080)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:78)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:717)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:821)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:734)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:348)
> >>     at
> >