HBase user mailing list: Slow region server recoveries


Thread:
  Varun Sharma       2013-04-19, 01:01
  Varun Sharma       2013-04-19, 01:37
  Ted Yu             2013-04-19, 04:37
  Nicolas Liochon    2013-04-19, 07:38
  Varun Sharma       2013-04-19, 10:46
  Nicolas Liochon    2013-04-19, 11:00
  Varun Sharma       2013-04-19, 17:28
  Ted Yu             2013-04-19, 17:40
  Varun Sharma       2013-04-19, 17:53
  Varun Sharma       2013-04-19, 20:09
  Varun Sharma       2013-04-19, 20:10
  Nicolas Liochon    2013-04-20, 08:16
  Varun Sharma       2013-04-20, 17:47
  Varun Sharma       2013-04-20, 17:52
  Varun Sharma       2013-04-21, 17:38
  Nicolas Liochon    2013-04-22, 07:51
Re: Slow region server recoveries
Varun:
Thanks for trying out HBASE-8354.

Can you move the text in the Environment section of HBASE-8389 to the Description?

If you have a patch for HBASE-8389, can you upload it?

Cheers
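
For reference, the lease-recovery step that HBASE-8354 revolves around amounts to forcing HDFS to finalize the WAL before the splitter reads it. A minimal sketch, assuming an open DistributedFileSystem handle (illustrative only, not the actual patch):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class LeaseRecoveryExample {
      // Poll recoverLease() until HDFS reports the file closed; only then is
      // the WAL's length final and safe for log splitting to read.
      public static void waitForLease(DistributedFileSystem dfs, Path wal)
          throws Exception {
        while (!dfs.recoverLease(wal)) {   // false = recovery still in progress
          Thread.sleep(1000);              // illustrative pacing; bound the retries in real code
        }
      }
    }

recoverLease() returning false just means recovery is still in flight, so looping until it returns true is the simplest safe policy.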

On Sun, Apr 21, 2013 at 10:38 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Hi Ted, Nicolas,
>
> Thanks for the comments. We found some issues with lease recovery, and I
> patched HBASE-8354 to ensure we don't see data loss. Could you please look
> at HDFS-4721 and HBASE-8389?
>
> Thanks
> Varun
>
>
> On Sat, Apr 20, 2013 at 10:52 AM, Varun Sharma <[EMAIL PROTECTED]>
> wrote:
>
> > The important thing to note is that the block for this rogue WAL is in the
> > UNDER_RECOVERY state. I have repeatedly asked HDFS dev whether the stale-node
> > logic kicks in correctly for UNDER_RECOVERY blocks, but have not gotten a
> > clear answer.
> >
> >
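For context on the stale-node logic: staleness is driven by a few hdfs-site.xml properties. A minimal sketch of the relevant knobs (values here are illustrative; check hdfs-default.xml for your version's defaults):

    dfs.namenode.stale.datanode.interval     = 30000   # ms without a heartbeat before a DN counts as stale
    dfs.namenode.avoid.read.stale.datanode   = true    # namenode sorts stale DNs to the end of read locations
    dfs.namenode.avoid.write.stale.datanode  = true    # avoid stale DNs when allocating write pipelines

Whether that read-time reordering is actually applied to the locations of an UNDER_RECOVERY block is exactly the open question in this thread.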
> > On Sat, Apr 20, 2013 at 10:47 AM, Varun Sharma <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Hi Nicolas,
> >>
> >> Regarding the following, I think this is not a recovery - the file below
> >> is an HFile being accessed on a get request. On this cluster, I don't
> >> have block locality. I see these exceptions for a while and then they
> >> are gone, which means the stale-node logic kicks in.
> >>
> >> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> >> connect to /10.156.194.94:50010 for file
> >> /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0
> >> for block
> >>
> >> This is the real bummer. The stale datanode is still listed 1st even 90
> >> seconds afterwards.
> >>
> >> *2013-04-19 00:28:35*,777 WARN
> >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> >> hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 failed, returning error
> >> java.io.IOException: Cannot obtain block length for LocatedBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056; getBlockSize()=0; corrupt=false; offset=0; locs=*[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010]}*
> >>     at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:238)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:182)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
> >>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1080)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:78)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:717)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:821)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:734)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:348)
> >>     at ...
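
For anyone following along, the "Cannot obtain block length" above is DFSInputStream.readBlockLength() giving up: when a file's last block is still UNDER_RECOVERY, the client must ask a replica datanode for the block's visible length before it can open the file. A rough paraphrase of that loop (the abstract helper is a hypothetical stand-in for the real ClientDatanodeProtocol RPC, not actual HDFS code):

    import java.io.IOException;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
    import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
    import org.apache.hadoop.hdfs.protocol.LocatedBlock;

    abstract class BlockLengthSketch {
      // Hypothetical stand-in for the datanode RPC that reports a replica's
      // visible length.
      abstract long getReplicaVisibleLength(DatanodeInfo dn, ExtendedBlock b)
          throws IOException;

      long readBlockLength(LocatedBlock lb) throws IOException {
        // Replicas are tried in the order the namenode returned them, which is
        // why a dead-but-first datanode hurts so much.
        for (DatanodeInfo dn : lb.getLocations()) {
          try {
            return getReplicaVisibleLength(dn, lb.getBlock());
          } catch (IOException e) {
            // a dead datanode burns a full connect timeout before we get here
          }
        }
        throw new IOException("Cannot obtain block length for " + lb);
      }
    }

Since the namenode listed the dead 10.156.194.94 first in locs, every open pays that connect timeout before falling through to a live replica. Running hdfs fsck on the path with -files -blocks -locations will show the replica list the namenode is handing out.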