Re: Slow region server recoveries
Hi Nicolas,

Here is the failure scenario, I have dug up the logs.

A machine fails and stops accepting/transmitting traffic. The HMaster
starts the distributed log split with 13 tasks across the 12 region servers.
12 tasks succeed but the 13th one takes a very long time.

The ZooKeeper timeout is set to 30 seconds and the stale node timeout to 20
seconds. Both patches are in place.
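
For reference, here is a minimal sketch of how I believe those settings map
to Hadoop/HBase configuration keys (the key names are my assumption from
stock 0.94-era HBase and Hadoop 2.x, not something confirmed here):

import org.apache.hadoop.conf.Configuration;

public class RecoveryTimeouts {
  // Apply the timeouts described above to a Hadoop/HBase Configuration.
  public static Configuration apply(Configuration conf) {
    // ZooKeeper session timeout used by HBase: 30 seconds in this cluster.
    conf.setInt("zookeeper.session.timeout", 30000);
    // HDFS stale-datanode detection (the "stale node timeout" of 20 seconds).
    conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
    conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    conf.setLong("dfs.namenode.stale.datanode.interval", 20000L);
    return conf;
  }
}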

a) The machine fails around 27:30.
b) The master starts the split around 27:40 and submits the tasks. The one
task which fails seems to be the one containing the WAL currently being
written to:

2013-04-19 00:27:44,325 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141, length=0

Basically this region server picks up the task but finds the length of this
file to be 0 and drops it. This happens again.
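
My understanding of that length=0 (sketched against the plain FileSystem API,
not the actual HLogSplitter code): for a WAL that is still open for write,
the NameNode only counts completed blocks, so the reported length can be 0
even though the last block holds data. The class name and usage below are
just for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalLengthCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path wal = new Path(args[0]);   // path to the -splitting WAL
    FileStatus status = fs.getFileStatus(wal);
    // A length of 0 does not necessarily mean the WAL is empty: the last
    // block may still be under construction and is not counted until lease
    // recovery closes the file and the NameNode learns its real length.
    System.out.println(wal + " length=" + status.getLen());
  }
}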

c) Finally another region server picks up the task, but it ends up going to
the bad datanode (which should not happen because of the stale node timeout).
Unfortunately it hits the 45 retries with a connect timeout of 20 seconds
every time, which delays recovery significantly. I guess reducing the number
of retries to 1 is one possibility (rough math in the sketch below). But then
the namenode logs are very interesting.
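
Before getting to those, here is a rough sketch of where I think the
45 x 20 seconds comes from, assuming the stock Hadoop 2.x key names and
defaults (ipc.client.connect.max.retries.on.timeouts and
ipc.client.connect.timeout); please correct me if a different setting
governs this path:

import org.apache.hadoop.conf.Configuration;

public class DeadNodeRetryMath {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int retries = conf.getInt("ipc.client.connect.max.retries.on.timeouts", 45);
    int connectTimeoutMs = conf.getInt("ipc.client.connect.timeout", 20000);
    // Worst-case time spent trying to reach the dead datanode before giving up:
    // 45 retries x 20s = 900 seconds.
    System.out.println("worst case = " + (retries * connectTimeoutMs / 1000) + " seconds");
    // Dropping the retry count to 1, as suggested above, would cap this at ~20s.
    conf.setInt("ipc.client.connect.max.retries.on.timeouts", 1);
  }
}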

d) The namenode seems to be in a cyclic lease recovery loop until the node is
marked dead. There is this one last block which exhibits this:

2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW], ReplicaUnderConstruction[10.156.192.106:50010|RBW], ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started, primary=10.156.192.106:50010
2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File /hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 has not been closed. Lease recovery is in progress. RecoveryId = 174413 for block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW], ReplicaUnderConstruction[10.156.192.106:50010|RBW], ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}

I see this over and over again in the logs until the datanode is marked
dead. It seems to be cycling through the replicas for this WAL block and
trying to add it to the recovery list. I looked at the code and it says:

      // Cannot close file right now, since the last block requires recovery.
      // This may potentially cause infinite loop in lease recovery
      // if there are no valid replicas on data-nodes.
      NameNode.stateChangeLog.warn(
          "DIR* NameSystem.internalReleaseLease: " +
          "File " + src + " has not been closed." +
          " Lease recovery is in progress. " +
          "RecoveryId = " + blockRecoveryId + " for block " + lastBlock);
      break;
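
For context, this is roughly how the splitting region server kicks the
namenode into that loop - a sketch against the public DistributedFileSystem
API, not the exact HBase code path (class and method names below are mine):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class WalLeaseRecovery {
  // Ask the NameNode to recover the lease on a WAL and wait until it closes.
  public static void recover(Path wal, Configuration conf) throws Exception {
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    // Each recoverLease() call makes the NameNode pick a primary datanode and
    // schedule block recovery; until that primary responds (or is marked
    // dead), internalReleaseLease keeps logging "Lease recovery is in
    // progress" as in the excerpt above.
    boolean closed = dfs.recoverLease(wal);
    while (!closed) {
      Thread.sleep(1000);            // poll, as the log splitter effectively does
      closed = dfs.recoverLease(wal);
    }
  }
}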

Eventually for this block, we get

2013-04-19 00:41:20,736 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056, newgenerationstamp=174413, newlength=119148648, newtargets=[10.156.192.106:50010, 10.156.195.38:50010], closeFile=true, deleteBlock=false)
2013-04-19 00:41:20,736 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Block (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056) not found
2013-04-19 00:41:20,736 INFO org.apache.hadoop.ipc.Server: IPC Server handler 35 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization from 10.156.192.106:53271: error: java.io.IOException: Block (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056) not found

On the datanode side, I see a call to recoverBlock - a write pipeline is
there, which gets terminated with some socket timeouts...

00:28:11,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode at ec2-107-20-237-30.compute-1.amazonaws.com/10.168.7.226:8020 calls recoverBlock(BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056, targets=[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010], newGenerationStamp=174413)

Not sure, but this looks like a case where data could be lost?

Varun
On Fri, Apr 19, 2013 at 12:38 AM, Nicolas Liochon <[EMAIL PROTECTED]> wrote:
