RE: Generation Stamp

>From: Zhanwei Wang [[EMAIL PROTECTED]]
>Sent: Wednesday, November 30, 2011 4:34 PM
>Subject: Re: Generation Stamp

>Hi, everyone

>Following the discussion, I would like to know: if the DataNode reports an overage (outdated) block to the NameNode, then according to Uma the NameNode can reject it. What will the DataNode do then?
  The NN will add the block to its invalidates list and inform the DN through heartbeat responses. The DataNode's next action will be to delete that block physically.
>Does it ask another DataNode to copy a new replica to it and delete the old one? Or will the NameNode arrange the work if the number of replicas is below the specified value? Where can I find this code?
When the NN replication monitor finds this block in the neededReplications list, it will choose one source node (one that has a good replica) and ask it to replicate the block to another DataNode to meet the replication factor.
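
To make the two steps above concrete, here is a minimal sketch in Java. It is not the Hadoop code: the class InvalidateSketch and its methods rejectReplica, heartbeat and chooseSource are hypothetical names invented for illustration, and picking the first node that holds a good replica is an assumption, not the NN's real placement policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the NameNode bookkeeping; not Hadoop code.
class InvalidateSketch {
    // DataNode name -> blocks the NN wants that DataNode to delete.
    private final Map<String, List<String>> invalidates = new HashMap<>();

    // Called when the NN rejects a replica that a DataNode reported.
    void rejectReplica(String dataNode, String block) {
        invalidates.computeIfAbsent(dataNode, k -> new ArrayList<>()).add(block);
    }

    // Called when a heartbeat arrives: hand over and clear the pending deletions.
    List<String> heartbeat(String dataNode) {
        List<String> pending = invalidates.remove(dataNode);
        return pending == null ? new ArrayList<>() : pending;
    }

    // Assumption: any node holding a good replica may act as the source,
    // so this sketch simply takes the first one.
    static String chooseSource(List<String> nodesWithGoodReplica) {
        if (nodesWithGoodReplica.isEmpty()) {
            throw new IllegalStateException("no good replica available");
        }
        return nodesWithGoodReplica.get(0);
    }
}
```

In this sketch, the deletion request stays queued until the next heartbeat from that DataNode, which mirrors the "inform through heartbeat responses" step described above.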

Hope it helps you....

>Zhanwei Wang
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] on behalf of kartheek muthyala
Sent: November 30, 2011 12:07
Subject: Re: Generation Stamp

Thanks Uma..:)
On Tue, Nov 29, 2011 at 10:48 PM, Uma Maheswara Rao G <[EMAIL PROTECTED]> wrote:
Yes. :-)
From: kartheek muthyala [[EMAIL PROTECTED]]
Sent: Tuesday, November 29, 2011 10:20 PM
Subject: Re: Generation Stamp
Uma, first of all, thanks for the detailed explanation and example.

So to confirm, the primary use of this generationTimeStamp is to ensure consistency of the block? So, when the pipeline fails at DN3 and the client invokes recovery, the NN will choose DN1 to complete the pipeline. DN1 first updates its metafile with the new time stamp, and then passes this information to the other replica at DN2. Later, the NN sees that this particular block is under-replicated, assigns some other node DNa, and asks either DN1 or DN2 to replicate the block to DNa.

On Tue, Nov 29, 2011 at 8:10 PM, Uma Maheswara Rao G <[EMAIL PROTECTED]> wrote:

The generation stamp is basically used to keep track of replica states.

Consider one scenario where the generation stamp will be used:

Create a file which has one block. The client starts writing that block to DN1, DN2, DN3 (the pipeline).

After writing some data, DN3 fails, and the client gets an exception about the pipeline failure. The client handles that exception (you can see this in processDataNodeError in the DataStreamer thread). It removes DN3 and calls recovery for that block with a new generation stamp. The NN then chooses one primary DN and assigns it the block synchronization work. The primary DN ensures that all the remaining replicas have the same length (if required, it truncates them to a consistent length) and invokes commitBlockSynchronization. Then the remaining data transfer resumes.
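
The synchronization step can be pictured with a small Java sketch. This is only an illustration under stated assumptions: RecoverySketch, SyncResult and synchronize are hypothetical names, and taking the shortest surviving replica as the "consistent length" is my assumption about how the truncation target is chosen, not something confirmed above.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of what the primary DN agrees on during recovery; not Hadoop code.
class RecoverySketch {
    // Outcome of synchronizing the surviving replicas.
    static final class SyncResult {
        final long length;          // length every surviving replica is truncated to
        final long generationStamp; // new stamp recorded for every surviving replica

        SyncResult(long length, long generationStamp) {
            this.length = length;
            this.generationStamp = generationStamp;
        }
    }

    // Assumption: the consistent length is the shortest surviving replica,
    // because only those bytes are present on every remaining DataNode.
    static SyncResult synchronize(List<Long> replicaLengths, long newGenerationStamp) {
        if (replicaLengths.isEmpty()) {
            throw new IllegalArgumentException("no surviving replicas");
        }
        long length = Collections.min(replicaLengths);
        return new SyncResult(length, newGenerationStamp);
    }
}
```

For example, with surviving replicas of 4096 and 3584 bytes on DN1 and DN2, this sketch would settle on 3584 bytes and tag both replicas with the new stamp.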

Now the block will have a new generation stamp. You can observe this in the metadata file for that block on the DN.

Now the block files will look like blk_12345634444 and blk_12345634444_1234.meta.

Here 1234 is the generation stamp.
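
If you want to read the stamp out of a metafile name yourself, something like the following Java sketch would do. It assumes the naming pattern shown above, blk_<blockId>_<generationStamp>.meta; the class MetaFileName and its methods are hypothetical helpers, not part of Hadoop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for picking apart a block metafile name; not Hadoop code.
class MetaFileName {
    // Assumed pattern: blk_<blockId>_<generationStamp>.meta
    // The block id is allowed a leading minus sign.
    private static final Pattern META = Pattern.compile("blk_(-?\\d+)_(\\d+)\\.meta");

    private static Matcher match(String fileName) {
        Matcher m = META.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a block metafile name: " + fileName);
        }
        return m;
    }

    // Returns the block id part of the name.
    static long blockId(String fileName) {
        return Long.parseLong(match(fileName).group(1));
    }

    // Returns the generation stamp part of the name.
    static long generationStamp(String fileName) {
        return Long.parseLong(match(fileName).group(2));
    }
}
```

Calling MetaFileName.generationStamp("blk_12345634444_1234.meta") gives 1234, and blockId on the same name gives 12345634444.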

Assume that, after the write resumes, DN2 fails. Recovery starts again and the block gets a new generation stamp again. Now only DN1 is in the pipeline, and the block files are blk_12345634444 and blk_12345634444_1235.meta. The client resumes the remaining data writes and completes the last packet. With the last packet, the block should be finalized. DN1 finalizes the block successfully and sends the block-received command, and the block info is updated in the blocks map. Now assume DN2 comes back and sends that old block in its block report to the NN. The NN can see that the generation stamp of that block is lower than the generation stamp of the block reported by DN1, so it can make the decision now: it can reject the block with the lower generation stamp.

You can see this code in FSNamesystem#addStoredBlock. Of course, there are many other conditions, such as length mismatch, etc.
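
The generation stamp comparison described above boils down to something like this Java sketch. It is not the actual addStoredBlock logic, which also checks other conditions such as length; GenStampCheck, blockReceived and shouldReject are hypothetical names, and treating an unknown block as "not rejected" is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the generation stamp comparison only; not Hadoop code.
class GenStampCheck {
    // Block id -> highest generation stamp the NN has recorded for that block.
    private final Map<Long, Long> blocksMap = new HashMap<>();

    // A DataNode reports a finalized replica: remember the newest stamp seen.
    void blockReceived(long blockId, long generationStamp) {
        blocksMap.merge(blockId, generationStamp, Long::max);
    }

    // A replica shows up in a block report: reject it if its stamp is older
    // than the one already recorded. Unknown blocks are not rejected here
    // (an assumption of this sketch).
    boolean shouldReject(long blockId, long reportedGenerationStamp) {
        Long known = blocksMap.get(blockId);
        return known != null && reportedGenerationStamp < known;
    }
}
```

In the scenario above, DN1 reports block 12345634444 with stamp 1235, so when DN2 later reports the same block with stamp 1234, shouldReject returns true.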

Hope it will help you....



From: kartheek muthyala [[EMAIL PROTECTED]]
Sent: Tuesday, November 29, 2011 7:44 PM
To: hdfs-user
Subject: Generation Stamp
Hi all,
Why is there the concept of a generation stamp that gets tagged to the metadata of the block? How is it useful? I have seen that in the HDFS current directory, the metafiles are tagged with this generation stamp. Does this keep track of versioning?