When a client makes a read request for a file, say foo.txt, the namenode
returns the ID of the first block and the list of datanodes that hold a
replica of it. The client, not the namenode, decides which datanode to pull
the data from. If the first attempt fails, the client retries against
another datanode that holds a replica of the same block. This repeats block
by block until the whole file has been read.
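
To make the flow concrete, here is a rough sketch of it as a toy Python
model. This is not the real HDFS client API: the classes NameNode and
DataNode, the methods get_block and read_block, and the function read_file
are all hypothetical names made up for illustration. It only assumes what is
described above: the namenode hands out metadata, and the client picks a
replica and falls back to another one on failure.

```python
# Toy model of the read path described above. NOT the real HDFS client API:
# every class, method and function name here is hypothetical.

class DataNode:
    def __init__(self, name, blocks, alive=True):
        self.name = name
        self.blocks = blocks          # block_id -> bytes stored on this node
        self.alive = alive

    def read_block(self, block_id):
        if not self.alive:
            raise IOError(f"{self.name} is unreachable")
        return self.blocks[block_id]


class NameNode:
    def __init__(self, files, locations):
        self.files = files            # path -> ordered list of block IDs
        self.locations = locations    # block_id -> list of DataNode replicas

    def get_block(self, path, index):
        # Metadata only: returns (block_id, replicas), or None past the end.
        block_ids = self.files[path]
        if index >= len(block_ids):
            return None
        block_id = block_ids[index]
        return block_id, self.locations[block_id]


def read_file(namenode, path):
    data = b""
    index = 0
    while True:
        located = namenode.get_block(path, index)
        if located is None:
            return data               # every block has been read
        block_id, replicas = located
        for datanode in replicas:     # the client chooses the replica
            try:
                data += datanode.read_block(block_id)
                break                 # this block is done
            except IOError:
                continue              # fall back to the next replica
        else:
            raise IOError(f"no live replica for block {block_id}")
        index += 1


# Small demo: dn1 is down, so the client falls back to dn2 for blk_1.
stored = {"blk_1": b"hello ", "blk_2": b"world"}
dn1 = DataNode("dn1", dict(stored), alive=False)
dn2 = DataNode("dn2", dict(stored))
nn = NameNode({"foo.txt": ["blk_1", "blk_2"]},
              {"blk_1": [dn1, dn2], "blk_2": [dn2, dn1]})
print(read_file(nn, "foo.txt"))
```

Note that each block is read exactly once, from whichever replica answers
first; the other replicas of that block are simply not touched.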
Thanks and Regards,
On Fri, Feb 8, 2013 at 4:40 PM, Mehal Patel <[EMAIL PROTECTED]> wrote:
> Hello All,
> I am confused about how MapReduce tasks select data blocks when processing
> user requests.
> Since replication stores each data block on multiple datanodes, how is a
> single copy of each block selected during job processing? How does it
> guarantee that the same block is not chosen two or three times by
> different mapper tasks?
> Thank you