
Mehal Patel 2013-02-09, 00:40
Re: How MapReduce selects data blocks for processing user request
Hi Mehal,

When a client makes a read request for a file, say foo.txt, the namenode
sends back the ID of the first block and the datanodes it resides on.

It is the client that decides which datanode to pull the data from. If the
first request fails, it can retry against another datanode that holds a
replica of the block. This process repeats, block by block, until all the
data is read.
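
To make the retry behaviour concrete, here is a minimal sketch in Python. It
is only a toy in-memory model, not the Hadoop client API: the names
read_block, read_file, fetch, STORE and DEAD, and the datanode and block IDs,
are all made up for illustration, and the block-to-datanode list stands in
for what the namenode would return.

```python
# Toy model of a client reading a file block by block, falling back to
# another replica when a datanode fails. All names here are hypothetical.


class ReplicaReadError(Exception):
    """Raised when a single datanode cannot serve a block."""


def read_block(block_id, replicas, fetch):
    """Try each datanode that holds the block until one returns data."""
    for datanode in replicas:
        try:
            return fetch(datanode, block_id)
        except ReplicaReadError:
            continue  # this replica failed; try the next datanode
    raise IOError(f"no live replica for block {block_id}")


def read_file(block_locations, fetch):
    """Read every block in order and concatenate the results.

    block_locations is an ordered list of (block_id, [datanodes]) pairs,
    standing in for the information the namenode gives the client.
    """
    return b"".join(
        read_block(block_id, replicas, fetch)
        for block_id, replicas in block_locations
    )


# Toy cluster: two blocks, each stored on two datanodes; dn1 is down.
STORE = {
    ("dn1", "blk_1"): b"hello ",
    ("dn2", "blk_1"): b"hello ",
    ("dn2", "blk_2"): b"world",
    ("dn3", "blk_2"): b"world",
}
DEAD = {"dn1"}


def fetch(datanode, block_id):
    """Simulated datanode read; fails for datanodes listed in DEAD."""
    if datanode in DEAD:
        raise ReplicaReadError(datanode)
    return STORE[(datanode, block_id)]


locations = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])]
data = read_file(locations, fetch)
```

In this toy run the read of blk_1 from dn1 fails, the client falls back to
dn2, and the file is still read in full. The choice of which replica to try
first is made on the client side, as described above.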

Thanks and Regards,

Rishi Yadav

(o) 408.988.2000x113 ||  (f) 408.716.2726

InfoObjects Inc || http://www.infoobjects.com *(Big Data Solutions)*

*INC 500 Fastest growing company in 2012 || 2011*

*Best Place to work in Bay Area 2012 - *SF Business Times and the Silicon
Valley / San Jose Business Journal

2041 Mission College Boulevard, #280 || Santa Clara, CA 95054
On Fri, Feb 8, 2013 at 4:40 PM, Mehal Patel <[EMAIL PROTECTED]> wrote:

> Hello All,
> I am confused about how MapReduce tasks select data blocks when processing
> user requests.
> Since block replication copies a single data block to multiple
> datanodes, how are data blocks uniquely selected during job processing?
> How does it guarantee that the same block is not chosen two or three
> times by different mapper tasks?
> Thank you
> -Mehal
Harsh J 2013-02-09, 05:12