MapReduce, mail # user - question about preserving data locality in MapReduce with Yarn

Re: question about preserving data locality in MapReduce with Yarn
ricky l 2013-10-29, 03:10
Hi Sandy, thank you very much for the information. It is good to know that
the MapReduce AM takes block locations into account. BTW, I am not very
familiar with the concept of splits. Is it specific to MR jobs? If
possible, a pointer to the relevant code would be very helpful, as I am
trying to implement an application master that needs to take HDFS data
locality into account. thx.
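
Splits are a MapReduce concept: a job's InputFormat divides the input into InputSplit objects (org.apache.hadoop.mapreduce.InputSplit), and getLocations() on each split returns the host names the AM uses as locality hints. FileInputFormat builds these from the block locations it obtains from the NameNode through FileSystem#getFileBlockLocations, which a non-MR application master can also call directly. The sketch below is a simplified Python model loosely following FileInputFormat's logic (split size = max(minSize, min(maxSize, blockSize)), a 10% slop on the last split, hosts taken from the block containing the split's start offset); Block, Split, block_index and compute_splits are hypothetical names, not Hadoop API.

```python
from dataclasses import dataclass
from typing import List

SPLIT_SLOP = 1.1  # mirrors FileInputFormat's 10% slack before cutting a final split


@dataclass
class Block:
    offset: int        # byte offset of the block within the file
    length: int        # block length
    hosts: List[str]   # datanodes holding a replica (hypothetical model of BlockLocation)


@dataclass
class Split:
    start: int
    length: int
    hosts: List[str]   # locality hint handed to the AM


def block_index(blocks: List[Block], offset: int) -> int:
    """Return the index of the block that contains the given offset."""
    for i, b in enumerate(blocks):
        if b.offset <= offset < b.offset + b.length:
            return i
    raise ValueError(f"offset {offset} is outside the file")


def compute_splits(blocks: List[Block], file_length: int, block_size: int,
                   min_size: int = 1, max_size: int = 2**63 - 1) -> List[Split]:
    """Cut a file into splits, tagging each with the hosts of its first block."""
    split_size = max(min_size, min(max_size, block_size))
    splits = []
    remaining = file_length
    # Emit full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        start = file_length - remaining
        hosts = blocks[block_index(blocks, start)].hosts
        splits.append(Split(start, split_size, hosts))
        remaining -= split_size
    # Whatever is left (up to 1.1 split sizes) becomes the last split.
    if remaining > 0:
        start = file_length - remaining
        splits.append(Split(start, remaining, blocks[block_index(blocks, start)].hosts))
    return splits
```

For a 300 MB file stored in 128 MB blocks, this model yields splits at offsets 0, 128 and 256 MB, the last one 44 MB long, each tagged with the replica hosts of the block it starts in.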

On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:

> Hi Ricky,
> The input splits contain the locations of the blocks they cover.  The AM
> gets the information from the input splits and submits requests for those
> locations.  Each container request spans all the replicas that the block is
> located on.  Are you interested in something more specific?
> -Sandy
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <[EMAIL PROTECTED]> wrote:
>> Well, I thought an application master could somehow ask the NameNode where
>> the data exist... isn't that true? If it does not know where the data
>> reside, does a MapReduce application master specify the resource name as
>> "*", which would mean data locality might not be preserved at all? thx,
>> r
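
Regarding "*": in YARN, a resource request names a host, a rack, or "*" (ResourceRequest.ANY, meaning any node). Asking only at "*" carries no locality preference. The MapReduce AM instead asks, for each split, at every replica host, at the racks of those hosts, and at "*", so the scheduler can fall back from node-local to rack-local to off-switch placement. A minimal Python model of that aggregation follows, assuming a hypothetical rack_of topology map; in a Java AM, AMRMClient.ContainerRequest accepts nodes and racks arrays, and the client library expands them into the corresponding node, rack and ANY requests.

```python
from collections import Counter
from typing import Dict, Iterable, List

ANY = "*"  # YARN's resource name for "any node"


def build_resource_requests(split_hosts: Iterable[List[str]],
                            rack_of: Dict[str, str]) -> Counter:
    """Aggregate one container ask per split into per-location counts.

    Each split adds one container at every host holding a replica, one at
    every distinct rack those hosts sit in, and one at ANY. rack_of is a
    hypothetical host -> rack mapping standing in for rack resolution.
    """
    asks = Counter()
    for hosts in split_hosts:
        for host in hosts:
            asks[host] += 1
        for rack in {rack_of[h] for h in hosts}:
            asks[rack] += 1
        asks[ANY] += 1  # always asked, so the split can run anywhere if needed
    return asks
```

For example, three splits with hosts ["dn1", "dn3"], ["dn2", "dn4"] and [] on a cluster where dn1 and dn2 sit in /rack1 and dn3 and dn4 in /rack2 produce one ask at each of the four hosts, two at each rack and three at "*". The "*" count equals the number of containers wanted; the host and rack entries only express preferences.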