Re: question about preserving data locality in MapReduce with YARN
Arun C Murthy 2013-11-01, 02:14
The code is slightly hard to follow since it's split between the client and the ApplicationMaster.
The client invokes InputFormat.getSplits to compute the splits and their locations, and writes them to a file in HDFS.
The ApplicationMaster then reads that file and creates resource requests based on the locations of each input split (typically 3 replicas of the underlying block). See TaskAttemptImpl.dataLocalHosts and TaskAttemptImpl.dataLocalRacks, and follow those variables around in the code base.
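For anyone writing their own ApplicationMaster, the request side can be illustrated with a small model. This is only a sketch under stated assumptions, not the MapReduce AM code: the class name LocalityRequestSketch, the method buildRequestTable and the rackOf helper are hypothetical, and the rack is derived from a made-up host naming scheme ("r1-n2" lives on rack "/r1"). A real AM would resolve racks through the cluster topology and hand the hosts and racks to YARN (for example via AMRMClient.ContainerRequest). The idea shown is that each split contributes one request per replica host, one per distinct rack, and one for "*".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of turning split locations into locality-aware requests.
// Not Hadoop code: all names here are illustrative assumptions.
class LocalityRequestSketch {
    // Resource name meaning "any node in the cluster".
    static final String ANY = "*";

    // Hypothetical rack lookup: assumes hosts are named "<rack>-<node>".
    // A real cluster would use its configured topology mapping instead.
    static String rackOf(String host) {
        int dash = host.indexOf('-');
        return "/" + (dash < 0 ? "default-rack" : host.substring(0, dash));
    }

    // For each split (given as the list of hosts holding its replicas),
    // count one container request per host, per distinct rack, and for ANY.
    static Map<String, Integer> buildRequestTable(List<List<String>> splitHosts) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (List<String> hosts : splitHosts) {
            Set<String> racks = new LinkedHashSet<>();
            // De-duplicate hosts so a split counts at most once per host.
            for (String host : new LinkedHashSet<>(hosts)) {
                table.merge(host, 1, Integer::sum);
                racks.add(rackOf(host));
            }
            for (String rack : racks) {
                table.merge(rack, 1, Integer::sum);
            }
            // The ANY entry lets the scheduler fall back to a non-local node.
            table.merge(ANY, 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        // Two splits, each with three replica locations.
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("r1-n1", "r1-n2", "r2-n1"),
            Arrays.asList("r1-n1", "r2-n2", "r2-n3"));
        buildRequestTable(splits)
            .forEach((name, count) -> System.out.println(name + " -> " + count));
    }
}
```

With the two example splits, host r1-n1 ends up with a count of 2 because both splits have a replica there, and "*" also has a count of 2, one per task. The "*" entry is always present alongside the host and rack entries, so locality is a preference the scheduler can relax rather than a hard constraint.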
On Oct 28, 2013, at 11:10 PM, ricky l <[EMAIL PROTECTED]> wrote:
> Hi Sandy, thank you very much for the information. It is good to know that the MapReduce AM considers the block location information. BTW, I am not very familiar with the concept of splits. Is it specific to MR jobs? If possible, a pointer to the relevant code would be very helpful for reference, as I am trying to implement an application master that needs to consider HDFS data locality. thx.
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
> Hi Ricky,
> The input splits contain the locations of the blocks they cover. The AM gets the information from the input splits and submits requests for those locations. Each container request spans all the nodes that hold a replica of the block. Are you interested in something more specific?
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <[EMAIL PROTECTED]> wrote:
> Well, I thought an application master could somehow ask a namenode where the data resides... isn't that true? If it does not know where the data resides, does a MapReduce application master specify the resource name as "*", which means data locality might not be preserved at all? thx,
Arun C. Murthy