Re: question about preserving data locality in MapReduce with Yarn
Sandy Ryza 2013-10-31, 22:59
Splits are a MapReduce concept. Check out FileInputFormat for an
example of how to get block locations. You can then pass these locations
into an AMRMClient.ContainerRequest.
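
To make that a bit more concrete, here is a rough sketch of the host-list step, not the actual MapReduce AM code. It uses plain Java with no Hadoop dependency so it can be read on its own; LocalityHints and nodesForBlock are hypothetical names invented for this illustration. The comments show where the real calls (FileSystem.getFileBlockLocations, BlockLocation.getHosts, and the AMRMClient.ContainerRequest constructor) would fit in, assuming the Hadoop 2.x client API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; it is not part of Hadoop.
final class LocalityHints {

    private LocalityHints() {}

    /**
     * Turns the replica hosts of one block into the "nodes" argument of a
     * container request.
     *
     * In a real application master the hosts would come from HDFS, roughly:
     *   FileStatus st = fs.getFileStatus(path);
     *   BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
     *   String[] replicaHosts = blocks[i].getHosts();
     *
     * Returns null when no hosts are known, which leaves the request
     * without a node preference (any node is acceptable).
     */
    static String[] nodesForBlock(String[] replicaHosts) {
        if (replicaHosts == null || replicaHosts.length == 0) {
            return null;
        }
        // Drop duplicates but keep the order in which hosts were reported.
        Set<String> distinct = new LinkedHashSet<>();
        for (String host : replicaHosts) {
            if (host != null && !host.isEmpty()) {
                distinct.add(host);
            }
        }
        return distinct.isEmpty() ? null : distinct.toArray(new String[0]);
    }

    // The result would then be handed to YARN along these lines:
    //   String[] nodes = LocalityHints.nodesForBlock(replicaHosts);
    //   amRMClient.addContainerRequest(
    //       new AMRMClient.ContainerRequest(capability, nodes, null, priority));
    // where capability is a Resource and priority is a Priority.
}
```

One request per block, naming every host that holds a replica, lets the scheduler pick whichever of those nodes has free capacity.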
On Mon, Oct 28, 2013 at 8:10 PM, ricky l <[EMAIL PROTECTED]> wrote:
> Hi Sandy, thank you very much for the information. It is good to know that
> MapReduce AM considers the block location information. BTW, I am not very
> familiar with the concept of splits. Is it specific to MR jobs? If
> possible, code location would be very helpful for reference as I am trying
> to implement an application master that needs to consider HDFS
> data-locality. thx.
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
>> Hi Ricky,
>> The input splits contain the locations of the blocks they cover. The AM
>> gets the information from the input splits and submits requests for those
>> locations. Each container request lists all the nodes that hold a replica
>> of the block. Are you interested in something more specific?
>> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <[EMAIL PROTECTED]> wrote:
>>> Well, I thought an application master could ask a namenode where the
>>> data resides. Isn't that true? If it does not know where the data
>>> resides, does a MapReduce application master specify the resource name as
>>> "*", which means data locality might not be preserved at all? thx,