Hadoop >> mail # user >> Re: question about preserving data locality in MapReduce with Yarn


Re: question about preserving data locality in MapReduce with Yarn
Splits are a MapReduce concept. Check out FileInputFormat for an example of
how to get block locations. You can then pass these locations into an
AMRMClient.ContainerRequest.

-Sandy
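
[Editor's sketch] The flow described above (look up the hosts that hold each block, then ask for a container on those hosts) might look roughly like the following. This is a stdlib-only model, not Hadoop code: Block, Request, requestsFor, sampleBlocks and the node names are hypothetical stand-ins. In a real application master the block list would presumably come from FileSystem.getFileBlockLocations(status, 0, status.getLen()), the hosts from BlockLocation.getHosts(), and each request would be an AMRMClient.ContainerRequest(capability, nodes, racks, priority).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class LocalitySketch {

    /** Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation. */
    public static final class Block {
        public final long offset;
        public final long length;
        public final List<String> hosts; // hosts holding a replica of this block

        public Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Hypothetical stand-in for AMRMClient.ContainerRequest. */
    public static final class Request {
        public final List<String> nodes;    // preferred nodes for this container
        public final boolean relaxLocality; // true: scheduler may fall back to rack, then "*"

        public Request(List<String> nodes, boolean relaxLocality) {
            this.nodes = nodes;
            this.relaxLocality = relaxLocality;
        }

        @Override
        public String toString() {
            return "Request{nodes=" + nodes + ", relaxLocality=" + relaxLocality + "}";
        }
    }

    /** Builds one request per block, listing every replica host as a preferred node. */
    public static List<Request> requestsFor(List<Block> blocks) {
        List<Request> requests = new ArrayList<>();
        for (Block block : blocks) {
            // Drop duplicate hosts while keeping their original order.
            List<String> nodes = new ArrayList<>(new LinkedHashSet<>(block.hosts));
            // Assumption: locality is relaxed, so an empty or busy node list
            // degrades to "anywhere" instead of blocking the request.
            requests.add(new Request(nodes, true));
        }
        return requests;
    }

    /** Made-up layout: two 128 MB blocks, three replicas each. */
    public static List<Block> sampleBlocks() {
        long blockSize = 128L * 1024 * 1024;
        return Arrays.asList(
            new Block(0L, blockSize, "node1", "node2", "node3"),
            new Block(blockSize, blockSize, "node2", "node3", "node4"));
    }

    public static void main(String[] args) {
        for (Request request : requestsFor(sampleBlocks())) {
            System.out.println(request);
        }
    }
}
```

One request per block is a simplification. A MapReduce split can cover more than one block, in which case the hosts would be collected per split rather than per block.
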
On Mon, Oct 28, 2013 at 8:10 PM, ricky l <[EMAIL PROTECTED]> wrote:

> Hi Sandy, thank you very much for the information. It is good to know that
> MapReduce AM considers the block location information. BTW, I am not very
> familiar with the concept of splits. Is it specific to MR jobs? If
> possible, code location would be very helpful for reference as I am trying
> to implement an application master that needs to consider HDFS
> data-locality. thx.
>
> r.
>
>
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
>
>> Hi Ricky,
>>
>> The input splits contain the locations of the blocks they cover.  The AM
>> gets the information from the input splits and submits requests for those
>> locations.  Each container request spans all the replicas that the block is
>> located on.  Are you interested in something more specific?
>>
>> -Sandy
>>
>>
>> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <[EMAIL PROTECTED]> wrote:
>>
>>> Well, I thought an application master could somehow ask a namenode where
>>> the data exists... isn't that true? If it does not know where the data
>>> resides, does a MapReduce application master specify the resource name as
>>> "*", which means data locality might not be preserved at all? thx,
>>>
>>> r
>>>
>>
>>
>