MapReduce, mail # user - question about preserving data locality in MapReduce with Yarn


Re: question about preserving data locality in MapReduce with Yarn
Michael Segel 2013-10-29, 02:03
How do you know where the data exists when you begin?

Sent from a remote device. Please excuse any typos...

Mike Segel

> On Oct 28, 2013, at 8:57 PM, "ricky lee" <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have a question about maintaining data locality in a MapReduce job launched through Yarn. According to the Yarn tutorial, an application master can specify a resource name, memory, and CPU when requesting containers. By choosing resource names carefully, I think data locality can be achieved. I am curious how the current MapReduce application master does this. Does it check all the blocks needed for a job and choose the subset of nodes holding the most of them? If someone can point me to the source code that makes this decision, it would be very much appreciated. Thanks.
>
> -r
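
To make the "resource name" idea in the question concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the usual YARN convention that a resource name is a host name, a rack name, or "*" for any node, and that each input split reports the hosts holding its replicas (in Hadoop this comes from InputSplit.getLocations()). The class LocalityRequests, the method aggregate, and the host and rack names are all hypothetical and for illustration only. As far as I know, the MapReduce application master does not pick a subset of nodes up front; it asks once per map task at all three levels and lets the scheduler choose. The classes to read are probably RMContainerAllocator and RMContainerRequestor in org.apache.hadoop.mapreduce.v2.app.rm, but please verify against your Hadoop version.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not Hadoop code: builds the table of
// "resource name -> number of containers wanted" from split locations.
public class LocalityRequests {
    static final String ANY = "*";                    // "any node" resource name
    static final String DEFAULT_RACK = "/default-rack";

    // splitHosts: for each input split, the hosts that hold a replica of its data.
    // hostToRack: assumed topology lookup from host name to rack name.
    static Map<String, Integer> aggregate(List<List<String>> splitHosts,
                                          Map<String, String> hostToRack) {
        Map<String, Integer> asks = new TreeMap<>();
        for (List<String> hosts : splitHosts) {
            Set<String> racks = new TreeSet<>();
            // One ask per distinct host that holds a replica of this split.
            for (String host : new TreeSet<>(hosts)) {
                asks.merge(host, 1, Integer::sum);
                racks.add(hostToRack.getOrDefault(host, DEFAULT_RACK));
            }
            // One ask per distinct rack, so the scheduler can fall back to rack-local.
            for (String rack : racks) {
                asks.merge(rack, 1, Integer::sum);
            }
            // One ask at "*", so the task can still run off-rack if needed.
            asks.merge(ANY, 1, Integer::sum);
        }
        return asks;
    }

    public static void main(String[] args) {
        Map<String, String> topology = Map.of("h1", "/r1", "h2", "/r1", "h3", "/r2");
        List<List<String>> splits = List.of(List.of("h1", "h2"), List.of("h2", "h3"));
        aggregate(splits, topology).forEach((name, count) ->
                System.out.println(name + " -> " + count));
    }
}
```

In a real application master, the equivalent step would be building one AMRMClient.ContainerRequest per task with the split's hosts and racks and handing it to the AMRMClient; the sketch above only shows how the per-name counts add up.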