MapReduce, mail # user - What is the difference between Rack-local map tasks and Data-local map tasks?


centerqi hu 2012-10-07, 13:56
Michael Segel 2012-10-07, 14:45
centerqi hu 2012-10-07, 15:28
Bertrand Dechoux 2012-10-07, 19:31
paritosh ranjan 2012-10-07, 19:49
Harsh J 2012-10-07, 22:46

Re: What is the difference between Rack-local map tasks and Data-local map tasks?
Michael Segel 2012-10-08, 00:13
Ok,

So what would be the use case for this feature?

I mean, when would locality take precedence over job completion time?
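
For readers following the thread: the locality delay Harsh describes in the quoted reply below trades a short wait for a better chance of a data-local slot. A minimal Python sketch of that idea follows. It is a toy model, not Hadoop's FairScheduler code. The class `DelayScheduler`, its `offer_slot` method, and the node names are all hypothetical, and the `locality_delay` field only loosely mirrors the `mapred.fairscheduler.locality.delay` property mentioned in the thread (time units here are arbitrary).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Task:
    name: str
    replica_nodes: Set[str]  # nodes that hold a replica of this task's input


@dataclass
class DelayScheduler:
    """Toy delay scheduler (hypothetical; not Hadoop's implementation)."""
    locality_delay: float                      # how long to hold out for a local slot
    pending: List[Task] = field(default_factory=list)
    waiting_since: Optional[float] = None      # time of the first skipped offer

    def offer_slot(self, node: str, now: float) -> Optional[Tuple[str, str]]:
        """A node offers a free slot at time `now`; return (task, locality) or None."""
        if not self.pending:
            return None
        # Prefer any pending task whose input lives on the offering node.
        for task in self.pending:
            if node in task.replica_nodes:
                self.pending.remove(task)
                self.waiting_since = None      # a local launch resets the wait
                return (task.name, "data-local")
        # No local task: start (or continue) waiting instead of launching at once.
        if self.waiting_since is None:
            self.waiting_since = now
        if now - self.waiting_since < self.locality_delay:
            return None                        # skip this offer and keep waiting
        # Waited long enough: give up on locality and launch anyway.
        task = self.pending.pop(0)
        return (task.name, "non-local")


sched = DelayScheduler(
    locality_delay=5.0,
    pending=[Task("t1", {"nodeA"}), Task("t2", {"nodeB"})],
)
print(sched.offer_slot("nodeC", now=0.0))  # None: no local data, start waiting
print(sched.offer_slot("nodeB", now=1.0))  # ('t2', 'data-local')
print(sched.offer_slot("nodeC", now=2.0))  # None: wait restarted at t=2
print(sched.offer_slot("nodeC", now=8.0))  # ('t1', 'non-local'): delay expired
```

In this model the cost of locality is visible directly: the slot on nodeC sits idle at t=0 and t=2, which is the extra job time the question above is asking about.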

On Oct 7, 2012, at 5:46 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Bertrand,
>
> FairScheduler does support delay scheduling for locality via
> mapred.fairscheduler.locality.delay config prop. MR2's
> CapacityScheduler recently got similar support for better locality
> scheduling as well (see YARN-80). Is this not what you're talking of?
>
> On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>> Basically, more replicas.
>>
>> The second solution would be to use a 'smarter' scheduler. In theory, the
>> JobTracker should be able to say "postpone this task until a data-local task
>> can be created". But I don't think any stable, publicly available scheduler
>> does that at the moment. This would give you less network traffic, but the
>> whole job might be slower because of the wait. It might be a good trade-off if
>> you have multiple jobs running at the same time and your hot data is
>> uniformly distributed. But in practice this is of course not always the case,
>> and you also need to consider SLAs for the users, so the whole thing is not
>> trivial.
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <[EMAIL PROTECTED]> wrote:
>>>
>>> Very good explanation.
>>> If there were a way to reduce the number of rack-local map tasks
>>> and increase the number of data-local map tasks,
>>> would that improve performance?
>>>
>>> 2012/10/7 Michael Segel <[EMAIL PROTECTED]>
>>>>
>>>> Rack local means that while the data isn't local to the node running the
>>>> task, it is still on the same rack.
>>>> (It's meaningless unless you've set up rack awareness, because otherwise
>>>> all of the machines are on the default rack.)
>>>>
>>>> Data local means that the task is running on the machine that
>>>> contains the actual data.
>>>>
>>>> HTH
>>>>
>>>> -Mike
>>>>
>>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> When I run "hadoop job -status xxx", the output includes lines like the following:
>>>>
>>>> Rack-local map tasks=124
>>>> Data-local map tasks=6
>>>>
>>>> What is the difference between Rack-local map tasks and Data-local map
>>>> tasks?
>>>>
>>>> --
>>>> [EMAIL PROTECTED]|Sam
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> [EMAIL PROTECTED]|齐忠
>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>
>
>
> --
> Harsh J
>
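
The distinction Mike draws in the quoted explanation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hadoop's code: the function `classify_locality`, the `rack_of` mapping, and the node names are hypothetical. The one real detail assumed here is that nodes with no configured rack fall into Hadoop's `/default-rack`.

```python
DEFAULT_RACK = "/default-rack"  # rack assigned when rack awareness is not configured


def classify_locality(task_node, replica_nodes, rack_of):
    """Classify one map task as 'data-local', 'rack-local', or 'off-rack'.

    task_node     -- node the task runs on
    replica_nodes -- nodes holding a replica of the task's input block
    rack_of       -- dict mapping node name to rack name (hypothetical helper)
    """
    if task_node in replica_nodes:
        return "data-local"
    task_rack = rack_of.get(task_node, DEFAULT_RACK)
    if any(rack_of.get(n, DEFAULT_RACK) == task_rack for n in replica_nodes):
        return "rack-local"
    return "off-rack"


rack_of = {"n1": "/rack1", "n2": "/rack1", "n3": "/rack2"}
print(classify_locality("n1", {"n1", "n3"}, rack_of))  # data-local
print(classify_locality("n2", {"n1"}, rack_of))        # rack-local
print(classify_locality("n3", {"n1"}, rack_of))        # off-rack
print(classify_locality("n3", {"n1"}, {}))             # rack-local
```

The last call shows Mike's caveat: with no rack mapping every node shares the default rack, so any task that is not data-local is counted as rack-local.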
Bertrand Dechoux 2012-10-08, 05:44
Bejoy KS 2012-10-07, 18:29
pengwenwu2008 2012-12-13, 06:22