MapReduce >> mail # user >> What is the difference between Rack-local map tasks and Data-local map tasks?


Re: What is the difference between Rack-local map tasks and Data-local map tasks?
Bertrand,

FairScheduler does support delay scheduling for locality via the
mapred.fairscheduler.locality.delay configuration property. MR2's
CapacityScheduler recently got similar support for better locality
scheduling as well (see YARN-80). Is this not what you're talking about?
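
To make the idea concrete, here is a rough sketch of how locality is classified and how a delay-based scheduler might decide whether to accept a slot. This is only an illustration under simplifying assumptions, not the FairScheduler's actual code: the names (locality, Job, offer_slot, rack_of) are hypothetical, time is passed in explicitly, and a single delay value covers all non-local placements.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

DATA_LOCAL, RACK_LOCAL, OFF_RACK = "data-local", "rack-local", "off-rack"


def locality(node: str, replica_hosts: Set[str], rack_of: Dict[str, str]) -> str:
    """Classify a task placement relative to the replicas of its input split."""
    if node in replica_hosts:
        return DATA_LOCAL
    if rack_of[node] in {rack_of[h] for h in replica_hosts}:
        return RACK_LOCAL
    return OFF_RACK


@dataclass
class Job:
    # One set of replica hosts per pending map task (hypothetical model).
    pending: List[Set[str]]
    # Time of the first declined slot offer since the last data-local launch.
    skipped_since: Optional[float] = None


def offer_slot(job: Job, node: str, now: float, delay: float,
               rack_of: Dict[str, str]) -> Optional[str]:
    """Offer a free map slot on `node`.

    Returns the locality of the launched task, or None if the job has
    nothing to run or declines the slot to wait for a better one.
    """
    if not job.pending:
        return None
    # Prefer a task whose input has a replica on this node.
    for i, hosts in enumerate(job.pending):
        if node in hosts:
            job.pending.pop(i)
            job.skipped_since = None
            return DATA_LOCAL
    # No data-local task here: decline until the job has waited `delay`.
    if job.skipped_since is None:
        job.skipped_since = now
    if now - job.skipped_since < delay:
        return None
    # Waited long enough: take a rack-local task if one exists, else any task.
    best = min(range(len(job.pending)),
               key=lambda i: locality(node, job.pending[i], rack_of) != RACK_LOCAL)
    hosts = job.pending.pop(best)
    return locality(node, hosts, rack_of)
```

With a delay of zero every offered slot is accepted right away, which is roughly the behaviour that produces many rack-local tasks; a larger delay trades some waiting for more data-local launches.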

On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> Basically, more replicas.
>
> The second solution would be to use a 'smarter' scheduler. In theory, the
> jobtracker should be able to say "postpone this task until a data-local task
> can be created". But I don't think any stable, publicly available scheduler
> does that at the moment. This would give you less network traffic, but the
> whole job might be slower due to the wait. It might be a good trade-off if you
> have multiple jobs running at the same time and if your hot data is
> uniformly distributed. But in practice this is of course not always the case,
> and you also need to consider SLAs for the users, so the whole thing is not trivial.
>
> Regards
>
> Bertrand
>
>
> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <[EMAIL PROTECTED]> wrote:
>>
>> Very good explanation.
>> If there were a way to reduce the number of rack-local map tasks
>> and increase the number of data-local map tasks,
>> would that improve performance?
>>
>> 2012/10/7 Michael Segel <[EMAIL PROTECTED]>
>>>
>>> Rack local means that while the data isn't local to the node running the
>>> task, it is still on the same rack.
>>> (It's meaningless unless you've set up rack awareness, because otherwise all of the
>>> machines are on the default rack.)
>>>
>>> Data local means that the task is running local to the machine that
>>> contains the actual data.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> When I run "hadoop job -status xxx", the output includes the following counters:
>>>
>>> Rack-local map tasks=124
>>> Data-local map tasks=6
>>>
>>> What is the difference between Rack-local map tasks and Data-local map
>>> tasks?
>>>
>>> --
>>> [EMAIL PROTECTED]|Sam
>>>
>>>
>>
>>
>>
>> --
>> [EMAIL PROTECTED]|齐忠
>
>
>
>
> --
> Bertrand Dechoux

--
Harsh J