MapReduce user mailing list: What is the difference between Rack-local map tasks and Data-local map tasks?


centerqi hu 2012-10-07, 13:56
Michael Segel 2012-10-07, 14:45
centerqi hu 2012-10-07, 15:28
Bertrand Dechoux 2012-10-07, 19:31
paritosh ranjan 2012-10-07, 19:49
Re: What is the difference between Rack-local map tasks and Data-local map tasks?
Bertrand,

FairScheduler does support delay scheduling for locality via
mapred.fairscheduler.locality.delay config prop. MR2's
CapacityScheduler recently got similar support for better locality
scheduling as well (see YARN-80). Is this not what you're talking about?
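The delay-scheduling idea discussed here ("postpone this task until a data-local task can be created") can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a single job, only two locality levels (data-local or anything else), and a time-based delay. `Job`, `assign_task`, and `locality_delay` are hypothetical names, not Hadoop APIs; in FairScheduler the corresponding knob is the `mapred.fairscheduler.locality.delay` property mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Job:
    """Hypothetical job model, not a Hadoop class."""
    name: str
    # task id -> hosts that hold a replica of that task's input split
    pending: Dict[str, Set[str]]
    # time at which the job started passing up non-local slots (None = not waiting)
    wait_start: Optional[float] = None


def assign_task(job: Job, host: str, now: float, locality_delay: float) -> Optional[str]:
    """Pick a map task for a free slot on `host`, or return None to skip the slot."""
    # 1. Prefer a data-local task: one whose split has a replica on this host.
    local = next((t for t, hosts in job.pending.items() if host in hosts), None)
    if local is not None:
        del job.pending[local]
        job.wait_start = None  # launched locally, so reset the wait clock
        return local
    if not job.pending:
        return None  # nothing left to run
    # 2. No local task here: start waiting if we are not already.
    if job.wait_start is None:
        job.wait_start = now
    # 3. Once the job has waited at least `locality_delay`, give up on locality
    #    and launch any pending task (it will be rack-local or off-rack).
    if now - job.wait_start >= locality_delay:
        task = next(iter(job.pending))
        del job.pending[task]
        return task
    return None  # skip: leave the slot for another job or a later heartbeat


job = Job("example", {"t1": {"nodeA"}, "t2": {"nodeB"}})
print(assign_task(job, "nodeC", now=0.0, locality_delay=5.0))  # None (starts waiting)
print(assign_task(job, "nodeA", now=1.0, locality_delay=5.0))  # t1 (data-local)
print(assign_task(job, "nodeC", now=2.0, locality_delay=5.0))  # None (waits again)
print(assign_task(job, "nodeC", now=7.0, locality_delay=5.0))  # t2 (delay expired)
```

The trade-off Bertrand describes shows up directly in `locality_delay`: a larger value yields more data-local tasks but can leave slots idle and slow the job down. A fuller version would add an intermediate rack-local level with its own delay.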

On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> Basically, more replicas.
>
> The second solution would be to use a 'smarter' scheduler. In theory, the
> jobtracker should be able to say "postpone this task until a data-local task
> can be created". But I don't think any stable and publicly available scheduler
> does that at the moment. This would give you less network traffic, but the
> whole job might be slower because of the wait. It might be a good trade-off if you
> have multiple jobs running at the same time and if your hot data is
> uniformly distributed. But in practice this is of course not always the case,
> and you also need to consider SLAs for the users, so the whole thing is not trivial.
>
> Regards
>
> Bertrand
>
>
> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <[EMAIL PROTECTED]> wrote:
>>
>> Very good explanation.
>> If there is a way to reduce rack-local map tasks
>> and increase data-local map tasks,
>> would that improve performance?
>>
>> 2012/10/7 Michael Segel <[EMAIL PROTECTED]>
>>>
>>> Rack local means that while the data isn't local to the node running the
>>> task, it is still on the same rack.
>>> (It's meaningless unless you've set up rack awareness, because otherwise all of the
>>> machines are on the default rack.)
>>>
>>> Data local means that the task is running local to the machine that
>>> contains the actual data.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>> hi all
>>>
>>> When I run "hadoop job -status xxx", the output includes the following:
>>>
>>> Rack-local map tasks=124
>>> Data-local map tasks=6
>>>
>>> What is the difference between Rack-local map tasks and Data-local map
>>> tasks?
>>>
>>> --
>>> [EMAIL PROTECTED]|Sam
>>>
>>>
>>
>>
>>
>> --
>> [EMAIL PROTECTED]|齐忠
>
>
>
>
> --
> Bertrand Dechoux

--
Harsh J
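Mike's definitions can be written as a small classification over the hosts that hold a split's replicas. This is a hedged Python sketch: `rack_of` and `classify_locality` are hypothetical helpers, not Hadoop APIs, and the cluster topology is modeled as a plain host-to-rack dictionary. The `/default-rack` name matches the rack Hadoop assigns to hosts when no rack awareness is configured, which is why, as Mike notes, every task that is not data-local counts as rack-local on such a cluster.

```python
from typing import Dict, Iterable

# Rack that hosts fall back to when no rack mapping is configured.
DEFAULT_RACK = "/default-rack"


def rack_of(host: str, topology: Dict[str, str]) -> str:
    """Hypothetical host -> rack lookup; unmapped hosts land on the default rack."""
    return topology.get(host, DEFAULT_RACK)


def classify_locality(task_host: str, replica_hosts: Iterable[str],
                      topology: Dict[str, str]) -> str:
    """Classify a map task by where it runs relative to its input split's replicas."""
    hosts = list(replica_hosts)
    if task_host in hosts:
        return "data-local"  # a replica of the split is on this very machine
    task_rack = rack_of(task_host, topology)
    if any(rack_of(h, topology) == task_rack for h in hosts):
        return "rack-local"  # a replica is on another machine in the same rack
    return "off-rack"        # the data has to cross racks


# Without rack awareness (empty topology), a non-local task counts as rack-local.
print(classify_locality("node3", ["node1", "node2"], {}))     # rack-local
# With a topology mapping, the same placement turns out to be off-rack.
racks = {"node1": "/rack1", "node2": "/rack1", "node3": "/rack2"}
print(classify_locality("node3", ["node1", "node2"], racks))  # off-rack
print(classify_locality("node1", ["node1", "node2"], racks))  # data-local
```

Adding replicas, as Bertrand suggests, lengthens `replica_hosts` and so raises the chance that the `task_host in hosts` check succeeds. For the counters in the original question, 6 of the 130 map tasks covered by those two counters (6 / (6 + 124), about 4.6%) were data-local.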
Michael Segel 2012-10-08, 00:13
Bertrand Dechoux 2012-10-08, 05:44
Bejoy KS 2012-10-07, 18:29
pengwenwu2008 2012-12-13, 06:22