MapReduce >> mail # user >> What is the difference between Rack-local map tasks and Data-local map tasks?


centerqi hu 2012-10-07, 13:56
Michael Segel 2012-10-07, 14:45
centerqi hu 2012-10-07, 15:28
Bertrand Dechoux 2012-10-07, 19:31
paritosh ranjan 2012-10-07, 19:49
Harsh J 2012-10-07, 22:46
Michael Segel 2012-10-08, 00:13

Re: What is the difference between Rack-local map tasks and Data-local map tasks?
@Harsh: I didn't know that. That's good to hear. I will check out
mapred.fairscheduler.locality.delay in FairScheduler,
and I will also look at YARN-80 for my own information.

Thanks!

Bertrand
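
For anyone following along, the delay-scheduling idea discussed in this thread can be sketched as a toy model. This is only an illustration under stated assumptions, not FairScheduler's actual code: the function `schedule`, the `max_skips` parameter, and the task and node names are all hypothetical, and the delay is counted in declined slot offers, which is a simplification of whatever mapred.fairscheduler.locality.delay does internally.

```python
def schedule(tasks, offers, max_skips):
    """Toy delay scheduling for a single job.

    tasks:     dict of task id -> set of nodes holding a replica of its input
    offers:    sequence of node names, one entry per free-slot offer
    max_skips: number of non-local offers the job declines before it
               accepts a non-local slot (hypothetical knob)
    Returns a list of (task id, node, locality) tuples.
    """
    pending = dict(tasks)  # copy so the caller's dict is not modified
    skipped = 0
    assignments = []
    for node in offers:
        if not pending:
            break
        # Prefer a task whose input has a replica on the offering node.
        local = next((t for t, nodes in pending.items() if node in nodes), None)
        if local is not None:
            assignments.append((local, node, "data-local"))
            del pending[local]
            skipped = 0
        elif skipped >= max_skips:
            # Waited long enough: run the oldest pending task non-locally.
            task = next(iter(pending))
            assignments.append((task, node, "non-local"))
            del pending[task]
        else:
            # Decline this offer and hope a local slot frees up soon.
            skipped += 1
    return assignments


tasks = {"t1": {"n1"}, "t2": {"n2"}}
offers = ["n3", "n1", "n3", "n2"]
eager = schedule(tasks, offers, max_skips=0)
patient = schedule(tasks, offers, max_skips=2)
print(eager)
print(patient)
```

With max_skips=0 both tasks run on whichever slot appears first and neither is data-local. With max_skips=2 the job declines the two offers from n3 and both tasks end up data-local, which is the trade-off between network traffic and completion time raised in this thread.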

On Mon, Oct 8, 2012 at 2:13 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Ok,
>
> So what would be the use case for this feature?
>
> I mean, when would locality take precedence over job completion time?
>
> On Oct 7, 2012, at 5:46 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
> > Bertrand,
> >
> > FairScheduler does support delay scheduling for locality via
> > mapred.fairscheduler.locality.delay config prop. MR2's
> > CapacityScheduler recently got similar support for better locality
> > scheduling as well (see YARN-80). Is this not what you're talking of?
> >
> > On Mon, Oct 8, 2012 at 1:01 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> >> Basically, more replicas.
> >>
> >> The second solution would be to use a 'smarter' scheduler. In theory, the
> >> JobTracker should be able to say "postpone this task until a data-local
> >> task can be created". But I don't think any stable, publicly available
> >> scheduler does that at the moment. This would give you less network
> >> traffic, but the whole job might be slower because of the wait. It might
> >> be a good trade-off if you have multiple jobs running at the same time
> >> and if your hot data is uniformly distributed. But in practice this is of
> >> course not always the case, and you also need to consider SLAs for the
> >> users, so the whole thing is not trivial.
> >>
> >> Regards
> >>
> >> Bertrand
> >>
> >>
> >> On Sun, Oct 7, 2012 at 5:28 PM, centerqi hu <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Very good explanation.
> >>> If there were a way to reduce the Rack-local map tasks
> >>> and increase the Data-local map tasks,
> >>> would that improve performance?
> >>>
> >>> 2012/10/7 Michael Segel <[EMAIL PROTECTED]>
> >>>>
> >>>> Rack local means that while the data isn't local to the node running
> >>>> the task, it is still on the same rack.
> >>>> (It's meaningless unless you've set up rack awareness, because
> >>>> otherwise all of the machines are on the default rack.)
> >>>>
> >>>> Data local means that the task is running local to the machine that
> >>>> contains the actual data.
> >>>>
> >>>> HTH
> >>>>
> >>>> -Mike
> >>>>
> >>>> On Oct 7, 2012, at 8:56 AM, centerqi hu <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>
> >>>> Hi all,
> >>>>
> >>>> When I run "hadoop job -status xxx", the output includes the following:
> >>>>
> >>>> Rack-local map tasks=124
> >>>> Data-local map tasks=6
> >>>>
> >>>> What is the difference between Rack-local map tasks and Data-local map
> >>>> tasks?
> >>>>
> >>>> --
> >>>> [EMAIL PROTECTED]|Sam
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> [EMAIL PROTECTED]|齐忠
> >>
> >>
> >>
> >>
> >> --
> >> Bertrand Dechoux
> >
> >
> >
> > --
> > Harsh J
> >
>
>
--
Bertrand Dechoux
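
To make the two counters from the original question concrete, here is a minimal sketch of how a task's locality could be classified from where it runs and where the replicas of its input live. The names `classify`, `data_local_share`, and the `rack_of` mapping are hypothetical, made up for this illustration; this is not Hadoop's own code. The sketch assumes that a node with no configured rack falls back to a single default rack, which is the behaviour Mike describes.

```python
DEFAULT_RACK = "/default-rack"  # rack assumed for nodes with no topology entry


def classify(task_node, replica_nodes, rack_of=None):
    """Return 'data-local', 'rack-local' or 'off-rack' for one map task.

    task_node:     node the task runs on
    replica_nodes: set of nodes holding a replica of the task's input
    rack_of:       optional dict of node -> rack (hypothetical topology map)
    """
    rack_of = rack_of or {}
    if task_node in replica_nodes:
        return "data-local"
    task_rack = rack_of.get(task_node, DEFAULT_RACK)
    if any(rack_of.get(n, DEFAULT_RACK) == task_rack for n in replica_nodes):
        return "rack-local"
    return "off-rack"


def data_local_share(data_local, rack_local, other=0):
    """Fraction of map tasks that read their input from the local node."""
    total = data_local + rack_local + other
    return data_local / total if total else 0.0


racks = {"n1": "/r1", "n2": "/r1", "n3": "/r2", "n4": "/r1"}
print(classify("n1", {"n1", "n2"}))          # task runs on a replica holder
print(classify("n3", {"n1", "n2"}))          # no topology configured
print(classify("n3", {"n1", "n2"}, racks))   # topology configured
print(data_local_share(6, 124))              # counters from the question
```

Without a topology map, every task that is not data-local lands in the rack-local bucket, because all nodes share the default rack. With the counters from the question, 6 of 130 map tasks (under 5%) were data-local.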
Bejoy KS 2012-10-07, 18:29
pengwenwu2008 2012-12-13, 06:22