Hadoop, mail # user - Re: Control over max map/reduce tasks per job


RE: Control over max map/reduce tasks per job
Jonathan Gray 2009-02-03, 19:44
Chris,

For my specific use cases, it would be best to be able to set N
mappers/reducers per job per node (so I can explicitly say: run at most 2
of this CPU-bound task at a time on any given node). However, the other
way would work as well (on a 10-node system, I would cap the job at 20
tasks at a time globally), but it opens up the possibility that a node
could be assigned more than 2 of that task.

I would work with whichever is easiest to implement, as either would be a
vast improvement for me (I could run large numbers of network-latency-bound
tasks without fear of CPU-bound tasks killing the cluster).

JG
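
At this point Hadoop only exposes the per-TaskTracker slot counts, which
apply to every job on a node; a minimal sketch of those knobs, assuming
the 0.19-era property names (the commented-out per-job property at the
end is hypothetical -- it is what this thread is asking for):

    import org.apache.hadoop.conf.Configuration;

    public class SlotSketch {
        public static void main(String[] args) {
            Configuration ttConf = new Configuration();

            // Real, existing properties: each TaskTracker reads these from
            // its own hadoop-site.xml at startup. They cap concurrent tasks
            // per node across ALL jobs, so they must be sized for the
            // worst-case (CPU-bound) job, leaving latency-bound jobs idle.
            int maxMapsPerNode =
                ttConf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
            int maxReducesPerNode =
                ttConf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);

            System.out.println("Per-node map slots (all jobs):    " + maxMapsPerNode);
            System.out.println("Per-node reduce slots (all jobs): " + maxReducesPerNode);

            // Hypothetical per-job, per-node cap being requested in this
            // thread -- this property does NOT exist:
            // jobConf.setInt("mapred.job.map.tasks.maximum.per.node", 2);
        }
    }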

> -----Original Message-----
> From: Chris K Wensel [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 03, 2009 11:34 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Control over max map/reduce tasks per job
>
> Hey Jonathan
>
> Are you looking to limit the total number of concurrent mappers/
> reducers a single job can consume cluster-wide, or to limit the number
> per node?
>
> That is, you have X mappers/reducers, but can only allow N mappers/
> reducers to run at a time globally for a given job.
>
> Or, you are cool with all X running concurrently globally, but want to
> guarantee that no node can run more than N tasks from that job?
>
> Or both?
>
> Just reconciling the conversation we had last week with this thread.
>
> ckw
>
> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
>
> > All,
> >
> > I have a few relatively small clusters (5-20 nodes) and am having
> > trouble keeping them loaded with my MR jobs.
> >
> > The primary issue is that I have different jobs with drastically
> > different patterns. I have jobs that read/write to/from HBase or Hadoop
> > with minimal logic (network-throughput or I/O bound), others that
> > perform crawling (network-latency bound), and one huge parsing
> > streaming job (very CPU bound; each task eats a core).
> >
> > I'd like to launch very large numbers of tasks for the network-latency-
> > bound jobs, but the large CPU-bound job means I have to keep the max
> > maps allowed per node low enough so as not to starve the DataNode and
> > RegionServer.
> >
> > I'm an HBase dev, but not familiar enough with the Hadoop MR code to
> > know what would be involved in implementing this. However, in talking
> > with other users, it seems like this would be a well-received option.
> >
> > I wanted to ping the list before filing an issue, because it seems like
> > someone may have thought about this in the past.
> >
> > Thanks.
> >
> > Jonathan Gray
> >
>
> --
> Chris K Wensel
> [EMAIL PROTECTED]
> http://www.cascading.org/
> http://www.scaleunlimited.com/
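
Concretely, the two limits being discussed could be expressed as a pair of
scheduler-side checks. A hypothetical sketch (JobTaskLimits and mayLaunch
are invented names, not existing Hadoop API), using the 10-node,
2-per-node example from above:

    public class JobTaskLimits {
        final int globalCap;   // max concurrent tasks for this job, cluster-wide
        final int perNodeCap;  // max concurrent tasks for this job on any one node

        JobTaskLimits(int globalCap, int perNodeCap) {
            this.globalCap = globalCap;
            this.perNodeCap = perNodeCap;
        }

        // Would launching one more task of this job on a given node
        // stay within both caps?
        boolean mayLaunch(int runningGlobally, int runningOnNode) {
            return runningGlobally < globalCap && runningOnNode < perNodeCap;
        }

        public static void main(String[] args) {
            // 10 nodes x 2 tasks per node = 20 tasks globally.
            JobTaskLimits limits = new JobTaskLimits(10 * 2, 2);

            // A global-only cap would allow this node to take a 3rd
            // CPU-bound task; the per-node check is what forbids it.
            System.out.println(limits.mayLaunch(15, 2)); // false: node at its cap
            System.out.println(limits.mayLaunch(15, 1)); // true: room on both counts
        }
    }

Enforcing only globalCap matches Chris's first option; Jonathan's
preference corresponds to also enforcing perNodeCap.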
Other messages in this thread:

  Chris K Wensel 2009-02-03, 19:33
  Nathan Marz 2009-02-03, 20:01
  jason hadoop 2009-02-03, 20:06
  Nathan Marz 2009-02-03, 21:14
  Bryan Duxbury 2009-02-04, 06:58
  Jonathan Gray 2009-02-04, 21:38
  Jonathan Gray 2009-02-03, 19:07