Hadoop >> mail # user >> Control over max map/reduce tasks per job


RE: Control over max map/reduce tasks per job
I have filed an issue for this:

https://issues.apache.org/jira/browse/HADOOP-5170

JG
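For context, the knob that existed at the time of this thread was a per-TaskTracker cap set in mapred-site.xml. It is cluster-wide configuration that applies to every job scheduled on a node, which is exactly the limitation the thread (and HADOOP-5170) is about. A minimal sketch, assuming Hadoop 0.19-era property names:

```xml
<!-- mapred-site.xml: per-node slot caps.
     These apply to ALL jobs on the node, not to any one job. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>  <!-- at most 2 map tasks run concurrently on this node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>  <!-- at most 2 reduce tasks run concurrently on this node -->
</property>
```

With only this knob, a single CPU-heavy job and many latency-bound jobs must share the same per-node ceiling, which is the tension described below.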

> -----Original Message-----
> From: Bryan Duxbury [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 03, 2009 10:59 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Control over max map/reduce tasks per job
>
> This sounds good enough for a JIRA ticket to me.
> -Bryan
>
> On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:
>
> > Chris,
> >
> > For my specific use cases, it would be best to be able to set N
> > mappers/reducers per job per node (so I can explicitly say: run at
> > most 2 of this CPU-bound task at a time on any given node).  However,
> > the other way would work as well (on a 10-node cluster, I would set
> > the job to a max of 20 tasks at a time globally), but that opens up
> > the possibility that a node could be assigned more than 2 of that
> > task.
> >
> > I would work with whichever is easiest to implement, as either would
> > be a vast improvement for me (I could run high numbers of
> > network-latency-bound tasks without fear of CPU-bound tasks killing
> > the cluster).
> >
> > JG
> >
> >
> >
> >> -----Original Message-----
> >> From: Chris K Wensel [mailto:[EMAIL PROTECTED]]
> >> Sent: Tuesday, February 03, 2009 11:34 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: Control over max map/reduce tasks per job
> >>
> >> Hey Jonathan
> >>
> >> Are you looking to limit the total number of concurrent
> >> mappers/reducers a single job can consume cluster-wide, or to limit
> >> the number per node?
> >>
> >> That is, you have X mappers/reducers, but can only allow N of them
> >> to run at a time globally for a given job.
> >>
> >> Or, you are cool with all X running concurrently globally, but want
> >> to guarantee that no node can run more than N tasks from that job?
> >>
> >> Or both?
> >>
> >> Just reconciling the conversation we had last week with this thread.
> >>
> >> ckw
> >>
> >> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
> >>
> >>> All,
> >>>
> >>> I have a few relatively small clusters (5-20 nodes) and am having
> >>> trouble keeping them loaded with my MR jobs.
> >>>
> >>> The primary issue is that I have jobs with drastically different
> >>> patterns.  I have jobs that read/write to/from HBase or Hadoop with
> >>> minimal logic (network-throughput or IO bound), others that perform
> >>> crawling (network-latency bound), and one huge streaming parsing
> >>> job (very CPU bound; each task eats a core).
> >>>
> >>>
> >>> I'd like to launch very large numbers of tasks for the
> >>> network-latency-bound jobs; however, the large CPU-bound job means
> >>> I have to keep the max maps allowed per node low enough not to
> >>> starve the Datanode and Regionserver.
> >>>
> >>> I'm an HBase dev, but not familiar enough with the Hadoop MR code
> >>> to know what would be involved in implementing this.  However, in
> >>> talking with other users, it seems like this would be a
> >>> well-received option.
> >>>
> >>>
> >>> I wanted to ping the list before filing an issue because it seems
> >>> like someone may have thought about this in the past.
> >>>
> >>> Thanks.
> >>>
> >>> Jonathan Gray
> >>>
> >>
> >> --
> >> Chris K Wensel
> >> [EMAIL PROTECTED]
> >> http://www.cascading.org/
> >> http://www.scaleunlimited.com/
> >
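What the filed ticket asks for could look something like the following. These property names are purely illustrative, not a shipped Hadoop API at the time of this thread; they sketch the two semantics Chris distinguishes above (a per-node cap for one job vs. a cluster-wide cap for one job):

```xml
<!-- Illustrative only: hypothetical per-job limits in the spirit of
     HADOOP-5170. Neither property existed in Hadoop when this was written. -->
<property>
  <name>mapred.job.max.maps.per.node</name>  <!-- per-node cap for one job -->
  <value>2</value>
</property>
<property>
  <name>mapred.job.max.running.maps</name>   <!-- cluster-wide cap for one job -->
  <value>20</value>
</property>
```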