
Chris K Wensel 2009-02-03, 19:33
Jonathan Gray 2009-02-03, 19:44
Nathan Marz 2009-02-03, 20:01
jason hadoop 2009-02-03, 20:06
Nathan Marz 2009-02-03, 21:14
Bryan Duxbury 2009-02-04, 06:58
Jonathan Gray 2009-02-04, 21:38
Control over max map/reduce tasks per job


I have a few relatively small clusters (5-20 nodes) and am having trouble
keeping them loaded with my MR jobs.


The primary issue is that I have different jobs that have drastically
different patterns.  I have jobs that read/write to/from HBase or Hadoop
with minimal logic (network throughput bound or io bound), others that
perform crawling (network latency bound), and one huge parsing streaming job
(very CPU bound, each task eats a core).


I'd like to launch very large numbers of tasks for the network-latency-bound
jobs; however, the large CPU-bound job means I have to keep the maximum maps
allowed per node low enough so as not to starve the DataNode and RegionServer.
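
(For reference, the only knobs I'm aware of are the cluster-wide,
per-TaskTracker maximums, which apply to every job alike and require a
TaskTracker restart to change; something like:)

```
<!-- mapred-site.xml: these caps are per TaskTracker and cluster-wide,
     not per job, so one setting must fit all workload types. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```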


I'm an HBase dev, but I'm not familiar enough with the Hadoop MR code to know
what would be involved in implementing this.  However, in talking with
other users, it seems like this would be a well-received option.
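
To make the idea concrete, here is a toy sketch of the bookkeeping I imagine a
scheduler would need (all names below are made up for illustration, not real
Hadoop APIs): track running tasks per (job, node) and refuse to assign a task
once a job's per-node cap is reached on that node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: a per-(job, node) running-task counter that a
// JobTracker-side scheduler could consult before handing a task to a
// TaskTracker. None of these names exist in Hadoop.
public class PerJobNodeCap {
    // Key "jobId|node" -> number of this job's tasks running on that node.
    private final Map<String, Integer> running = new HashMap<>();

    private String key(String jobId, String node) {
        return jobId + "|" + node;
    }

    // Returns true (and records the task) only if the job is still under
    // its per-node cap on the given node.
    public boolean tryLaunch(String jobId, String node, int capPerNode) {
        String k = key(jobId, node);
        int count = running.getOrDefault(k, 0);
        if (count >= capPerNode) {
            return false; // cap reached: let another job's task run instead
        }
        running.put(k, count + 1);
        return true;
    }

    // Called when a task completes, freeing a slot for this job on the node.
    public void taskFinished(String jobId, String node) {
        running.merge(key(jobId, node), -1, Integer::sum);
    }
}
```

With a per-job cap of, say, 2 on a node, the CPU-bound parser could be held to
two cores there while a latency-bound crawl job fills the remaining slots.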


I wanted to ping the list before filing an issue because it seems like
someone may have thought about this in the past.




Jonathan Gray