MapReduce >> mail # user >> Configuring # task slots


Configuring # task slots
Was reading up a bit today on configuring the settings for # task slots,
namely:

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

Was just wondering: couldn't (shouldn't?) this be done dynamically by
default? i.e., couldn't a slave node compute these values
programmatically based on the number of cores in the machine?
(Perhaps in conjunction with a mappers-to-reducers ratio and an
over-subscription percentage.)

Obviously there'd be times when you'd want to override that manually,
but I'd think a simple algorithm for computing it
(e.g., based on the info in slide #8 of this presentation:
http://www.slideshare.net/ydn/hadoop-summit-2010-tuning-hadoop-to-deliver-performance-to-your-application)
would cover most users' main use case.
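To make the idea concrete, here's a rough sketch of what such a default
computation could look like. This is purely illustrative: the class and
method names (SlotCalculator, totalSlots, splitSlots) are invented, the
8-core / 25% over-subscription / 2:1 map-to-reduce figures are arbitrary
example inputs, and nothing like this exists in Hadoop today.

```java
// Hypothetical sketch: derive task-slot maximums from the core count,
// an over-subscription percentage, and a map:reduce ratio.
// None of these names come from Hadoop; they are made up for illustration.
public class SlotCalculator {

    /** Total slots = cores * (1 + oversubscribe); e.g. 0.25 means 25% extra. */
    static int totalSlots(int cores, double oversubscribe) {
        return (int) Math.round(cores * (1.0 + oversubscribe));
    }

    /**
     * Split the total by a map:reduce ratio (e.g. 2.0 = two map slots
     * per reduce slot), guaranteeing at least one slot of each kind.
     */
    static int[] splitSlots(int total, double mapToReduceRatio) {
        int reduce = Math.max(1, (int) Math.round(total / (mapToReduceRatio + 1.0)));
        int map = Math.max(1, total - reduce);
        return new int[] { map, reduce };
    }

    public static void main(String[] args) {
        // A real TaskTracker could detect this at startup:
        int cores = Runtime.getRuntime().availableProcessors();

        // Fixed example inputs so the output is deterministic:
        int total = totalSlots(8, 0.25);      // 8 cores, 25% over-subscribed
        int[] slots = splitSlots(total, 2.0); // 2:1 map:reduce ratio

        System.out.println("mapred.tasktracker.map.tasks.maximum=" + slots[0]);
        System.out.println("mapred.tasktracker.reduce.tasks.maximum=" + slots[1]);
    }
}
```

With 8 cores, 25% over-subscription, and a 2:1 ratio, this would yield
10 total slots split 7/3 — the kind of result an admin could still
override in mapred-site.xml when the default guess is wrong.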

Thoughts?  Is there something I'm overlooking here that would make this
unworkable?

Thanks,

DR