Configuring # task slots
Was reading up a bit today on the settings for configuring the # of task
slots per TaskTracker, namely:

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
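
(For reference, these are normally set per node in mapred-site.xml; an
illustrative snippet, with placeholder values:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
)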

Was just wondering:  couldn't (shouldn't?) these be set dynamically by
default?  I.e., couldn't/shouldn't a slave node be able to compute these
values programmatically based on the # of cores in the machine?
(Perhaps in conjunction with a mappers-to-reducers ratio and an
oversubscription percentage.)

Obviously there'd be times when you'd want to override that manually,
but I'd think there could be a simple algorithm for computing it
(e.g., based on the info in slide #8 of this presentation:
http://www.slideshare.net/ydn/hadoop-summit-2010-tuning-hadoop-to-deliver-performance-to-your-application)
that would cover most users' main use case.
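
To make that concrete, here's a rough sketch of the kind of heuristic I
have in mind.  The ratio and oversubscription constants are placeholders
of my own, not numbers taken from the presentation:

  // Sketch: derive default slot counts from the machine's core count.
  // All constants are illustrative assumptions, not recommended values.
  public class SlotDefaults {
      public static void main(String[] args) {
          int cores = Runtime.getRuntime().availableProcessors();

          // e.g. allow 150% of cores as total slots (oversubscription),
          // and split slots 2:1 between mappers and reducers.
          double oversubscription = 1.5;
          double mapFraction = 2.0 / 3.0;

          int totalSlots  = Math.max(2, (int) Math.round(cores * oversubscription));
          int mapSlots    = Math.max(1, (int) Math.round(totalSlots * mapFraction));
          int reduceSlots = Math.max(1, totalSlots - mapSlots);

          System.out.println("mapred.tasktracker.map.tasks.maximum=" + mapSlots);
          System.out.println("mapred.tasktracker.reduce.tasks.maximum=" + reduceSlots);
      }
  }

On an 8-core box that would work out to 12 total slots: 8 map and 4
reduce.  Something along those lines seems computable per node at
TaskTracker startup.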

Thoughts?  Is there something I'm overlooking here that would make this
unworkable?

Thanks,

DR