Hadoop, mail # user - Need help about task slots

Shashidhar Rao 2013-05-11, 14:32
Re: Need help about task slots
Rahul Bhattacharjee 2013-05-11, 15:27

I am also new to the Hadoop world. Here is my take on your question; if
something is missing, others will surely correct it.

Pre-YARN, the slots are fixed and computed based on the processing
capacity of the datanode hardware. Once the number of slots per datanode is
determined, they are divided into map and reduce slots; that split goes
into the config files and remains fixed until changed. In YARN, it is
decided at runtime based on the requirements of each particular task. It is
quite possible for one datanode to be running 10 tasks at a given point in
time while another, similar datanode is running only 4.
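
To make the pre-YARN case concrete, here is a rough sketch of the
arithmetic. The node count and per-node slot values are made-up numbers,
and the helper names are only for illustration. In MRv1 the per-node values
are the ones set through mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum.

```python
# Illustrative only: node count and slot values are assumptions,
# not recommendations for any real cluster.

def cluster_slots(nodes, map_slots_per_node, reduce_slots_per_node):
    """Total fixed (map, reduce) slots in a pre-YARN cluster."""
    return nodes * map_slots_per_node, nodes * reduce_slots_per_node

def waves(tasks, slots):
    """How many rounds are needed to run `tasks` on `slots` slots."""
    return -(-tasks // slots)  # integer ceiling division

map_slots, reduce_slots = cluster_slots(10, 8, 4)
print(map_slots, reduce_slots)   # 80 40
print(waves(200, map_slots))     # 3
```

So a job with 200 map tasks does not "require" 200 slots; with 80 map slots
it would simply run in about three waves.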

Coming to your question: the number of map tasks is decided by the data set
size, the DFS block size and the input format. Generally, for file-based
input formats it is one mapper per data block; however, there are ways to
change this using configuration settings. Reduce tasks are set using job
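
As a rough sketch of the "one mapper per data block" rule for file-based
input formats: the function name and the file sizes below are made up for
illustration. It assumes non-empty, splittable files and ignores any
min/max split size settings, so treat the result as an estimate only.

```python
# Approximation: one map task per block, counted per file.
# Assumes non-empty, splittable files and default split settings.

MB = 1024 * 1024

def estimate_map_tasks(file_sizes_bytes, block_size_bytes):
    """Estimate map tasks as the number of blocks, summed over files."""
    return sum(-(-size // block_size_bytes) for size in file_sizes_bytes)

# A 1024 MB file and a 300 MB file with a 128 MB block size:
# 8 blocks + 3 blocks
print(estimate_map_tasks([1024 * MB, 300 * MB], 128 * MB))  # 11
```

Blocks are counted per file, so many small files give many mappers even
when the total data size is small.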

A general rule, as I have read in various documents, is that mappers should
run for at least a minute, so you can run a sample to find a data block
size that makes your mappers run longer than a minute. It again depends on
your SLA: if you do not need a very tight SLA, you can choose to run fewer
mappers at the expense of a longer runtime.
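
One possible way to turn a sample run into a block size, as a sketch only:
the function name and the sample measurements are hypothetical. It assumes
mapper runtime scales linearly with input size and ignores task startup
overhead, so the result is a starting point for further test runs rather
than a final answer.

```python
# Hypothetical sizing helper: assumes runtime grows linearly with input.

MB = 1024 * 1024

def split_size_for_target(sample_bytes, sample_seconds, target_seconds=60):
    """Input size per mapper that should take about target_seconds."""
    return round(sample_bytes * target_seconds / sample_seconds)

# Suppose a sample mapper processed 64 MB in 20 seconds.
size = split_size_for_target(64 * MB, 20)
print(size // MB)  # 192
```

With those made-up numbers, a mapper would need roughly 192 MB of input to
run for about a minute.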

But again, this is all theory; I am not sure how these things are handled in
actual production clusters.


On Sat, May 11, 2013 at 8:02 PM, Shashidhar Rao

> Hi Users,
> I am new to Hadoop and confused about task slots in a cluster. How would I
> know how many task slots would be required for a job? Is there any
> empirical formula, or on what basis should I set the number of task slots?
> Thanks in advance
