NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hadoop user mailing list: Need help about task slots


Shashidhar Rao 2013-05-11, 14:32
Re: Need help about task slots
Hi,

I am also new to the Hadoop world, so here is my take on your question; if
something is missing, others will surely correct it.

Pre-YARN (MRv1), the slots are fixed and computed based on the processing
capacity of the DataNode hardware. Once the number of slots per DataNode is
determined, they are divided into map and reduce slots; those numbers go
into the config files (mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml) and stay fixed
until someone changes them. In YARN, capacity is decided at runtime based on
the resources (mainly memory) that each task requests. It is quite possible
that at a given point in time one DataNode is running 10 tasks while
another, similar DataNode is running only 4.

Coming to your question: the number of map tasks is decided by the data set
size, the DFS block size and the input format. Generally, for file-based
input formats there is one mapper per data block (input split), although
there are ways to change this through configuration settings. The number of
reduce tasks is set in the job configuration.
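The split arithmetic can be sketched in Python. This is a simplified model of how FileInputFormat in Hadoop 2.x sizes splits: splitSize = max(minSize, min(maxSize, blockSize)), where minSize and maxSize come from mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize (mapred.min.split.size and mapred.max.split.size in older releases), and the last split of a file may be up to 10% larger than splitSize. It assumes non-empty, splittable files; a gzip file, for example, is read by a single mapper. The reducer helper encodes the rule of thumb from the Hadoop MapReduce tutorial, 0.95 or 1.75 times (nodes × reduce slots or containers per node); the result would then be set on the job, e.g. via mapreduce.job.reduces or Job.setNumReduceTasks(). File sizes and node counts below are hypothetical.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1      # the last split may be up to 10% larger than split_size
LONG_MAX = 2**63 - 1  # default maxSize, effectively "no limit"


def split_size(block_size: int, min_size: int = 1, max_size: int = LONG_MAX) -> int:
    """splitSize = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def num_splits(file_size: int, block_size: int,
               min_size: int = 1, max_size: int = LONG_MAX) -> int:
    """Number of input splits (map tasks) for one non-empty, splittable file."""
    size = split_size(block_size, min_size, max_size)
    remaining = file_size
    splits = 0
    # Cut full-size splits while more than SPLIT_SLOP * size bytes remain.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left (at most 1.1 * size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


def num_map_tasks(file_sizes: list[int], block_size: int, **limits: int) -> int:
    """Splits are computed per file, so every small file costs at least one mapper."""
    return sum(num_splits(s, block_size, **limits) for s in file_sizes)


def suggested_reduces(nodes: int, reduce_slots_per_node: int,
                      factor: float = 0.95) -> int:
    """0.95: all reduces start in one wave; 1.75: faster nodes take a second wave."""
    return int(factor * nodes * reduce_slots_per_node)


if __name__ == "__main__":
    block = 128 * MB
    print(num_splits(1024 * MB, block))                    # one 1 GB file
    print(num_splits(130 * MB, block))                     # slop keeps this to one split
    print(num_splits(1024 * MB, block, max_size=64 * MB))  # capped split size
    print(num_map_tasks([1024 * MB] + [1 * MB] * 10, block))
    print(suggested_reduces(10, 2), suggested_reduces(10, 2, 1.75))
```

Raising the minimum split size yields fewer, longer mappers; lowering the maximum yields more, shorter ones.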

The general rule I have read in various documents is that mappers should
run for at least a minute, so you can run a sample job to find a data block
(split) size that keeps each mapper running for more than a minute. Again,
it depends on your SLA: if you do not need a very tight SLA, you can choose
to run fewer mappers at the expense of a longer runtime.
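A back-of-the-envelope version of that tuning loop, as a Python sketch. It assumes you have measured on a sample run how many MB per second a single mapper processes and how long a task takes to start; those figures, the 40 concurrent map slots and the 10 GB input are all hypothetical. The Hadoop MapReduce tutorial gives the same reason for the one-minute guideline: task setup takes a while, so very short maps spend much of their time starting up.

```python
import math


def min_split_mb(throughput_mb_per_s: float, target_task_s: float = 60.0) -> int:
    """Smallest split (MB) that keeps one mapper busy for target_task_s seconds."""
    return math.ceil(throughput_mb_per_s * target_task_s)


def map_phase_estimate(input_mb: int, split_mb: int, throughput_mb_per_s: float,
                       concurrent_slots: int, startup_s: float = 0.0):
    """Rough estimate of (map tasks, waves, map-phase seconds).

    Assumes every task gets a full split, all tasks run at the same speed,
    and tasks run in waves of concurrent_slots.
    """
    tasks = math.ceil(input_mb / split_mb)
    waves = math.ceil(tasks / concurrent_slots)
    task_s = startup_s + split_mb / throughput_mb_per_s
    return tasks, waves, waves * task_s


if __name__ == "__main__":
    rate = 2.0  # MB/s per mapper, from a hypothetical sample run
    print("minimum split for a one-minute map:", min_split_mb(rate), "MB")
    for split in (32, 128, 512):
        tasks, waves, secs = map_phase_estimate(10240, split, rate, 40, startup_s=5)
        print(f"split {split} MB: {tasks} maps in {waves} waves, ~{secs:.0f} s")
```

Among these three sizes, 32 MB splits lose roughly a quarter of each task to startup, 128 MB splits finish the map phase soonest, and 512 MB splits use far fewer mappers but take longer, which is the SLA trade-off described above.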

But again, this is all theory; I am not sure how these things are handled
in actual production clusters.

HTH,

Thanks,
Rahul
On Sat, May 11, 2013 at 8:02 PM, Shashidhar Rao
<[EMAIL PROTECTED]> wrote:

> Hi Users,
>
> I am new to Hadoop and confused about task slots in a cluster. How would I
> know how many task slots are required for a job? Is there an empirical
> formula, or on what basis should I set the number of task slots?
>
> Thanks in advance
>
Mohammad Tariq 2013-05-12, 12:11
yypvsxf19870706 2013-05-12, 12:29