Hadoop >> mail # user >> How to change the number of mappers per node for a given job


How to change the number of mappers per node for a given job
Hello all,

So, I have two jobs, Job A and Job B.  For Job A, I would like to have a
maximum of 6 mappers per node.  However, Job B is a little different.  For
Job B, I can only run one mapper per node.  The reason for this isn't
important -- let's just say this requirement is non-negotiable.  I would
like to tell Hadoop, "For Job A, schedule a maximum of 6 mappers per node.
 But for Job B, schedule a maximum of 1 mapper per node."  Is this possible
at all?

The only solution I can think of is:

1) Have two folders off the main Hadoop folder, conf.JobA and conf.JobB.
Each folder has its own copy of mapred-site.xml.
conf.JobA/mapred-site.xml has a value of 6 for
mapred.tasktracker.map.tasks.maximum.  conf.JobB/mapred-site.xml has a
value of 1 for mapred.tasktracker.map.tasks.maximum.
2) Before I run Job A:
2a) Shut down my tasktrackers
2b) Copy conf.JobA/mapred-site.xml into Hadoop's conf folder, replacing the
mapred-site.xml that was already in there
2c) Restart my tasktrackers
2d) Wait for the tasktrackers to finish starting
3) Run Job A

and then do a similar thing when I need to run Job B.
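
To make the procedure above concrete, here is a rough sketch of steps 2a-2c
for a single node.  It is only an illustration, not something I have run on
a real cluster.  The function name switch_job_config and the "run" parameter
are made up for this example, and it assumes a Hadoop 1.x style layout where
bin/hadoop-daemon.sh and the conf folder live under the main Hadoop folder.

```python
import shutil
import subprocess
from pathlib import Path


def switch_job_config(hadoop_home, job_name, run=subprocess.check_call):
    """Swap in conf.<job_name>/mapred-site.xml and restart the local tasktracker.

    hadoop_home: the main Hadoop folder (assumed to contain bin/ and conf/).
    job_name:    "JobA" or "JobB", matching the conf.JobA / conf.JobB folders.
    run:         hypothetical hook that executes a command given as a list;
                 it defaults to subprocess.check_call.
    """
    home = Path(hadoop_home)
    src = home / ("conf." + job_name) / "mapred-site.xml"
    dst = home / "conf" / "mapred-site.xml"
    if not src.is_file():
        raise FileNotFoundError(str(src))

    # Assumption: the tasktracker is managed with bin/hadoop-daemon.sh.
    daemon = str(home / "bin" / "hadoop-daemon.sh")

    run([daemon, "stop", "tasktracker"])    # step 2a
    shutil.copyfile(str(src), str(dst))     # step 2b
    run([daemon, "start", "tasktracker"])   # step 2c
    # Step 2d (waiting for the tasktracker to finish starting) is not
    # handled here and would have to be checked separately.
```

This would have to be run on every node before each job, for example
switch_job_config("/path/to/hadoop", "JobA") ahead of Job A, which is a big
part of why the approach feels failure-prone.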

I really don't like this solution; it seems kludgey and failure-prone.  Is
there a better way to do what I need to do?

--Jeremy