So, I have two jobs, Job A and Job B. For Job A, I would like to have a
maximum of 6 mappers per node. However, Job B is a little different. For
Job B, I can only run one mapper per node. The reason for this isn't
important -- let's just say this requirement is non-negotiable. I would
like to tell Hadoop, "For Job A, schedule a maximum of 6 mappers per node.
But for Job B, schedule a maximum of 1 mapper per node." Is this possible?
The only solution I can think of is:
1) Have two folders off the main hadoop folder, conf.JobA and conf.JobB.
Each folder has its own copy of mapred-site.xml.
conf.JobA/mapred-site.xml sets mapred.tasktracker.map.tasks.maximum to 6.
conf.JobB/mapred-site.xml sets mapred.tasktracker.map.tasks.maximum to 1.
2) Before I run Job A:
2a) Shut down my tasktrackers
2b) Copy conf.JobA/mapred-site.xml into Hadoop's conf folder, replacing the
mapred-site.xml that was already in there
2c) Restart my tasktrackers
2d) Wait for the tasktrackers to finish starting
3) Run Job A
and then repeat the same procedure with conf.JobB when I need to run Job B.
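
To make the per-node procedure concrete, here is a rough Python sketch of
steps 2a-2c. It assumes a Hadoop 1.x-style install where
bin/hadoop-daemon.sh stop/start tasktracker controls the local tasktracker.
The /opt/hadoop path and the function names plan_switch and apply_switch are
made up for illustration, and step 2d (waiting for the tasktrackers to finish
starting) is not handled at all.

```python
import shutil
import subprocess
from pathlib import PurePosixPath


def plan_switch(hadoop_home, job):
    """Return the ordered steps for switching one node to the config for `job`.

    Assumes (hypothetically) that hadoop_home contains conf/, conf.JobA/ and
    conf.JobB/, as described in step 1.
    """
    home = PurePosixPath(hadoop_home)
    source = home / f"conf.{job}" / "mapred-site.xml"
    target = home / "conf" / "mapred-site.xml"
    daemon = str(home / "bin" / "hadoop-daemon.sh")
    return [
        ("run", [daemon, "stop", "tasktracker"]),    # 2a) shut down
        ("copy", [str(source), str(target)]),        # 2b) swap the config
        ("run", [daemon, "start", "tasktracker"]),   # 2c) restart
    ]


def apply_switch(steps):
    """Execute the planned steps on the local node, stopping on the first failure."""
    for kind, args in steps:
        if kind == "copy":
            shutil.copyfile(args[0], args[1])
        else:
            subprocess.run(args, check=True)


if __name__ == "__main__":
    # Dry run: print the plan instead of touching a real cluster.
    for kind, args in plan_switch("/opt/hadoop", "JobA"):
        print(kind, " ".join(args))
```

This would have to run on every node before each job, which is exactly the
part that worries me.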
I really don't like this solution; it seems kludgey and failure-prone. Is
there a better way to do what I need to do?