Re: Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)
If this is only for benchmarking, you could stop the task-trackers on the
machines you don't want to use.
Or you could set up another cluster.
But yes, there is no standard way to limit the slots taken by a job to a
specified set of machines.
You might be able to do it using a custom Scheduler, but that would be out
of your scope, I guess.
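For the slot-capping approach described in the quoted message below, the arithmetic is simple enough to sketch. This is only an illustration using the figures from the original question (10 nodes, ~20 map slots and 10 reduce slots per node); the function name slot_budget is made up here, and it assumes every node has the same slot configuration. The resulting percentage is the share of the cluster a capped queue would need in order to match 5 nodes' worth of slots. Note that it caps slots only; it does not pin tasks to particular machines.

```python
# Cluster figures taken from the original question in this thread.
TOTAL_NODES = 10
MAP_SLOTS_PER_NODE = 20      # "~20 map slots" per node
REDUCE_SLOTS_PER_NODE = 10   # "10 reduce slots per node"


def slot_budget(nodes_wanted, total_nodes=TOTAL_NODES):
    """Hypothetical helper: slots and cluster share equal to `nodes_wanted` nodes.

    Assumes all nodes are configured with the same number of slots.
    """
    if not 0 < nodes_wanted <= total_nodes:
        raise ValueError("nodes_wanted must be between 1 and total_nodes")
    return {
        "map_slots": nodes_wanted * MAP_SLOTS_PER_NODE,
        "reduce_slots": nodes_wanted * REDUCE_SLOTS_PER_NODE,
        "capacity_percent": 100.0 * nodes_wanted / total_nodes,
    }


# Slot budget equivalent to 5 of the 10 nodes.
budget = slot_budget(5)
print(budget)
```

The slots granted under such a cap can still be spread across all 10 machines, so for a true 5-node benchmark stopping the task-trackers remains the simpler option.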
On Mon, Sep 10, 2012 at 12:01 PM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
> I am not sure if there's any way to restrict the tasks to specific
> machines. However, I think there are some ways of restricting the
> number of 'slots' that can be used by the job.
> Also, not sure which version of Hadoop you are on. The Capacity
> Scheduler has ways by which you can set up a queue with a hard capacity limit.
> The capacity controls the number of slots that can be used by
> jobs submitted to the queue. So, if you submit a job to the queue,
> irrespective of the number of tasks it has, it should limit it to
> those slots. However, please note that this does not restrict the
> tasks to specific machines.
> On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I need to run some benchmarking tests for a given mapreduce job on a
> > *subset* of a 10-node Hadoop cluster. Not that it matters, but the current
> > settings allow for ~20 map slots and 10 reduce slots per node.
> > Without loss of generality, let's say I want a job with these
> > constraints below:
> > - to use only *5* out of the 10 nodes for running the mappers,
> > - to use only *5* out of the 10 nodes for running the reducers.
> > Is there any way of achieving this through a Hadoop property
> > set at job-submission time? I understand that the Fair Scheduler can
> > potentially be used to create pools of a proportionate # of mappers and
> > reducers, to achieve a similar outcome, but the problem is that I still
> > cannot tie such a pool to a fixed # of machines (right?). Essentially,
> > regardless of the # of map/reduce tasks involved, I only want a *fixed #
> > of machines* to handle the job.
> > Any tips on how I can go about achieving this?
> > Thanks,
> > Safdar