Hadoop, mail # user - Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)


Re: Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)
Hemanth Yamijala 2012-09-10, 10:01
Hi,

I am not sure whether there is any way to restrict the tasks to specific
machines. However, I think there are ways of restricting the number of
'slots' that the job can use.

Also, I am not sure which version of Hadoop you are on. The
CapacityScheduler
(http://hadoop.apache.org/common/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
lets you set up a queue with a hard capacity limit. The capacity
controls the number of slots that can be used by jobs submitted to the
queue. So, if you submit a job to that queue, it should be limited to
those slots irrespective of the number of tasks it has. However, please
note that this does not restrict the tasks to specific machines.
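
As a rough illustration of how a capacity percentage maps to slots, here
is a small sketch. The numbers come from the question below (10 nodes,
~20 map slots and 10 reduce slots per node). The function names are made
up for this sketch, it assumes every node has the same slot counts, and
it only does the arithmetic; it does not talk to Hadoop.

```python
def queue_capacity_percent(total_nodes, target_nodes):
    """Percent of cluster capacity equal to target_nodes' worth of slots.

    Assumes all nodes are configured with identical slot counts.
    """
    if not 0 < target_nodes <= total_nodes:
        raise ValueError("target_nodes must be between 1 and total_nodes")
    return 100.0 * target_nodes / total_nodes


def slot_cap(total_nodes, slots_per_node, percent):
    """Number of cluster-wide slots covered by a given capacity percentage."""
    return int(total_nodes * slots_per_node * percent / 100)


# Values from the question: 5 of 10 nodes, ~20 map and 10 reduce slots per node.
percent = queue_capacity_percent(total_nodes=10, target_nodes=5)
map_cap = slot_cap(total_nodes=10, slots_per_node=20, percent=percent)
reduce_cap = slot_cap(total_nodes=10, slots_per_node=10, percent=percent)

print("queue capacity: %.0f%%" % percent)
print("map slot cap: %d, reduce slot cap: %d" % (map_cap, reduce_cap))
```

The resulting percentage is what you would put into the queue's capacity
and maximum capacity settings. The exact property names depend on your
Hadoop version, so please check the scheduler documentation for your
release. The job then uses at most 5 nodes' worth of slots, but those
slots can still be spread over all 10 machines.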

Thanks
Hemanth

On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I need to run some benchmarking tests for a given mapreduce job on a *subset*
> of a 10-node Hadoop cluster. Not that it matters, but the current cluster
> settings allow for ~20 map slots and 10 reduce slots per node.
>
> Without loss of generality, let's say I want a job with the
> constraints below:
> - to use only *5* out of the 10 nodes for running the mappers,
> - to use only *5* out of the 10 nodes for running the reducers.
>
> Is there any other way of achieving this through Hadoop property overrides
> during job-submission time? I understand that the Fair Scheduler can
> potentially be used to create pools of a proportionate # of mappers and
> reducers, to achieve a similar outcome, but the problem is that I still
> cannot tie such a pool to a fixed # of machines (right?). Essentially,
> regardless of the # of map/reduce tasks involved, I only want a *fixed # of
> machines* to handle the job.
>
> Any tips on how I can go about achieving this?
>
> Thanks,
> Safdar