Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)


Safdar Kureishy 2012-09-10, 09:06
Re: Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)
Hi,

I am not sure if there's any way to restrict the tasks to specific
machines. However, I think there are some ways of restricting the
number of 'slots' that can be used by the job.

Also, I'm not sure which version of Hadoop you are on. The
CapacityScheduler
(http://hadoop.apache.org/common/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
has ways by which you can set up a queue with a hard capacity limit.
The capacity controls the number of slots that can be used by jobs
submitted to the queue. So, if you submit a job to that queue,
irrespective of the number of tasks it has, it should be limited to
those slots. However, please note that this does not restrict the
tasks to specific machines.
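
For example, here is a minimal sketch of such a queue in
capacity-scheduler.xml, using the YARN-style property names from the
linked doc; the queue name 'benchmark' and the 50% figures are just
placeholders:

<!-- capacity-scheduler.xml: cap a 'benchmark' queue at half the cluster -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,benchmark</value>
</property>
<property>
  <!-- the remaining share stays with the default queue -->
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>50</value>
</property>
<property>
  <!-- guaranteed share for the benchmark queue (percent of cluster) -->
  <name>yarn.scheduler.capacity.root.benchmark.capacity</name>
  <value>50</value>
</property>
<property>
  <!-- hard limit: the queue cannot grow beyond 50% even if the rest is idle -->
  <name>yarn.scheduler.capacity.root.benchmark.maximum-capacity</name>
  <value>50</value>
</property>

You would then submit with something like
-Dmapreduce.job.queuename=benchmark. On the MRv1 capacity scheduler,
the equivalent knobs are the
mapred.capacity-scheduler.queue.<queue-name>.capacity and
maximum-capacity properties, and the job-side property is
mapred.job.queue.name.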

Thanks
Hemanth

On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I need to run some benchmarking tests for a given mapreduce job on a
> *subset* of a 10-node Hadoop cluster. Not that it matters, but the current cluster
> settings allow for ~20 map slots and 10 reduce slots per node.
>
> Without loss of generality, let's say I want a job with the
> constraints below:
> - to use only *5* out of the 10 nodes for running the mappers,
> - to use only *5* out of the 10 nodes for running the reducers.
>
> Is there any other way of achieving this through Hadoop property overrides
> at job-submission time? I understand that the Fair Scheduler can
> potentially be used to create pools with a proportionate # of mappers and
> reducers to achieve a similar outcome, but the problem is that I still
> cannot tie such a pool to a fixed # of machines (right?). Essentially,
> regardless of the # of map/reduce tasks involved, I only want a *fixed # of
> machines* to handle the job.
>
> Any tips on how I can go about achieving this?
>
> Thanks,
> Safdar
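
For reference, the Fair Scheduler caps mentioned in the quoted message
would go into its allocations file, roughly as in the sketch below.
The pool name 'benchmark' and the limits are assumptions; with ~20 map
and 10 reduce slots per node on 10 nodes, 100 map / 50 reduce slots is
half the cluster in slot terms, but still not a fixed set of 5
machines:

<?xml version="1.0"?>
<allocations>
  <!-- pool capped at half the cluster's slots (name and limits are placeholders) -->
  <pool name="benchmark">
    <maxMaps>100</maxMaps>
    <maxReduces>50</maxReduces>
  </pool>
</allocations>

Jobs are assigned to a pool via whichever jobconf property
mapred.fairscheduler.poolnameproperty points at (user.name by
default), so this caps the pool's concurrently running slots but does
not pin its tasks to specific machines.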
Bertrand Dechoux 2012-09-10, 11:18
Safdar Kureishy 2012-09-10, 21:32