Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)


Safdar Kureishy 2012-09-10, 09:06
Hemanth Yamijala 2012-09-10, 10:01
Re: Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)
If that is only for benchmarking, you could stop the TaskTrackers on the
machines you don't want to use. Or you could set up another cluster.
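
For the TaskTracker option, a minimal sketch (assuming a Hadoop 1.x layout
where the daemon scripts live under $HADOOP_HOME/bin) would be to run this
on each node you want to exclude, and start it again after the benchmark:

  # stop the TaskTracker so this node no longer offers map/reduce slots
  $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker

  # bring the node back into the cluster once the benchmark is done
  $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker

Once a node's TaskTracker stops heartbeating, the JobTracker simply stops
scheduling tasks on it.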

But yes, there is no standard way to limit the slots taken by a job to a
specified set of machines.
You might be able to do it using a custom scheduler, but that would be out
of your scope, I guess.

Regards

Bertrand

On Mon, Sep 10, 2012 at 12:01 PM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am not sure if there's any way to restrict the tasks to specific
> machines. However, I think there are some ways of restricting the
> number of 'slots' that can be used by the job.
>
> Also, not sure which version of Hadoop you are on. The
> CapacityScheduler
> (http://hadoop.apache.org/common/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
> has ways by which you can set up a queue with a hard capacity limit.
> The capacity controls the number of slots that can be used by
> jobs submitted to the queue. So, if you submit a job to the queue,
> irrespective of the number of tasks it has, it should limit it to
> those slots.  However, please note that this does not restrict the
> tasks to specific machines.
>
> Thanks
> Hemanth
>
> On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I need to run some benchmarking tests for a given MapReduce job on a
> > *subset* of a 10-node Hadoop cluster. Not that it matters, but the
> > current cluster settings allow for ~20 map slots and 10 reduce slots
> > per node.
> >
> > Without loss of generality, let's say I want a job with these
> > constraints below:
> > - to use only *5* out of the 10 nodes for running the mappers,
> > - to use only *5* out of the 10 nodes for running the reducers.
> >
> > Is there any other way of achieving this through Hadoop property
> > overrides at job-submission time? I understand that the Fair Scheduler
> > can potentially be used to create pools with a proportionate # of
> > mappers and reducers, to achieve a similar outcome, but the problem is
> > that I still cannot tie such a pool to a fixed # of machines (right?).
> > Essentially, regardless of the # of map/reduce tasks involved, I only
> > want a *fixed # of machines* to handle the job.
> >
> > Any tips on how I can go about achieving this?
> >
> > Thanks,
> > Safdar
>

--
Bertrand Dechoux
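
To make the queue-with-a-hard-capacity-limit idea above concrete, here is a
minimal sketch for the MRv1 CapacityScheduler that ships with Hadoop 1.x (the
queue name 'bench', the 50% figures and the jar/class names are illustrative
assumptions, not from this thread; the YARN CapacityScheduler linked above
uses yarn.scheduler.capacity.* properties instead):

  <!-- mapred-site.xml: enable the capacity scheduler and declare the queues -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>
  <property>
    <name>mapred.queue.names</name>
    <value>default,bench</value>
  </property>

  <!-- capacity-scheduler.xml: give 'bench' 50% of the slots and cap it there -->
  <property>
    <name>mapred.capacity-scheduler.queue.bench.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.bench.maximum-capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>50</value>
  </property>

  # submit the benchmark job to that queue (assumes the job uses ToolRunner)
  hadoop jar my-benchmark.jar MyBenchmarkJob -Dmapred.job.queue.name=bench <args>

With ~20 map and 10 reduce slots per node on a 10-node cluster, a 50% hard cap
works out to roughly 100 map and 50 reduce slots, i.e. about 5 nodes' worth of
capacity, but spread over whichever nodes happen to have free slots. The Fair
Scheduler can impose a similar per-pool cap via maxMaps/maxReduces in its
allocation file, with the same caveat: slots are limited, machines are not.
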
Safdar Kureishy 2012-09-10, 21:32