Re: Changing the maximum tasks per node on a per job basis
Harsh J 2013-05-24, 08:31
Yes, you're correct that the end-result is not going to be as static
as you expect it to be. FWIW, per node limit configs have been
discussed before (and even implemented + removed):
On Fri, May 24, 2013 at 1:47 PM, Steve Lewis <[EMAIL PROTECTED]> wrote:
> My reading on Capacity Scheduling is that it controls the number of jobs
> scheduled at the level of the cluster.
> My issue is not sharing at the level of the cluster - usually my job is the
> only one running but rather at the level of
> the individual machine.
> Some of my jobs require more memory and do significant processing,
> especially in the reducer. While the cluster can schedule 8 smaller jobs
> on a node, when, say, 8 of the larger ones are scheduled the slaves run out
> of swap space and tend to crash.
> It is not clear that limiting the number of jobs on the cluster will
> stop a scheduler from scheduling the maximum allowed jobs on any node.
> Even requesting multiple slots for a job affects the number of jobs
> running on the cluster but not on any specific node.
> Am I wrong here? If I want, say, only three of my jobs running on one node,
> does asking for enough slots to guarantee the total jobs is no more than 3
> times the number of nodes guarantee this?
> My read is that the total running jobs might be throttled but not the
> number per node.
> Perhaps a clever use of queues might help but I am not quite sure about
> the details
> On Thu, May 23, 2013 at 4:37 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Your problem seems to surround available memory and over-subscription. If
>> you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to
>> use the CapacityScheduler to address this for you.
>> I once detailed how-to, on a similar question here:
>> On Wed, May 22, 2013 at 2:55 PM, Steve Lewis <[EMAIL PROTECTED]>
>> > I have a series of Hadoop jobs to run - one of my jobs requires larger than
>> > standard memory.
>> > I allow the task to use 2GB of memory. When I run some of these jobs the
>> > slave nodes are crashing because they run out of swap space. It is not
>> > that a slave could not run one, or even 4, of these jobs, but 8 stresses
>> > the limits.
>> > I could cut the mapred.tasktracker.reduce.tasks.maximum for the entire
>> > cluster but this cripples the whole cluster for one of many jobs.
>> > It seems to be a very bad design
>> > a) to allow the job tracker to keep assigning tasks to a slave that is
>> > already getting low on memory
>> > b) to allow the user to run jobs capable of crashing nodes on the cluster
>> > c) not to allow the user to specify that some jobs need to be limited to
>> > a lower value without requiring this limit for every job.
>> > Are there plans to fix this??
>> > --
>> Harsh J
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
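
For what it's worth, here is a rough sketch of the arithmetic behind the CapacityScheduler suggestion in this thread. It assumes the Hadoop 1.x memory-based scheduling behaviour as I understand it: a reduce task that requests more memory (mapred.job.reduce.memory.mb) than one slot provides (mapred.cluster.reduce.memory.mb) occupies several slots, which indirectly caps how many such tasks fit on a node. The class name SlotMath, the 8 reduce slots per node and the 1024 MB slot size are hypothetical values chosen for illustration, not settings taken from the thread.

```java
public class SlotMath {
    // Slots one task occupies: its memory request rounded up to whole slots.
    // Assumption: mirrors how a high-memory task is charged multiple slots.
    static int slotsPerTask(int jobTaskMemoryMb, int clusterSlotMemoryMb) {
        return (jobTaskMemoryMb + clusterSlotMemoryMb - 1) / clusterSlotMemoryMb;
    }

    // Largest number of such tasks that can run at once on a single node.
    static int maxTasksPerNode(int slotsPerNode, int jobTaskMemoryMb, int clusterSlotMemoryMb) {
        return slotsPerNode / slotsPerTask(jobTaskMemoryMb, clusterSlotMemoryMb);
    }

    public static void main(String[] args) {
        int reduceSlotsPerNode = 8; // hypothetical mapred.tasktracker.reduce.tasks.maximum
        int slotMb = 1024;          // hypothetical mapred.cluster.reduce.memory.mb

        // Small job: one slot per task, so all 8 slots can be filled.
        System.out.println(maxTasksPerNode(reduceSlotsPerNode, 1024, slotMb)); // 8
        // 2 GB reducers: two slots each, so at most 4 per node.
        System.out.println(maxTasksPerNode(reduceSlotsPerNode, 2048, slotMb)); // 4
        // 3 GB reducers: three slots each, so at most 2 per node.
        System.out.println(maxTasksPerNode(reduceSlotsPerNode, 3072, slotMb)); // 2
    }
}
```

Under these assumptions the per-node cap is a side effect of the per-job memory request, so only the heavy job is throttled while ordinary jobs can still use every slot. Note that the cap moves in steps: with 8 slots there is no request size that yields exactly 3 tasks per node.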