-
Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Pierre ANCELOT 2010-06-30, 09:35
Hi everyone :)

There's something I'm probably doing wrong, but I can't figure out what. I have two Hadoop programs running one after the other. I split them because they don't have the same processor and memory needs, so separating them lets me tune each one better.

For the first job, I need mapred.tasktracker.map.tasks.maximum set to 12 on every node. For the second job, I need it set to 20. So I set it to 12 by default, and in the second job's code I set this:

    Configuration hadoopConfiguration = new Configuration();
    hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);

But when the job runs, I get 12 tasks on each node instead of the expected 20. Any ideas?

Thank you.
Pierre.

--
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Amareshwari Sri Ramadasu 2010-06-30, 10:08
Hi Pierre,

"mapred.tasktracker.map.tasks.maximum" is a cluster-level configuration and cannot be set per job. It is loaded only when the TaskTracker is brought up.

Thanks
Amareshwari
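To illustrate the point above, here is a toy model in plain Java. It is not Hadoop code: `ToyTaskTracker` and `SlotConfigDemo` are hypothetical names, `java.util.Properties` stands in for the real configuration objects, and the fallback value of 2 is just a default chosen for this sketch. It only shows why a value set in a job-side configuration never reaches a daemon that read its own configuration once at startup.

```java
import java.util.Properties;

// Toy model, NOT Hadoop code: a daemon that reads its slot count once at startup.
class ToyTaskTracker {
    private final int mapSlots;

    ToyTaskTracker(Properties daemonConf) {
        // Read once here; no Properties object is consulted again afterwards.
        this.mapSlots = Integer.parseInt(
            daemonConf.getProperty("mapred.tasktracker.map.tasks.maximum", "2"));
    }

    int getMapSlots() {
        return mapSlots;
    }
}

class SlotConfigDemo {
    public static void main(String[] args) {
        // Configuration the daemon is started with.
        Properties daemonConf = new Properties();
        daemonConf.setProperty("mapred.tasktracker.map.tasks.maximum", "12");
        ToyTaskTracker tracker = new ToyTaskTracker(daemonConf);

        // A job submitted later carries its own, separate configuration object.
        Properties jobConf = new Properties();
        jobConf.setProperty("mapred.tasktracker.map.tasks.maximum", "20");

        // The daemon still reports the value it was started with.
        System.out.println("slots on node: " + tracker.getMapSlots()); // prints "slots on node: 12"
    }
}
```

In this model, the only way to get 20 slots is to construct a new `ToyTaskTracker` with a different configuration, which mirrors restarting the daemon.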
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Pierre ANCELOT 2010-06-30, 10:28
Hi,

Okay, so if I set it to 20 by default, could I maybe limit the number of concurrent maps per node instead? job.setNumReduceTasks exists, but I see no equivalent for maps, though I think there used to be a setNumMapTasks... Was it removed? Why? Any idea how to achieve this?

Thank you.
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Ted Yu 2010-06-30, 11:57
The number of map tasks is determined by InputSplit.
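As a rough illustration of that statement, the sketch below counts map tasks under a simplifying assumption: one file is cut into fixed-size splits, with one map task per split. Real input formats may apply other rules (minimum split sizes, handling of a small trailing chunk, unsplittable files), so treat `numSplits` and the sizes used here as hypothetical.

```java
// Simplified model: one map task per fixed-size split of a single input file.
class SplitCountSketch {
    // Ceiling division: number of splitSizeBytes-sized chunks needed to cover the file.
    static long numSplits(long fileSizeBytes, long splitSizeBytes) {
        if (splitSizeBytes <= 0) {
            throw new IllegalArgumentException("split size must be positive");
        }
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Hypothetical input: a 10 GiB file with 64 MiB splits.
        System.out.println(numSplits(10 * 1024 * mib, 64 * mib)); // prints 160
    }
}
```

Under this model the job's code never picks the map count directly; it follows from the input size and the split size.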
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Pierre ANCELOT 2010-06-30, 12:09
Sure, but that doesn't control the number of tasks running concurrently on a node.
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Yu Li 2010-06-30, 13:56
Hi Pierre,

The "setNumReduceTasks" method sets the number of reduce tasks to launch; it is equivalent to setting the "mapred.reduce.tasks" parameter. The "mapred.tasktracker.reduce.tasks.maximum" parameter, by contrast, decides the number of reduce tasks running *concurrently* on one node.

As Amareshwari mentioned, "mapred.tasktracker.map.tasks.maximum" and "mapred.tasktracker.reduce.tasks.maximum" are cluster configuration and cannot be set per job. If you set mapred.tasktracker.map.tasks.maximum to 20 and the overall number of map tasks is larger than 20 * <number of nodes>, there will be 20 map tasks running concurrently on each node. As far as I know, you probably need to restart the TaskTracker if you truly need to change the configuration.

Best Regards,
Carp
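The arithmetic in this reply can be sketched as follows. This is only a back-of-the-envelope model that assumes every node has the same number of map slots and that the scheduler keeps all slots busy; the node count and task counts are made-up numbers, and the class and method names are hypothetical.

```java
// Back-of-the-envelope model of map slot usage on a uniform cluster.
class MapConcurrencySketch {
    // Total map slots across the cluster.
    static long clusterCapacity(int nodes, int slotsPerNode) {
        return (long) nodes * slotsPerNode;
    }

    // Map tasks that can run at the same time: limited by tasks or by slots.
    static long concurrentMaps(long totalMaps, int nodes, int slotsPerNode) {
        return Math.min(totalMaps, clusterCapacity(nodes, slotsPerNode));
    }

    // Number of "waves" needed if every slot stays busy (ceiling division).
    static long waves(long totalMaps, int nodes, int slotsPerNode) {
        long capacity = clusterCapacity(nodes, slotsPerNode);
        if (capacity <= 0) {
            throw new IllegalArgumentException("cluster has no map slots");
        }
        return (totalMaps + capacity - 1) / capacity;
    }

    public static void main(String[] args) {
        int nodes = 10;         // hypothetical cluster size
        int slotsPerNode = 20;  // mapred.tasktracker.map.tasks.maximum on every node

        // 500 map tasks exceed 20 * 10 = 200 slots, so every node runs 20 at once.
        System.out.println(concurrentMaps(500, nodes, slotsPerNode)); // prints 200
        System.out.println(waves(500, nodes, slotsPerNode));          // prints 3

        // 150 map tasks fit in one wave, so some slots stay idle.
        System.out.println(concurrentMaps(150, nodes, slotsPerNode)); // prints 150
    }
}
```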
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Pierre ANCELOT 2010-06-30, 14:06
OK, well, thanks... I truly hoped a solution would exist for this.

Thanks.
Pierre.
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Ken Goodhope 2010-06-30, 15:00
What you want to do can be accomplished in the scheduler. Take a look at the fair scheduler, specifically the user extensible options. There you will find the ability to add some extra logic for deciding if a task can be launched on a per job basis. Could be as simple as deciding a particular job can't launch more than 12 tasks at a time. Capacity scheduler might be able to do this too, but I'm not sure.
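The kind of per-job check described here might look like the sketch below. This is not the fair scheduler's extension API: `PerJobTaskCap`, `tryLaunch`, and `taskFinished` are hypothetical names, and the class only demonstrates the bookkeeping for "this job may not have more than N tasks running at a time".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-job gate; NOT a real scheduler plug-in interface.
class PerJobTaskCap {
    private final Map<String, Integer> caps = new HashMap<>();
    private final Map<String, Integer> running = new HashMap<>();

    // Declare the maximum number of concurrently running tasks for a job.
    void setCap(String jobId, int maxConcurrentTasks) {
        caps.put(jobId, maxConcurrentTasks);
    }

    // Returns true and records the launch if the job is below its cap.
    boolean tryLaunch(String jobId) {
        int cap = caps.getOrDefault(jobId, Integer.MAX_VALUE); // no cap declared: unlimited
        int current = running.getOrDefault(jobId, 0);
        if (current >= cap) {
            return false;
        }
        running.put(jobId, current + 1);
        return true;
    }

    // Frees one running-task entry for the job, if it has any.
    void taskFinished(String jobId) {
        running.computeIfPresent(jobId, (id, count) -> count > 1 ? count - 1 : null);
    }

    public static void main(String[] args) {
        PerJobTaskCap gate = new PerJobTaskCap();
        gate.setCap("first-job", 12); // hypothetical job id

        int launched = 0;
        for (int i = 0; i < 20; i++) {
            if (gate.tryLaunch("first-job")) {
                launched++;
            }
        }
        System.out.println(launched); // prints 12
    }
}
```

Note that a cap like this limits a job's concurrent tasks overall; limiting them per node would need the same bookkeeping keyed by node as well.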
-
Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.
Arun C Murthy 2010-06-30, 17:02
CapacityScheduler has a feature called 'High RAM Jobs' wherein you can specify, for a given job, that a single map/reduce task needs more than one slot. Thus you could consume all the map/reduce slots on a given TaskTracker (TT) for a single task of your job. This should suffice.

Arun
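The slot arithmetic behind this suggestion can be sketched as below. The memory figures are made up, the names are hypothetical, and rounding a task's memory request up to a whole number of slots is an assumption of this sketch rather than something stated in the thread.

```java
// Sketch of "one task occupies several slots" arithmetic; all numbers are illustrative.
class HighRamSlotsSketch {
    // Slots one task occupies, assuming its memory request is rounded up to whole slots.
    static int slotsPerTask(int taskMemoryMb, int slotMemoryMb) {
        if (taskMemoryMb <= 0 || slotMemoryMb <= 0) {
            throw new IllegalArgumentException("memory sizes must be positive");
        }
        return (taskMemoryMb + slotMemoryMb - 1) / slotMemoryMb;
    }

    // Tasks of that job that fit on one node at the same time.
    static int concurrentTasksPerNode(int slotsOnNode, int slotsPerTask) {
        return slotsOnNode / slotsPerTask;
    }

    public static void main(String[] args) {
        int slotsOnNode = 20;    // mapred.tasktracker.map.tasks.maximum on the node
        int slotMemoryMb = 1024; // hypothetical memory size of one slot

        // A job whose tasks fit in one slot can use all 20 slots.
        int light = slotsPerTask(1024, slotMemoryMb);
        System.out.println(concurrentTasksPerNode(slotsOnNode, light)); // prints 20

        // A job whose tasks ask for twice the slot memory runs 10 per node.
        int heavy = slotsPerTask(2048, slotMemoryMb);
        System.out.println(concurrentTasksPerNode(slotsOnNode, heavy)); // prints 10
    }
}
```

With whole-slot accounting the reachable per-node concurrency is 20, 10, 6, 5, and so on, so an exact figure such as 12 out of 20 is not reachable in this model.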