-
Question about Hadoop Default FCFS Job Scheduler
He Chen 2011-01-15, 05:45
Hey all
Why does the FCFS scheduler only let a node choose one task at a time from a job? To improve data locality, it would be reasonable to let a node choose all of its local tasks (if it can) from a job at once.
Any reply will be appreciated.
Thanks
Chen
-
Re: Question about Hadoop Default FCFS Job Scheduler
Nan Zhu 2011-01-17, 14:28
Hi, Chen
How have things been going recently?
Actually, I think you misunderstand the code in assignTasks() in JobQueueTaskScheduler.java. Here is the structure of the relevant code:
// Sorry, I have modified this code heavily, so some variable names may differ from the original version
for (i = 0; i < MapperCapacity; ++i){
  ...
  for (JobInProgress job:jobQueue){
    // try to schedule a node-local or rack-local map task
    // here is the interesting part
    t = job.obtainNewLocalMapTask(...);
    if (t != null){
      ...
      break; // the break exits "for (job:jobQueue)"; the next iteration of the outer loop
             // restarts map task selection from the first job, so in effect all of the first
             // job's local map tasks are scheduled first, until the map slots are full
    }
  }
}
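To make the control flow concrete, here is a small runnable Java simulation of the slot loop above. It is a sketch under simplifying assumptions, not Hadoop code: `Job` is a hypothetical stand-in for `JobInProgress`, each job simply holds a queue of map tasks assumed to be local to the requesting node, and everything except the local-map path is left out. Because the `break` restarts the job scan for every slot, the first job's local tasks fill the slots before any later job is considered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FcfsSlotLoop {

    /** Hypothetical stand-in for JobInProgress: only tracks node-local map tasks. */
    static final class Job {
        private final Deque<String> localMaps = new ArrayDeque<>();

        Job(String name, int localMapCount) {
            for (int k = 0; k < localMapCount; k++) {
                localMaps.add(name + "_m" + k);
            }
        }

        /** Hands out at most one local map task per call, or null if none are left. */
        String obtainNewLocalMapTask() {
            return localMaps.poll();
        }
    }

    /** Mirrors the loop structure above: at most one assigned map task per free slot. */
    static List<String> assignTasks(List<Job> jobQueue, int mapCapacity) {
        List<String> assignedTasks = new ArrayList<>();
        for (int i = 0; i < mapCapacity; ++i) {
            for (Job job : jobQueue) {
                String t = job.obtainNewLocalMapTask();
                if (t != null) {
                    assignedTasks.add(t);
                    // Exit the job loop; the next slot rescans the queue from the first job.
                    break;
                }
            }
            // If no job had a local task, this slot simply stays empty in the sketch.
        }
        return assignedTasks;
    }

    public static void main(String[] args) {
        List<Job> jobQueue = List.of(new Job("job1", 3), new Job("job2", 5));
        // prints [job1_m0, job1_m1, job1_m2, job2_m0]
        System.out.println(assignTasks(jobQueue, 4));
    }
}
```

With three local tasks in job1, five in job2 and four free slots, job1 is drained first and only the last slot goes to job2.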
BTW, only one reduce task can be scheduled per heartbeat.
Best,
Nan
-
Re: Question about Hadoop Default FCFS Job Scheduler
He Chen 2011-01-17, 16:24
Hi Nan,
Thank you for the reply. I understand what you mean. My concern is that the "obtainNewLocalMapTask(...)" method assigns only one task per call.
Now I understand why it only assigns one task at a time: it is because of the outer loop:
for (i = 0; i < MapperCapacity; ++i){
(......)
}
What I am asking is why this loop exists here. Why does the scheduler use this kind of loop? Assigning only one task per call adds overhead to the task assignment process. Clearly, a node could be assigned all of its available local tasks in a single "obtainNewLocalMapTask(......)" call.
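The alternative described here could look roughly like the following sketch, using a simplified model in which each job holds a queue of map tasks assumed to be local to the requesting node. The batch method `obtainNewLocalMapTasks(int max)` and the `Job` class are hypothetical, invented for illustration; they are not part of Hadoop's API. Each job hands over as many local tasks as the remaining free slots allow in one call, and the scheduler moves on to the next job only when the current one runs out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BatchLocalAssign {

    /** Hypothetical job model: only tracks map tasks local to the requesting node. */
    static final class Job {
        private final Deque<String> localMaps = new ArrayDeque<>();

        Job(String name, int localMapCount) {
            for (int k = 0; k < localMapCount; k++) {
                localMaps.add(name + "_m" + k);
            }
        }

        /** Hypothetical batch call: returns up to max local map tasks at once. */
        List<String> obtainNewLocalMapTasks(int max) {
            List<String> tasks = new ArrayList<>();
            while (tasks.size() < max && !localMaps.isEmpty()) {
                tasks.add(localMaps.poll());
            }
            return tasks;
        }
    }

    /** One call per job instead of one call per slot. */
    static List<String> assignTasks(List<Job> jobQueue, int mapCapacity) {
        List<String> assignedTasks = new ArrayList<>();
        for (Job job : jobQueue) {
            int free = mapCapacity - assignedTasks.size();
            if (free == 0) {
                break; // all map slots are full
            }
            assignedTasks.addAll(job.obtainNewLocalMapTasks(free));
        }
        return assignedTasks;
    }

    public static void main(String[] args) {
        List<Job> jobQueue = List.of(new Job("job1", 3), new Job("job2", 5));
        // prints [job1_m0, job1_m1, job1_m2, job2_m0]
        System.out.println(assignTasks(jobQueue, 4));
    }
}
```

The number of scheduler calls drops from one per slot to at most one per job, but the set and order of assigned tasks are unchanged in this model.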
Bests
Chen
-
Re: Question about Hadoop Default FCFS Job Scheduler
Nan Zhu 2011-01-17, 16:37
Hi, Chen
Actually, it is not one task each time.
See this statement:
assignedTasks.add(t);
assignedTasks is the return value of this method. It is a collection of the selected tasks, so it will contain multiple tasks if there are enough candidates.
Best,
Nan
-
Re: Question about Hadoop Default FCFS Job Scheduler
Nan Zhu 2011-01-17, 16:46
OK, I get your point:
you are asking why we don't move the for loop into obtainNewLocalMapTask().
Yes, I think we could do that, but the result would be the same as with the current code, and I don't think it would bring much performance benefit. Personally, I like the current style. :-)
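The claim that both structures give the same result can be checked in a simplified model where each job is reduced to the number of local map tasks it has for the requesting node. The sketch below is hypothetical (it models neither rack locality nor the rest of the scheduler) and compares, over random inputs, the per-slot loop with a batch version that lets each job fill as many remaining slots as it can. Both reduce to filling the free slots greedily in queue order, which is why they agree.

```java
import java.util.Arrays;
import java.util.Random;

public class SlotLoopVsBatch {

    /** Per-slot structure: one task per outer iteration, queue rescanned from the first job. */
    static int[] perSlot(int[] localCounts, int capacity) {
        int[] remaining = localCounts.clone();
        int[] assigned = new int[remaining.length];
        for (int i = 0; i < capacity; ++i) {
            for (int j = 0; j < remaining.length; j++) {
                if (remaining[j] > 0) {
                    remaining[j]--;
                    assigned[j]++;
                    break; // next slot starts again from job 0
                }
            }
        }
        return assigned;
    }

    /** Hypothetical batch structure: each job takes as many free slots as it can in one step. */
    static int[] batch(int[] localCounts, int capacity) {
        int[] assigned = new int[localCounts.length];
        int free = capacity;
        for (int j = 0; j < localCounts.length && free > 0; j++) {
            int take = Math.min(free, localCounts[j]);
            assigned[j] = take;
            free -= take;
        }
        return assigned;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int trial = 0; trial < 1000; trial++) {
            int[] counts = new int[1 + rnd.nextInt(5)];
            for (int j = 0; j < counts.length; j++) {
                counts[j] = rnd.nextInt(4);
            }
            int capacity = rnd.nextInt(8);
            if (!Arrays.equals(perSlot(counts, capacity), batch(counts, capacity))) {
                throw new AssertionError("mismatch for " + Arrays.toString(counts)
                        + ", capacity " + capacity);
            }
        }
        System.out.println("per-slot loop and batch call agree on all trials");
    }
}
```

Whether the batch form saves measurable time would depend on the real cost of each obtainNewLocalMapTask(...) call, which this model does not capture.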
Best,
Nan