Technical question on Capacity Scheduler.
Jagmohan Chauhan 2013-03-03, 08:11
Hi,

I am going through the Capacity Scheduler implementation, and there is one thing I did not understand clearly.

1. Does an off-switch task refer to a task whose data has to be fetched over the network, meaning it is not node-local?

2. Does off-switch include only tasks whose map input has to be fetched from a node on a different rack, across the switch, or does it also include tasks whose data has to be fetched from another node on the same rack, on the same switch?

--
Thanks and Regards
Jagmohan Chauhan
MSc student, CS
Univ. of Saskatchewan
IEEE Graduate Student Member
http://homepage.usask.ca/~jac735/
Re: Technical question on Capacity Scheduler.
Harsh J 2013-03-03, 10:41
On Sun, Mar 3, 2013 at 1:41 PM, Jagmohan Chauhan <[EMAIL PROTECTED]> wrote:

> I am going through the Capacity Scheduler implementation. There is one
> thing I did not understand clearly.

Are you reading the YARN CapacityScheduler or the older MRv1 one? I'd suggest reading the newer one for any implementation or research goals, since it is more current and future-applicable.

> 1. Does the off-switch task refer to a task in which data has to be
> fetched over the network? It means it's not node-local?

Off-switch would imply off-rack, i.e. neither node-local nor rack-local.

> 2. Does off-switch task include only the tasks for which map input has to
> be fetched from a node on a different rack across the switch, or does it
> also include tasks where data has to be fetched from another node on the
> same rack on the same switch?

A task's input split is generally supposed to define all locations of available inputs. If the CS is unable to schedule to any of those locations, nor to their racks, then it schedules an off-rack (see above) task, which has to pull the input from a different rack.

Feel free to post any further impl. related questions! :)

--
Harsh J
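[Editor's illustration] The three locality levels described in this reply can be sketched as below. This is a minimal sketch, not code from the CapacityScheduler: the function name classify_locality, the rack_of mapping, and the host and rack names are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Locality(Enum):
    NODE_LOCAL = "node-local"
    RACK_LOCAL = "rack-local"
    OFF_SWITCH = "off-switch"


def classify_locality(task_node, split_hosts, rack_of):
    """Classify where a task runs relative to its input split.

    task_node   -- host the task is scheduled on (hypothetical name)
    split_hosts -- hosts that hold a replica of the task's input split
    rack_of     -- hypothetical mapping of host -> rack path
    """
    # Node-local: the task runs on a host that stores the input.
    if task_node in split_hosts:
        return Locality.NODE_LOCAL
    # Rack-local: a different host, but on the same rack as some replica.
    split_racks = {rack_of[host] for host in split_hosts}
    if rack_of[task_node] in split_racks:
        return Locality.RACK_LOCAL
    # Off-switch: neither node-local nor rack-local, so the input
    # has to be pulled from a different rack.
    return Locality.OFF_SWITCH


# Assumed toy topology: n1 and n2 share a rack, n3 sits on another one.
rack_of = {"n1": "/rack1", "n2": "/rack1", "n3": "/rack2"}

for node in ("n1", "n2", "n3"):
    print(node, classify_locality(node, ["n1"], rack_of).value)
```

Under this model, a task that reads from another node on the same rack counts as rack-local, not off-switch, which matches the answer to question 2.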
Re: Technical question on Capacity Scheduler.
Jagmohan Chauhan 2013-03-04, 01:47
Thanks Harsh. I have a few more questions.

Q1: In my experiments using the CS, I found that for any user, the next job does not start until the current one has finished. Is this true, are there any exceptions, and if it is true, why is it so? I did not find any such condition in the implementation of the CS.

Q2: The concept of reserved slots applies only if speculative execution is on. Am I correct? If yes, then the code dealing with reserved slots won't be executed if speculative execution is off?

PS: I am working on MRv1.
Re: Technical question on Capacity Scheduler.
Jagmohan Chauhan 2013-03-06, 03:33
Hi All,

Can someone please reply to my queries?
Re: Technical question on Capacity Scheduler.
Harsh J 2013-03-06, 04:55
The CS does support running jobs in parallel. Are you observing just the UI, or are you also noticing FIFO behavior in the logs, where assignments can be seen with timestamps?

--
Harsh J
Re: Technical question on Capacity Scheduler.
Jagmohan Chauhan 2013-03-06, 05:02
I think I need to reframe the question a bit. Yes, the CS supports running multiple jobs in parallel, but for different users. If I submit 3 different jobs as the same user at the same time, they are always executed in a FIFO manner. Am I correct?
Re: Technical question on Capacity Scheduler.
Jagmohan Chauhan 2013-03-06, 05:02
I am observing the logs.
Re: Technical question on Capacity Scheduler.
Arun C Murthy 2013-03-06, 22:17
That is true *only* if you do not have enough slots in the entire cluster to run tasks for other jobs of the same user.

On Mar 5, 2013, at 9:02 PM, Jagmohan Chauhan wrote:

> I think I need to reframe the question a bit. Yes, the CS supports multiple
> jobs in parallel, but for different users. If I am submitting 3 different
> jobs for the same user at the same time, they are always executed in a FIFO
> manner.
> Am I correct?

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
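[Editor's illustration] One way to picture the point about slots is a toy model in which free slots are handed out to a user's jobs in submission order. This is a simplified sketch under stated assumptions, not the CapacityScheduler's actual logic: it ignores queue capacities, user limits, priorities, and locality, and the function name assign_slots and the job sizes are hypothetical.

```python
def assign_slots(free_slots, pending_by_job):
    """Hand out free slots to jobs in submission (FIFO) order.

    pending_by_job -- list of (job_id, pending_task_count) tuples,
                      oldest job first (hypothetical input format)
    Returns a dict of job_id -> number of slots granted.
    """
    granted = {}
    for job_id, pending in pending_by_job:
        # Each job takes as many slots as it can use; later jobs
        # only see whatever is left over.
        share = min(free_slots, pending)
        granted[job_id] = share
        free_slots -= share
    return granted


# Three jobs from the same user, submitted in this order.
jobs = [("job1", 10), ("job2", 4), ("job3", 4)]

# Too few slots: the first job absorbs all of them, so the run looks
# strictly sequential even though nothing forbids parallel jobs.
print(assign_slots(8, jobs))

# Enough slots: the later jobs run alongside the first one.
print(assign_slots(16, jobs))
```

In this model the apparent one-job-at-a-time behavior comes from slot scarcity alone, not from any per-user rule.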