Re: Selecting a task for the tasktracker
Yaron Gonen 2012-12-27, 21:18
Thanks a lot!
On Thu, Dec 27, 2012 at 8:11 PM, Vinod Kumar Vavilapalli <
[EMAIL PROTECTED]> wrote:
> On top of that, the message indicates that you need to have your scheduler
> class in the mapred package.
> Vinod Kumar Vavilapalli
> Hortonworks Inc.
> On Dec 27, 2012, at 7:38 AM, Hemanth Yamijala wrote:
> Firstly, I am talking about Hadoop 1.0. Please note that in Hadoop 2.x and
> trunk, the MapReduce framework has been completely revamped into YARN, and
> you may need to look at different interfaces for building your own
> scheduler.
> In 1.0, the primary function of the TaskScheduler is the assignTasks
> method. Given a TaskTracker object as input, this method figures out how
> many free map and reduce slots exist in that particular tasktracker and
> selects one or more tasks that can be scheduled on it. Since task selection
> is the primary responsibility and the granularity is at a task level, the
> class is called TaskScheduler.
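To make the slot-filling idea above concrete, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: TrackerStatus, Job, and this assignTasks signature are hypothetical stand-ins invented for illustration, and the simple "fill every free slot in job order" policy is an assumption, not what any shipped scheduler necessarily does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlotFillingSketch {
    /** Hypothetical stand-in for the slot counts a tasktracker reports. */
    public static final class TrackerStatus {
        final int maxMapSlots, runningMaps, maxReduceSlots, runningReduces;

        public TrackerStatus(int maxMapSlots, int runningMaps,
                             int maxReduceSlots, int runningReduces) {
            this.maxMapSlots = maxMapSlots;
            this.runningMaps = runningMaps;
            this.maxReduceSlots = maxReduceSlots;
            this.runningReduces = runningReduces;
        }

        int freeMapSlots() { return Math.max(0, maxMapSlots - runningMaps); }
        int freeReduceSlots() { return Math.max(0, maxReduceSlots - runningReduces); }
    }

    /** Hypothetical stand-in for a job with counters of tasks still to run. */
    public static final class Job {
        final String name;
        int pendingMaps;
        int pendingReduces;

        public Job(String name, int pendingMaps, int pendingReduces) {
            this.name = name;
            this.pendingMaps = pendingMaps;
            this.pendingReduces = pendingReduces;
        }
    }

    /**
     * Assumed policy: walk the jobs in the given order and hand out tasks
     * until the tracker's free map slots, then free reduce slots, are used up.
     */
    public static List<String> assignTasks(TrackerStatus tracker, List<Job> jobs) {
        List<String> assigned = new ArrayList<>();
        int freeMaps = tracker.freeMapSlots();
        for (Job job : jobs) {
            while (freeMaps > 0 && job.pendingMaps > 0) {
                job.pendingMaps--;
                freeMaps--;
                assigned.add(job.name + ":map");
            }
        }
        int freeReduces = tracker.freeReduceSlots();
        for (Job job : jobs) {
            while (freeReduces > 0 && job.pendingReduces > 0) {
                job.pendingReduces--;
                freeReduces--;
                assigned.add(job.name + ":reduce");
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Tracker with one free map slot and one free reduce slot.
        TrackerStatus tracker = new TrackerStatus(2, 1, 1, 0);
        List<Job> jobs = Arrays.asList(new Job("jobA", 3, 1), new Job("jobB", 2, 2));
        System.out.println(assignTasks(tracker, jobs));
    }
}
```

The order of the job list is where a custom policy would plug in: a FIFO, capacity-based, or fair-share scheduler differs mainly in how it orders or filters the jobs before tasks are handed out.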
> The method of choosing a job and then a task within the job is customised
> by the different schedulers already present in Hadoop. Also, the core logic
> of selecting a map task with data locality optimizations is not implemented
> in the schedulers per se; instead, they rely on the JobInProgress object in
> the MapReduce framework to do this.
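As a rough illustration of what a data-locality preference means when picking a map task within a job, the sketch below tries a node-local task first, then a rack-local one, then any pending task. It is not the JobInProgress code: PendingMap, pickMap, and the host-to-rack map are hypothetical names introduced here, and the three-level preference order is a simplifying assumption.

```java
import java.util.*;

public class LocalitySketch {
    /** Hypothetical pending map task and the hosts holding its split replicas. */
    public static final class PendingMap {
        public final String id;
        public final Set<String> replicaHosts;

        public PendingMap(String id, String... hosts) {
            this.id = id;
            this.replicaHosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    /**
     * Pick a map task for the tracker on trackerHost, preferring
     * node-local, then rack-local, then any remaining pending task.
     */
    public static Optional<PendingMap> pickMap(String trackerHost,
                                               Map<String, String> rackOf,
                                               List<PendingMap> pending) {
        // 1. Node-local: a replica of the split lives on the tracker's host.
        for (PendingMap m : pending) {
            if (m.replicaHosts.contains(trackerHost)) {
                return Optional.of(m);
            }
        }
        // 2. Rack-local: a replica lives on some host in the tracker's rack.
        String rack = rackOf.get(trackerHost);
        if (rack != null) {
            for (PendingMap m : pending) {
                for (String host : m.replicaHosts) {
                    if (rack.equals(rackOf.get(host))) {
                        return Optional.of(m);
                    }
                }
            }
        }
        // 3. No locality available: fall back to the first pending task, if any.
        return pending.isEmpty() ? Optional.empty() : Optional.of(pending.get(0));
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("h1", "r1");
        rackOf.put("h2", "r1");
        rackOf.put("h3", "r2");
        List<PendingMap> pending = Arrays.asList(
                new PendingMap("m0", "h3"),
                new PendingMap("m1", "h2"),
                new PendingMap("m2", "h1"));
        pickMap("h1", rackOf, pending).ifPresent(m -> System.out.println(m.id));
    }
}
```

A custom scheduler that wants to change the "closest task" behaviour would need to replace or bypass this kind of selection rather than only reorder jobs.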
> To implement your own Scheduler, it may be best to look at the sources of
> existing schedulers: JobQueueTaskScheduler, CapacityTaskScheduler or
> FairScheduler. In particular, the last two are in the contrib modules of
> mapreduce, and hence should be fairly self-contained to follow. Their build
> files will also tell you how to resolve any compile problems like the one
> you are facing.
> On Thu, Dec 27, 2012 at 4:10 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>> If I understand correctly, the job scheduler (why is the class called
>> TaskScheduler?) is responsible for assigning the task whose split is as
>> close as possible to the tasktracker.
>> Meaning that the job scheduler is responsible for two things:
>> 1. Selecting a job.
>> 2. Once a job is selected, assigning the closest task to the tasktracker
>> that sent the heartbeat.
>> Is this correct?
>> I want to write my own job scheduler to change the logic above, but it
>> says "The type TaskScheduler is not visible".
>> How can I write my own scheduler?