Hadoop user mailing list


Mahesh Khandewal 2014-04-15, 16:54
Fwd: Hi, I have a patch file of my resource-aware scheduler. How do I apply this scheduler in place of the default scheduler?
From: Mahesh Khandewal <[EMAIL PROTECTED]>
Date: Tue, Apr 15, 2014 at 10:29 PM
Subject: Fwd: Hi, I have a patch file of my resource-aware scheduler. How do I
apply this scheduler in place of the default scheduler?
To: [EMAIL PROTECTED]
Hi Harsha, could you please help?
From: Mahesh Khandewal <[EMAIL PROTECTED]>
Date: Tue, 15 Apr 2014 22:21:26 +0530
Subject: Hi, I have a patch file of my resource-aware scheduler. How do I
apply this scheduler in place of the default scheduler?
To: [EMAIL PROTECTED]

Can anyone help, please?
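The README in the patch below abbreviates the per-job profiling settings as `mapred.job.profile.{map,reduce}.{cpu,io}`. If the braces are read as shell-style alternatives, that stands for four properties (map/reduce times cpu/io). The README does not spell this out, so the reading is an assumption. Below is a minimal Python sketch with a hypothetical helper, `adaptive_job_properties` (not part of the patch). It expands those names next to `mapred.job.deadline` and checks the documented 0-100 range.

```python
from itertools import product
from typing import Dict, Mapping, Optional, Tuple

# Assumed expansion of the README's {map,reduce}.{cpu,io} pattern.
PHASES = ("map", "reduce")
RESOURCES = ("cpu", "io")


def adaptive_job_properties(
    deadline_s: int = 0,
    profile: Optional[Mapping[Tuple[str, str], int]] = None,
    default: int = 100,
) -> Dict[str, str]:
    """Hypothetical helper: build the per-job properties the README lists.

    `profile` maps (phase, resource) pairs such as ("map", "cpu") to a value
    in 0..100; pairs not given fall back to `default` (100, the value shown
    in the README example). `deadline_s` is the desired deadline in seconds.
    """
    remaining = dict(profile or {})
    props = {"mapred.job.deadline": str(deadline_s)}
    for phase, resource in product(PHASES, RESOURCES):
        value = remaining.pop((phase, resource), default)
        if not 0 <= value <= 100:
            raise ValueError(f"{phase}.{resource} must be in 0..100, got {value}")
        props[f"mapred.job.profile.{phase}.{resource}"] = str(value)
    if remaining:
        # Any pair left over is not one of the four expected combinations.
        raise ValueError(f"unknown phase/resource pairs: {sorted(remaining)}")
    return props


if __name__ == "__main__":
    # Print the settings as generic-option pairs for a job submission.
    for key, value in adaptive_job_properties(3600, {("map", "cpu"): 80}).items():
        print(f"-D {key}={value}")
```

Jobs launched through `ToolRunner` have their arguments parsed by Hadoop's `GenericOptionsParser`, so such `-D key=value` pairs can be passed on the `hadoop jar` command line. According to the README, the scheduler reads these per-job properties when the job is submitted.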

diff --git a/src/contrib/adaptive-scheduler/README b/src/contrib/adaptive-scheduler/README
new file mode 100644
index 0000000..562781b
--- /dev/null
+++ b/src/contrib/adaptive-scheduler/README
@@ -0,0 +1,127 @@
+Adaptive Scheduler v1.1
+=======================
+
+Introduction
+------------
+
+The Adaptive Scheduler is a pluggable Hadoop scheduler that
+automatically adjusts the number of slots depending on the performance
+of jobs and on user-defined high-level completion goals.
+
+MapReduce schedulers use the notion of slots to represent the capacity
+of a cluster. This abstraction is simple and may work for homogeneous
+workloads, but fails to capture the different resource requirements that
+jobs have in multi-user environments. The resource-aware Adaptive
+Scheduler uses job profiling information to maximize cluster
+utilization while working toward high-level completion goals.
+
+The resource-aware Adaptive Scheduler is designed to introduce a more
+fine-grained resource model, provide a more global view of the MapReduce
+cluster, and finally, support high-level completion goals.
+
+ * Resource-awareness: Proactive scheduling based on job profiling
+   information to avoid resource contention. Eliminates the fixed number
+   of slots.
+
+ * Global view of the cluster: Find the optimal mix of workloads for the
+   whole cluster instead of trying to find the best task to run on a
+   given node.
+
+ * High-level completion goals: Reactive mechanism to
+   compensate for slowdowns. Uses soft deadlines to prioritize jobs.
+
+Implementation
+--------------
+
+The functionality of the scheduler is split into two levels: job
+matching and task assignment.
+
+ * Job matching:
+   - Find a good mix of jobs for each worker node
+   - Executed periodically and upon job arrival or job completion
+
+ * Task assignment:
+   - Enforces the mix of jobs for a particular node
+   - Executed on every heartbeat
+
+Job matching is a well-known problem that has been studied in the past
+in the context of multiprocessors, and can be thought of as a
+bin-packing problem. Bin packing is NP-hard, so it is solved here
+using heuristics. In particular, the matching algorithm used by the
+resource-aware Adaptive Scheduler is based on previous work on
+application placement, adapted to MapReduce and driven by a custom
+utility function.
+
+The job matching algorithm tries to find a good mix of jobs for each
+TaskTracker. To do so, it goes through all the worker nodes, trying
+different job combinations; after a number of iterations, the best
+solution found so far is chosen. Job matching runs periodically and
+provides the plan of what will be executed in the cluster until the
+next matching occurs.
+
+The utility function is used to measure the quality of a job matching
+solution. It assigns each job in a particular solution a value
+between -1 and 1, and the global utility is simply the sum of these
+values.
+
+ | for machine in machines:
+ |    best_assignment = machine.assignment
+ |    for n in range(0, machine.running_tasks + 1):
+ |       new_assignment = machine.assignment.remove_tasks(n)
+ |       for job in jobqueue:
+ |          if not machine.fit(job):
+ |             continue
+ |          new_assignment.add(job)
+ |       if new_assignment.utility > best_assignment.utility:
+ |          best_assignment = new_assignment
+ |    machine.assignment = best_assignment
+ |    matching.add(machine)
+ | return matching
+
+Configuring the Scheduler
+-------------------------
+
+To enable the scheduler, add its location to the JobTracker classpath
+(e.g. via HADOOP_CLASSPATH in conf/hadoop-env.sh) and set the
+following properties in the configuration file
+(conf/mapred-site.xml):
+
+ <property>
+   <name>mapred.jobtracker.taskScheduler</name>
+   <value>org.apache.hadoop.mapred.AdaptiveScheduler</value>
+   <description>The class responsible for scheduling the tasks. Set to
+   org.apache.hadoop.mapred.AdaptiveScheduler to enable the
+   resource-aware Adaptive Scheduler.</description>
+ </property>
+
+ <property>
+   <name>mapred.scheduler.adaptive.interval</name>
+   <value>10000</value>
+   <description>Time between two job matching computations, in
+   milliseconds.</description>
+ </property>
+
+ <property>
+   <name>mapred.scheduler.adaptive.utilization</name>
+   <value>100</value>
+   <description>Percentage of desired node utilization. Used for testing
+   purposes.</description>
+ </property>
+
+Additionally, the following properties are specific to each job, and
+are read when jobs are submitted:
+
+ <property>
+   <name>mapred.job.deadline</name>
+   <value>0</value>
+   <description>Set the job's desired deadline, in
+   seconds.</description>
+ </property>
+
+ <property>
+   <name>mapred.job.profile.{map,reduce}.{cpu,io}</name>
+   <value>100</value>
+   <description>Set the job profiling information for each phase and
+   resource (0-100).</description>
+ </property>
+
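The job-matching pseudocode in the README above leaves its helpers (`remove_tasks`, `fit`, `utility`) undefined and never initializes `matching`. Below is a minimal runnable Python sketch of the same greedy loop, not the patch's actual implementation. It rests on three assumptions:

* Each node has a single capacity, in percent, that applies to both CPU and I/O.
* A hypothetical per-job `urgency` field stands in for the real utility.
* `remove_tasks(n)` drops the n most recently placed tasks.

The loop runs n from 0 through the number of running tasks inclusive, so that an idle node can still receive work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    cpu: int        # hypothetical per-task CPU demand, percent of a node
    io: int         # hypothetical per-task I/O demand, percent of a node
    urgency: float  # hypothetical input to the example utility, in [-1, 1]


@dataclass
class Machine:
    name: str
    capacity: int = 100  # hypothetical capacity (percent) for both CPU and I/O
    tasks: List[Job] = field(default_factory=list)


def fits(tasks: List[Job], job: Job, machine: Machine) -> bool:
    """True if `job` fits on `machine` on top of the tentative `tasks`."""
    cpu = sum(t.cpu for t in tasks) + job.cpu
    io = sum(t.io for t in tasks) + job.io
    return cpu <= machine.capacity and io <= machine.capacity


def example_utility(tasks: List[Job]) -> float:
    """Hypothetical utility: each job contributes a value clamped to
    [-1, 1], and the total is the sum of those values."""
    return sum(max(-1.0, min(1.0, t.urgency)) for t in tasks)


def match(machines: List[Machine], job_queue: List[Job],
          utility: Callable[[List[Job]], float]) -> Dict[str, List[str]]:
    matching: Dict[str, List[str]] = {}
    for machine in machines:
        best = list(machine.tasks)
        best_utility = utility(best)
        # Evict the n most recently placed tasks (n = 0..len inclusive),
        # then greedily refill with whatever fits from the job queue.
        for n in range(len(machine.tasks) + 1):
            candidate = machine.tasks[:len(machine.tasks) - n]  # a new list
            for job in job_queue:
                if not fits(candidate, job, machine):
                    continue
                candidate.append(job)
            candidate_utility = utility(candidate)
            if candidate_utility > best_utility:
                best, best_utility = candidate, candidate_utility
        machine.tasks = best
        matching[machine.name] = [job.name for job in best]
    return matching


if __name__ == "__main__":
    jobs = [Job("A", 60, 10, 0.5), Job("B", 50, 20, 0.9), Job("C", 30, 30, 0.2)]
    idle = Machine("m1")
    busy = Machine("m2", tasks=[Job("D", 90, 90, -0.5)])
    print(match([idle, busy], jobs, example_utility))
```

Running the file prints `{'m1': ['A', 'C'], 'm2': ['A', 'C']}`. On the busy node, evicting task D (negative utility) and refilling scores higher than keeping it. Because the refill is greedy in queue order, job B is skipped once A is placed, even though B's urgency is higher. A production matcher would presumably search combinations more thoroughly, as the README's "number of iterations" suggests.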
diff --git a/src/contrib/adaptive-scheduler/build.xml b/src/contrib/adaptive-scheduler/build.xml
new file mode 100644
index 0000000..c15e232
--- /dev/null
+++ b/src/contrib/adaptive-scheduler/build.xml
@@ -0,0 +1,28 @@
+<?xml version="1.0"?>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work