Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop >> mail # user >> Fwd: hi i have a Patch file of my Resouce aware scheduler. How to apply this scheduler ??i place of default scheduler


Copy link to this message
-
Fwd: hi i have a Patch file of my Resouce aware scheduler. How to apply this scheduler ??i place of default scheduler
From: Mahesh Khandewal <[EMAIL PROTECTED]>
Date: Tue, Apr 15, 2014 at 10:29 PM
Subject: Fwd: hi i have a Patch file of my Resouce aware scheduler. How to
apply this scheduler ??i place of default scheduler
To: [EMAIL PROTECTED]
Hi Harsha please can you help
From: Mahesh Khandewal <[EMAIL PROTECTED]>
Date: Tue, 15 Apr 2014 22:21:26 +0530
Subject: hi i have a Patch file of my Resouce aware scheduler. How to
apply this scheduler ??i place of default scheduler
To: [EMAIL PROTECTED]

Can any one help?? please.

 diff --git a/src/contrib/adaptive-scheduler/README b/src/contrib/adaptive-scheduler/README
new file mode 100644
index 0000000..562781b
+++ b/src/contrib/adaptive-scheduler/README
@@ -0,0 +1,127 @@
+Adaptive Scheduler v1.1
+=======================
+
+Introduction
+------------
+
+The Adaptive Scheduler is a pluggable Hadoop scheduler that
+automatically adjusts the amount of slots depending on the performance
+of jobs and on user-defined high-level completion goals.
+
+MapReduce schedulers use the notion of slots to represent the capacity
+of a cluster. This abstraction is simple and may work for homogeneous
+workloads, but fails to capture the different resource requirements that
+jobs have in multi-user environments. The resource-aware Adaptive
+Scheduler leverages job profiling information to ensure optimal cluster
+utilization and at the same time guarantees high-level completion goals.
+
+The resource-aware Adaptive Scheduler is designed to introduce a more
+fine-grained resource model, provide a more global view of the MapReduce
+cluster, and finally, support high-level completion goals.
+
+ * Resource-awareness: Proactive scheduling based on profiling job
+   information to avoid resource contention. Eliminates fixed number of
+   slots.
+
+ * Global view of the cluster: Find the optimal mix of workloads for the
+   whole cluster instead of trying to find the best task to run in a
+ given node.
+
+ * High-level completion goals: Reactive mechanism to
+   compensate slowdowns. Based on soft-deadlines to prioritize jobs.
+
+Implementation
+--------------
+
+The functionality of the scheduler is split into two levels: job
+matching and task assignment.
+
+ * Job matching:
+   - Find a good mix of jobs for each worker node
+   - Executed periodically and upon job arrival or job completion
+
+ * Task assignment:
+   - Enforces the mix of jobs for a particular node
+   - Executed on every heartbeat
+
+This is actually a well-known problem that has been studied in the past
+in the context of multiprocessors, and can be thought of as a
+bin-packing problem. Bin-packing is a computationally hard problem, and
+it is solved here using heuristics. In particular, the matching
+algorithm used by the resource-aware Adaptive Scheduler is based on a
+previous work for application placement. It has been adapted to
+MapReduce, and is based on a custom utility function.
+
+The job matching algorithm tries to find a good mix of jobs for each
+tasktracker. In order to do so, it goes through all the worker nodes,
+trying different job combinations and after a number of iterations, the
+best solution found so far is chosen. This job matching is executed
+periodically, and it basically provides the plan of what will be
+executed in the cluster until the next matching occurs.
+
+The utility function is used to measure the quality of a job matching
+solution. It provides a value between $-1$ and $1$ for each job on a
+particular solution, and the global utility is simply the sum of these
+values.
+
+ | for machine in machines:
+ |    best_assignment = machine.assignment
+ |    for n in range(0, machine.running_tasks):
+ |       new_assignment = machine.assignment.remove_tasks(n)
+ |       for job in jobqueue:
+ |          if not machine.fit(job):
+ |             next
+ |          new_assignment.add(job)
+ |       if new_assignment.utility > best_assignment.utility:
+ |          best_assignment = new_assignment
+ |    machine.assignment = best_assignment
+ |    matching.add(machine)
+ | return matching
+
+Configuring the Scheduler
+-------------------------
+
+In order to enable the scheduler, the classpath should be updated to
+point to the location of the scheduler (e.g. using conf/hadoop-env.sh),
+and the following properties should be present in the configuration file
+(conf/mapred-site.xml):
+
+ <property>
+   <name>mapred.jobtracker.taskScheduler</name>
+   <value>org.apache.hadoop.mapred.AdaptiveScheduler</value>
+   <description>The class responsible for scheduling the tasks. Set to
+   org.apache.hadoop.mapred.AdaptiveScheduler to enable the
+   resouce-aware Adaptive Scheduler.</description>
+ </property>
+
+ <property>
+   <name>mapred.scheduler.adaptive.interval</name>
+   <value>10000</value>
+   <description>Time between two job matching computations, in
+   milliseconds.</description>
+ </property>
+
+ <property>
+   <name>mapred.scheduler.adaptive.utilization</name>
+   <value>100</value>
+   <description>Percentage of desired node utilization. Used for testing
+   purposes.</description>
+ </property>
+
+Additionally, the following properties are specific for each job, and
+are read when jobs are submitted:
+
+ <property>
+   <name>mapred.job.deadline</name>
+   <value>0</value>
+   <description>Set the job's desired deadline, in
+   seconds.</description>
+ </property>
+
+ <property>
+   <name>mapred.job.profile.{map,reduce}.{cpu,io}</name>
+   <value>100</value>
+   <description>Set the job profiling information for each phase and
+   resource (0-100).</description>
+ </property>
+
diff --git a/src/contrib/adaptive-scheduler/build.xml b/src/contrib/adaptive-scheduler/build.xml
new file mode 100644
index 0000000..c15e232
+++ b/src/contrib/adaptive-scheduler/build.xml
@@ -0,0 +1,28 @@
+<?xml version="1.0"?>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB