Re: Distributing MapReduce on a computer cluster
For load distribution, you could start by reading about the different types
of Hadoop scheduler. I have not yet studied implementations other than
Hadoop, but a very simplified version of the distribution concept is the
following:

a) The TaskTracker asks for work (its heartbeat reports the status of the
worker node, e.g. the number of free slots)
b) The JobTracker picks a job from a list sorted according to the configured
policy (fair scheduling, FIFO, LIFO, other SLA)
c) The TaskTracker executes the assigned map/reduce task
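The three steps above amount to a pull-based protocol: workers ask for work
and the master answers each heartbeat. Below is a minimal Python sketch under
simplifying assumptions. The names (`JobTracker`, `submit`, `heartbeat`,
`task_finished`) are hypothetical and are not Hadoop's API, the "fair" policy
is reduced to "fewest running tasks first", and there is no fault tolerance
or data locality.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    submit_order: int
    pending: deque = field(default_factory=deque)  # tasks not yet assigned
    running: int = 0  # tasks assigned and not yet reported finished


class JobTracker:
    """Toy master (hypothetical): hands out tasks only on worker heartbeats."""

    def __init__(self, policy="fifo"):
        self.policy = policy
        self.jobs = {}

    def submit(self, name, num_tasks):
        tasks = deque(f"{name}-t{i}" for i in range(num_tasks))
        self.jobs[name] = Job(name, len(self.jobs), tasks)

    def _ordered_jobs(self):
        # Step (b): sort jobs that still have work by the configured policy.
        runnable = [j for j in self.jobs.values() if j.pending]
        if self.policy == "fifo":
            key = lambda j: j.submit_order
        elif self.policy == "lifo":
            key = lambda j: -j.submit_order
        elif self.policy == "fair":
            # Crude fairness: fewest running tasks first, oldest on ties.
            key = lambda j: (j.running, j.submit_order)
        else:
            raise ValueError(f"unknown policy: {self.policy}")
        return sorted(runnable, key=key)

    def heartbeat(self, tracker, free_slots):
        # Step (a): `tracker` reports how many slots it has free; the reply
        # is the list of tasks it should run (step c). `tracker` is unused
        # here because this toy ignores data locality.
        assigned = []
        for _ in range(free_slots):
            ordered = self._ordered_jobs()  # re-sort after every assignment
            if not ordered:
                break
            job = ordered[0]
            assigned.append(job.pending.popleft())
            job.running += 1
        return assigned

    def task_finished(self, job_name):
        # Completion report: lowers the job's running count for "fair".
        self.jobs[job_name].running -= 1
```

With `policy="fair"` and two 3-task jobs A and B, a heartbeat offering 2 free
slots gets one task from each job (`['A-t0', 'B-t0']`), whereas `"fifo"` gives
both slots to A. Re-sorting after every assignment is what lets a single
heartbeat spread its slots across jobs.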

As mentioned before, there are many more details. In b) there is an
implementation of delay scheduling, which improves throughput by taking into
account the location of the input data for a picked job. There is also a
preemption mechanism that regulates fairness between pools, etc.
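Delay scheduling can be sketched roughly as follows. This is a simplified
Python version of the idea described by Zaharia et al. for the Hadoop Fair
Scheduler, not Hadoop's code: when it is a job's turn but none of its pending
tasks has input data on the heartbeating node, the job is skipped, and only
after `max_skips` skips may it launch a non-local task. `Job`, `pick_task`
and `max_skips` are hypothetical names; only node-level locality is modeled
(no rack level) and preemption is left out.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending: dict  # task id -> set of hosts holding that task's input split
    skips: int = 0  # times this job passed up a slot waiting for locality


def pick_task(jobs, host, max_skips):
    """Choose a task for one free slot on `host` (simplified delay scheduling).

    `jobs` must already be sorted by the scheduling policy (step b).
    """
    for job in jobs:
        if not job.pending:
            continue
        local = [t for t, hosts in job.pending.items() if host in hosts]
        if local:
            task = local[0]
            job.skips = 0  # got locality, so reset the wait counter
        elif job.skips >= max_skips:
            task = next(iter(job.pending))  # waited long enough: run remotely
        else:
            job.skips += 1  # wait; offer this slot to the next job instead
            continue
        del job.pending[task]
        return job.name, task
    return None  # no job takes this slot right now
```

For example, if job A only has data on `h2` and job B has data on `h1`, a slot
on `h1` goes to B first (A's skip count rises to 1); with `max_skips=1` the
next `h1` slot lets A run its task non-locally. A larger `max_skips` trades
a little scheduling delay for more data-local tasks.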

A good start is the book that Prashant mentioned...

On 23 April 2012 23:49, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Shailesh, there's a lot that goes into distributing work across
> tasks/nodes. It's not just distributing work; fault tolerance, data
> locality, etc. also come into play. It might be good to refer to the
> Apache Hadoop docs or Tom White's Hadoop: The Definitive Guide.
>
> On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala <[EMAIL PROTECTED]>
> wrote:
>
> > Hello,
> >
> > I am trying to design my own MapReduce implementation and I want to know
> > how Hadoop is able to distribute its workload across multiple computers.
> > Can anyone shed more light on this? Thanks!
>