Re: modify data distribution in jobconf
Mohak,

I hope you mean the child JVMs that are spawned by the TaskTrackers. It is
still not clear, though, what you are trying to achieve; I'd say do a little
more research.

You might want to check this out:
http://blog.imaginea.com/hadoop-a-short-guide/ (take a look at the MapReduce
part).

-P
On Mon, Jan 2, 2012 at 12:56 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> I'm not sure what you are trying to achieve here.
>
> Hadoop MapReduce works by *trying* to schedule tasks on nodes to which the
> data is 'close', i.e. either node-local or rack-local.
>
> We don't try to 'start'/'stop' nodes. If that is what you are trying to
> do, you need to look for something else.
>
> Arun
>
> On Dec 31, 2011, at 11:29 PM, mohak gupta wrote:
>
> > hi
> >
> > As part of my project I need to modify the data distribution layer in the
> > job conf so as to achieve the following:
> >
> > 1) control which worker nodes should be started based on the input data
> > given to them.
> >
> > 2) keep other worker nodes in some kind of sleep state.
> >
> > 3) based on the output emitted by the worker nodes and the data
> > distributed, allow other worker nodes to start.
> >
> > 4) Perform this in a looping structure till the output is achieved.
> >
> > Basically, I wish to control which worker nodes perform map and reduce
> > functions based on the data they have received.
> >
> > Could you please help me by suggesting whether this could be achieved and
> > what tradeoffs are involved? Any help is really appreciated.
> >
> > regards
> > Mohak Gupta
>
>
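
To make Arun's point about locality concrete, here is a minimal sketch of a scheduler that prefers node-local data, then rack-local data, and otherwise falls back to any pending work. It is a toy model written for illustration, not Hadoop's actual scheduler code: `Split`, `pick_task`, `rack_of`, and the host and rack names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Split:
    """A pending input split and the hosts holding a replica of its data."""
    name: str
    hosts: Tuple[str, ...]


def pick_task(node: str,
              pending: List[Split],
              rack_of: Dict[str, str]) -> Optional[Tuple[Split, str]]:
    """Choose a split for `node`, preferring data that is closer to it.

    Returns (split, locality), where locality is 'node-local',
    'rack-local' or 'off-rack', or None when nothing is pending.
    """
    # 1) Prefer a split with a replica on the requesting node itself.
    for split in pending:
        if node in split.hosts:
            return split, "node-local"

    # 2) Otherwise prefer a split with a replica in the same rack.
    node_rack = rack_of.get(node)
    if node_rack is not None:
        for split in pending:
            if any(rack_of.get(host) == node_rack for host in split.hosts):
                return split, "rack-local"

    # 3) Fall back to any pending split: locality is a preference only.
    if pending:
        return pending[0], "off-rack"
    return None


if __name__ == "__main__":
    # Hypothetical cluster: two racks with two nodes each.
    rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
    pending = [Split("s0", ("n3",)), Split("s1", ("n2",)), Split("s2", ("n1", "n4"))]
    for node in ("n1", "n2", "n4"):
        print(node, pick_task(node, pending, rack_of))
```

Note that in this model the node asks for work and the scheduler picks a split for it; the scheduler never starts, stops, or sleeps a node, which is why the behaviour Mohak describes does not map onto a locality setting.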