Re: modify data distribution in jobconf
Prashant Sharma 2012-01-02, 08:02
I hope it means the child JVMs that are spawned by the TaskTrackers. It is still
not clear, though, what you are trying to achieve. I'd say do a little more
You might want to check this out.
http://blog.imaginea.com/hadoop-a-short-guide/ ( Take a look at Map-reduce
On Mon, Jan 2, 2012 at 12:56 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> I'm not sure what you are trying to achieve here.
> Hadoop MapReduce works by *trying* to schedule tasks on nodes that are
> 'close' to the data, either node-local or rack-local.
> We don't try to 'start'/'stop' nodes. If that is what you are trying to
> do, you need to look for something else.
> On Dec 31, 2011, at 11:29 PM, mohak gupta wrote:
> > Hi,
> > As part of my project I need to modify the data distribution layer in the job
> > conf so as to achieve the following:
> > 1) Control which worker nodes should be started based on the input data
> > given to them.
> > 2) Keep other worker nodes in some kind of sleep state.
> > 3) Based on the output emitted by the worker nodes and the data
> > allow other worker nodes to start.
> > 4) Perform this in a looping structure until the output is achieved.
> > Basically, I wish to control which worker nodes perform the map and reduce
> > functions based on the data they have received.
> > Could you please help me by suggesting whether this could be achieved and
> > what the tradeoffs involved are? Any help is really appreciated.
> > Regards,
> > Mohak Gupta
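
For illustration, the locality preference Arun describes can be sketched roughly as below. This is a toy model in Python, not Hadoop's scheduler code: the names (Locality, classify, pick_split), the host names and the rack map are all made up for the example. It assumes a free node asks for work and is handed the pending input split whose replicas are closest to it.

```python
from enum import Enum


class Locality(Enum):
    # Smaller value means the data is closer to the node.
    NODE_LOCAL = 0
    RACK_LOCAL = 1
    OFF_RACK = 2


def classify(node, replica_nodes, rack_of):
    """Return how close `node` is to the nearest replica of a split."""
    if node in replica_nodes:
        return Locality.NODE_LOCAL
    replica_racks = {rack_of[r] for r in replica_nodes}
    if rack_of[node] in replica_racks:
        return Locality.RACK_LOCAL
    return Locality.OFF_RACK


def pick_split(node, pending_splits, rack_of):
    """Pick the pending split with the best locality for `node`.

    `pending_splits` maps a split id to the set of hosts holding a replica.
    A node with no nearby data still gets work: locality is only a preference.
    """
    if not pending_splits:
        return None
    return min(
        pending_splits,
        key=lambda s: classify(node, pending_splits[s], rack_of).value,
    )


# Hypothetical cluster: two racks with two nodes each.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
pending = {"split-a": {"n3"}, "split-b": {"n1"}, "split-c": {"n2"}}

for node in ("n1", "n4"):
    split = pick_split(node, pending, rack_of)
    print(node, split, classify(node, pending[split], rack_of).name)
```

Note that nothing here starts, stops or sleeps a node. The only decision is which piece of data a node that is already running gets to work on, which is why the sleep-state behaviour in points 1) to 3) does not fit this model.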