Re: modify data distribution in jobconf
Prashant Sharma 2012-01-02, 08:02
Mohak,
I hope it means the child JVMs that are spawned by the tasktrackers. It is still not clear, though, what you are trying to achieve; I'd say do a little more research. You might want to check this out: http://blog.imaginea.com/hadoop-a-short-guide/ (take a look at the Map-reduce part).

-P

On Mon, Jan 2, 2012 at 12:56 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> I'm not sure what you are trying to achieve here.
>
> Hadoop MapReduce works by *trying* to schedule tasks on nodes on which
> data is 'close', either node-local/rack-local.
>
> We don't try to 'start'/'stop' nodes. If that is what you are trying to
> do, you need to look for something else.
>
> Arun
>
> On Dec 31, 2011, at 11:29 PM, mohak gupta wrote:
>
> > hi
> >
> > as part of my project I need to modify the data distribution layer in job
> > conf so as to achieve the following:
> >
> > 1) control which worker nodes should be started based on the input data
> > given to them.
> >
> > 2) keep other worker nodes in some kind of sleep state.
> >
> > 3) based on the output emitted by the worker nodes and the data distributed,
> > allow other worker nodes to start.
> >
> > 4) Perform this in a looping structure till the output is achieved.
> >
> > basically I wish to control which worker nodes perform map and reduce
> > functions based on the data they have received.
> >
> > Could you please help me by suggesting if this could be achieved and also
> > what are the tradeoffs involved. Any help is really appreciated.
> >
> > regards
> > Mohak Gupta
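
A toy sketch of the locality preference Arun describes above: when a node has a free slot, prefer a task whose input is node-local, then rack-local, and otherwise run one anyway. This is not Hadoop's scheduler code; the function name `pick_task` and the dictionaries it takes are made up purely for illustration.

```python
def pick_task(free_node, pending_tasks, rack_of):
    """Pick the pending task whose input data is closest to free_node.

    pending_tasks: dict mapping task id -> set of nodes holding a replica
                   of that task's input (hypothetical layout).
    rack_of:       dict mapping node name -> rack name (hypothetical layout).
    Returns (task_id, locality), or None if nothing is pending.
    """
    if not pending_tasks:
        return None

    # 1) Node-local: the free node already holds the task's input.
    for task_id, replicas in pending_tasks.items():
        if free_node in replicas:
            return task_id, "node-local"

    # 2) Rack-local: some replica lives on the same rack as the free node.
    rack = rack_of[free_node]
    for task_id, replicas in pending_tasks.items():
        if any(rack_of[node] == rack for node in replicas):
            return task_id, "rack-local"

    # 3) Nothing close: the task still runs, it just reads over the network.
    task_id = next(iter(pending_tasks))
    return task_id, "off-rack"


if __name__ == "__main__":
    racks = {"n1": "r1", "n2": "r1", "n3": "r2"}
    pending = {"t1": {"n3"}, "t2": {"n2"}}
    print(pick_task("n1", pending, racks))
```

Note that the choice here is about which task a node receives, not about whether the node is running; that matches the point that MapReduce places tasks near data rather than starting or stopping nodes.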