-
Which config parameters are node-specific?
Zhang, Zhang 2010-01-20, 00:31
Where do I find information about which config parameters can be set as per-node properties, and which ones apply to all nodes? For example, I have a cluster consisting of two classes of nodes. One class is dual-core nodes with 4GB of memory, and the other is 16-core nodes with 128GB of memory. It certainly makes sense to configure them differently. So the question is, which parameters should I pay attention to? I vaguely know that at least the following ones can probably be set as node-specific:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
But anything beyond that? How about the following ones, can I set them as node-specific parameters?
mapred.child.java.opts
tasktracker.http.threads
dfs.datanode.handler.count
io.sort.factor
io.sort.mb
mapred.inmem.merge.threshold
mapred.job.reduce.input.buffer.percent
Thanks!
Zhang
-
Re: Which config parameters are node-specific?
Jeff Zhang 2010-01-20, 02:32
I believe all of these parameters can be set as node-specific, because they are read in different JVMs. Correct me if I am wrong.
Best Regards
Jeff Zhang
-
Re: Which config parameters are node-specific?
Amareshwari Sri Ramadasu 2010-01-20, 03:54
Hi Zhang,
The following parameters are node-specific:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
tasktracker.http.threads
dfs.datanode.handler.count
The rest of the parameters are Job-specific.
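As a rough sketch, the node-specific parameters would go into the site files on each node, with different values per hardware class. The values below are illustrative assumptions for the 16-core 128GB class, not tuned recommendations, and the split into mapred-site.xml and hdfs-site.xml assumes the 0.20-style file layout:

```xml
<!-- mapred-site.xml on a 16-core node; all values are illustrative assumptions -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml on the same node; the value is an illustrative assumption -->
<configuration>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
</configuration>
```

The dual-core 4GB nodes would carry the same property names with smaller values in their own copies of these files.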
Thanks,
Amareshwari
-
Re: Which config parameters are node-specific?
Allen Wittenauer 2010-01-20, 20:37
On 1/19/10 7:54 PM, "Amareshwari Sri Ramadasu" <[EMAIL PROTECTED]> wrote:
> Hi Zhang,
>
> The following parameters are node specific.
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
> tasktracker.http.threads
> dfs.datanode.handler.count
>
> The rest of the parameters are Job-specific.
... Except for the ones that are namenode and jobtracker specific.
:(
Hadoop configuration sucks greatly, and the lack of real documentation on what parameters exist (it seems like every month there is a "new" hidden param) and what actually uses them (i.e., where is final actually taken into account?) doesn't help.
[ and no, *-default.xml and/or "read the source!" is not good enough.]
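For context, final is a per-property flag in the site files. A minimal sketch of the syntax, using one of the parameters from this thread with an assumed value; which components actually honour the flag for a given parameter is the undocumented part complained about above:

```xml
<!-- Fragment of a site file; the -Xmx value is an illustrative assumption -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
  <!-- Marks the value as not overridable by configuration loaded later -->
  <final>true</final>
</property>
```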
-
Re: Which config parameters are node-specific?
Edward Capriolo 2010-01-20, 21:12
This is a tricky problem. To add further confusion, some variables are used in multiple components.
mapred.local.dir is used by both the TaskTracker and the JobTracker. hadoop.tmp.dir is the default base for everything.
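To illustrate the hadoop.tmp.dir point: in the shipped defaults, several directories in different components are derived from it through variable expansion. The excerpt below is from memory of the 0.20-era *-default.xml files, so treat the exact values as an assumption and check them against your own release:

```xml
<!-- core-default.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>

<!-- mapred-default.xml -->
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>

<!-- hdfs-default.xml -->
<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
</property>
```

Changing hadoop.tmp.dir on a node therefore moves all of these at once unless each one is overridden explicitly.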