Re: Automatically Documenting Apache Hadoop Configuration
Praveen Sripati 2011-12-06, 08:19
> From my work on yarn trying to document the configs there and to
standardize them, writing anything that is going to automatically detect
config values through static analysis is going to be very difficult. This
is because most of the configs in yarn are now built up using static string concatenation.
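One way around the concatenated-constant problem, sketched below purely as an illustration (the class and field names are hypothetical, not from the thread): once a constants class is loaded, the JVM has already resolved the string concatenations, so reflecting over its static String fields recovers the fully built key names without a source-level parser.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: dump the resolved values of static String constants,
// so keys built via concatenation (BASE + "config") come out fully formed.
public class ConstantDump {
    public static final String BASE = "yarn.base.";
    public static final String CONF = BASE + "config";

    public static List<String> dumpStringConstants(Class<?> cls) {
        List<String> values = new ArrayList<>();
        for (Field f : cls.getFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
                try {
                    values.add((String) f.get(null));
                } catch (IllegalAccessException e) {
                    // public fields are accessible; ignored defensively
                }
            }
        }
        Collections.sort(values);
        return values;
    }

    public static void main(String[] args) {
        // Prints the resolved keys, e.g. [yarn.base., yarn.base.config]
        System.out.println(dumpStringConstants(ConstantDump.class));
    }
}
```

This only sees values after class loading, of course, so it sidesteps static analysis rather than performing it, and it cannot tell which constants are actually used as config keys.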
All the references to Configuration.get* methods will give the list of
parameters, from which the unique ones have to be picked and the literal
strings mapped to their constant names (like dfs.namenode.safemode.threshold-pct for
DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY). We could also add some
annotations to the configuration parameters, which would be included in the
generated documentation.
We can take a crack at it. If the parameters come out accurately, then HTML
can be generated automatically, similar to Javadoc; otherwise, all the newly
added parameters will be written to a file, which will be an input for the RM
(or someone else) to open JIRAs and fix them.
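A minimal sketch of the extraction step described above, assuming a plain regex pass over source text for literal keys passed to Configuration.get* calls (the class name and pattern are illustrative, not an existing Hadoop tool):

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collect the unique string literals that appear as the
// first argument of .get*(...) calls in a chunk of Java source text.
public class ConfigKeyScanner {
    // Matches e.g. conf.getFloat("dfs.namenode.safemode.threshold-pct", ...)
    private static final Pattern GET_CALL =
        Pattern.compile("\\.get\\w*\\(\\s*\"([^\"]+)\"");

    public static Set<String> scan(String source) {
        Set<String> keys = new TreeSet<>();
        Matcher m = GET_CALL.matcher(source);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    public static void main(String[] args) {
        String sample =
            "float pct = conf.getFloat(\"dfs.namenode.safemode.threshold-pct\", 0.999f);\n"
            + "String dir = conf.get(\"yarn.nodemanager.local-dirs\");";
        System.out.println(scan(sample));
    }
}
```

As the thread notes, a regex like this misses keys built by concatenation, which is exactly where a real parser (or the reflection trick) would be needed.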
> I do not know if we recommend using config strings directly when there's
an API in Job/JobConf supporting setting the same thing.
Changing a parameter through the API will lead to building and packaging
multiple times. Also, setting the parameters from the command prompt makes
testing easier.
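To illustrate the command-prompt point with a small sketch: -D style overrides layer on top of defaults at run time, so no rebuild is needed between test runs. This loosely mimics how Hadoop's GenericOptionsParser applies -D options; the class below is hypothetical, not Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: apply -Dkey=value command-line overrides on top of a
// map of default values, the way per-run config tweaks avoid a rebuild.
public class OverrideDemo {
    public static Map<String, String> resolve(Map<String, String> defaults, String[] args) {
        Map<String, String> conf = new HashMap<>(defaults);
        for (String arg : args) {
            if (arg.startsWith("-D") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("mapreduce.map.failures.maxpercent", "0");
        Map<String, String> conf = resolve(defaults,
            new String[] {"-Dmapreduce.map.failures.maxpercent=10"});
        System.out.println(conf.get("mapreduce.map.failures.maxpercent")); // prints 10
    }
}
```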
Ari from Cloudera, the author of the article, mentioned in a separate
mail that he would release the code; once it's done I will look into it.
On Tue, Dec 6, 2011 at 12:52 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> I've seen Oozie do that same break-up of config param names and boy, it's
> difficult to grep in such a code base when troubleshooting.
> OTOH, we at least get a sane prefix for relevant config names (hope we do?)
> On 06-Dec-2011, at 12:44 AM, Robert Evans wrote:
> > From my work on yarn trying to document the configs there and to
> standardize them, writing anything that is going to automatically detect
> config values through static analysis is going to be very difficult. This
> is because most of the configs in yarn are now built up using static string concatenation:
> > public static String BASE = "yarn.base.";
> > public static String CONF = BASE+"config";
> > I am not sure that there is a good way around this short of using a full
> java parser to trace out all method calls, and try to resolve the
> parameters. I know this is possible, just not that simple to do.
> > I am +1 for anything that will clean up configs and improve the
> documentation of them. Even if we have to rewire or rewrite a lot of the
> Configuration class to make things work properly.
> > --Bobby Evans
> > On 12/5/11 11:54 AM, "Harsh J" <[EMAIL PROTECTED]> wrote:
> > Praveen,
> > (Inline.)
> > On 05-Dec-2011, at 10:14 PM, Praveen Sripati wrote:
> >> Hi,
> >> Recently there was a query about the Hadoop framework being tolerant of
> >> map/reduce task failures towards job completion. And the solution was to
> >> set the 'mapreduce.map.failures.maxpercent' and
> >> 'mapreduce.reduce.failures.maxpercent' properties. Although this feature
> >> was introduced a couple of years back, it was not documented. I had a
> >> similar experience with the 0.23 release also.
> > I do not know if we recommend using config strings directly when there's
> an API in Job/JobConf supporting setting the same thing. Just saying - that
> there was javadoc already available on this. But of course, it would be
> better if the tutorial covered this too. Doc-patches welcome!
> >> It would be really good for Hadoop adoption to automatically dig and
> >> document all the existing configurable properties in Hadoop and also to
> >> identify newly added properties in a particular release during the build
> >> processes. Documentation would also lead to fewer queries in the forums.
> >> Cloudera has done something similar; though it's not 100% accurate, it
> >> would definitely help to some extent.
> > I'm +1 for this. We do request and consistently add entries to
> *-default.xml files if we find them undocumented today. I think we should