Hadoop, mail # dev - Hadoop Configuration Files


Re: Hadoop Configuration Files
Steve Loughran 2011-09-26, 13:32
On 24/09/11 15:48, Harsh J wrote:
> There are specific derivatives of Configuration class that each read
> certain *-site.xml files. This is because the XML files are service
> specific.
>

I'm confused now.

My belief is that when a default configuration file is pushed to the
list via Configuration.addDefaultResource(), then all Configuration
instances that are created after that get the config, whether they are
Configuration instances or subclasses thereof.

For example, JobConf explicitly adds the MR files:

   static{
     Configuration.addDefaultResource("mapred-default.xml");
     Configuration.addDefaultResource("mapred-site.xml");
   }

If the resource hasn't been loaded already, adding it triggers a
reload of all existing configurations that have the loadDefaults flag set:

/* in org.apache.hadoop.conf.Configuration */

   public static synchronized void addDefaultResource(String name) {
     if(!defaultResources.contains(name)) {
       defaultResources.add(name);
       for(Configuration conf : REGISTRY.keySet()) {
         if(conf.loadDefaults) {
           conf.reloadConfiguration();
         }
       }
     }
   }

Configuration.loadDefaults is true unless you construct an instance
with new Configuration(false); the flag propagates when you create a
new Configuration instance from another.
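
To make the behaviour above concrete, here is a minimal, self-contained sketch. MiniConf is a hypothetical stand-in written for this illustration, not Hadoop's Configuration class; it tracks only resource names rather than parsing XML, and it assumes the registry and flag behave as described in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical, simplified stand-in for Configuration; not Hadoop code. */
class MiniConf {
    // Default resource names shared by every instance, in insertion order.
    private static final List<String> defaultResources = new ArrayList<>();
    // Weakly referenced registry of every instance ever constructed.
    private static final Map<MiniConf, Object> REGISTRY = new WeakHashMap<>();

    private final boolean loadDefaults;
    private List<String> loaded = new ArrayList<>();

    MiniConf() { this(true); }

    MiniConf(boolean loadDefaults) {
        this.loadDefaults = loadDefaults;
        synchronized (MiniConf.class) {
            REGISTRY.put(this, null);
            refresh();
        }
    }

    /** Creating one instance from another propagates the loadDefaults flag. */
    MiniConf(MiniConf other) { this(other.loadDefaults); }

    static synchronized void addDefaultResource(String name) {
        if (!defaultResources.contains(name)) {
            defaultResources.add(name);
            for (MiniConf conf : REGISTRY.keySet()) {
                if (conf.loadDefaults) {
                    conf.reloadConfiguration();
                }
            }
        }
    }

    void reloadConfiguration() {
        synchronized (MiniConf.class) { refresh(); }
    }

    // Caller must hold the MiniConf.class lock.
    private void refresh() {
        loaded = loadDefaults ? new ArrayList<>(defaultResources) : new ArrayList<>();
    }

    List<String> loadedResources() {
        synchronized (MiniConf.class) { return new ArrayList<>(loaded); }
    }
}

class MiniConfDemo {
    public static void main(String[] args) {
        MiniConf early = new MiniConf();      // created before any defaults exist
        MiniConf bare = new MiniConf(false);  // opts out of default resources
        MiniConf.addDefaultResource("mapred-default.xml");
        MiniConf.addDefaultResource("mapred-site.xml");
        System.out.println(early.loadedResources()); // [mapred-default.xml, mapred-site.xml]
        System.out.println(bare.loadedResources());  // []
    }
}
```

The instance created before the resources were added still sees them, because the static call reloads every registered instance whose flag is set.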

The way the constructor adds all Configuration instances to the static
(weak ref) REGISTRY map is inefficient as the loadDefaults flag is only
ever set in the ASF codebase at construction time; it would be better to
make that flag static and only register instances with loadDefaults = true.

Now, for some extra fun, Configuration.reloadConfiguration() is not
final. That allows subclasses to override it, and the override can then
be invoked before even their static construction/initialisation is fully
complete. I know this as I have done it, and would not recommend it to
anyone. You can end up in that weird world of class initialisation time
stack traces.
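
One way this hazard can arise is sketched below. BaseConf and LiveConf are hypothetical classes invented for this illustration; they are not Hadoop code and not necessarily what the author's subclass did. The sketch assumes a subclass that creates an instance and registers a default resource during its own static initialisation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class with an overridable reload hook; not Hadoop code. */
class BaseConf {
    private static final List<BaseConf> REGISTRY = new ArrayList<>();
    private static final List<String> defaultResources = new ArrayList<>();

    BaseConf() {
        synchronized (BaseConf.class) { REGISTRY.add(this); }
    }

    // Simplified: every call reloads every registered instance.
    static synchronized void addDefaultResource(String name) {
        defaultResources.add(name);
        for (BaseConf conf : new ArrayList<>(REGISTRY)) {
            conf.reloadConfiguration();
        }
    }

    // Not final, so subclasses may override it.
    void reloadConfiguration() { }
}

/** Hypothetical subclass that touches its own static state on reload. */
class LiveConf extends BaseConf {
    // Records what each reload observed; initialised first so it is usable below.
    static final List<String> OBSERVED = new ArrayList<>();

    // Created during static initialisation, so it joins the registry at once.
    static final LiveConf SHARED = new LiveConf();

    static {
        // Reloads SHARED while SOURCE (declared below) is still unassigned.
        BaseConf.addDefaultResource("live-site.xml");
    }

    // Assigned only after the static block above has finished.
    static final String SOURCE = lookUpSource();

    private static String lookUpSource() { return "cm-service"; }

    @Override
    void reloadConfiguration() {
        OBSERVED.add(String.valueOf(SOURCE));
    }
}

class StaticInitDemo {
    public static void main(String[] args) {
        System.out.println(LiveConf.OBSERVED); // [null]
        BaseConf.addDefaultResource("other-site.xml");
        System.out.println(LiveConf.OBSERVED); // [null, cm-service]
    }
}
```

The first reload runs in the middle of LiveConf's static initialisation and sees SOURCE as null; the same override behaves normally once the class has finished initialising. With real work in the override, that null would surface as an exception thrown from inside class initialisation.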

To clean up Configuration, then, I would
  -make reloadConfiguration final
  -make loadDefaults static
  -only add confs to the REGISTRY if loadDefaults = true
  -add some debug strings to see what's going on/wrong.
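
A rough sketch of what some of these changes could look like follows. TidyConf is a hypothetical class written for this illustration, not a patch against Hadoop. It covers the final reload method, conditional registration, and debug logging; loadDefaults is kept here as a per-instance field fixed at construction, which is an assumption of the sketch rather than the static flag proposed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.logging.Logger;

/** Hypothetical sketch of the proposed cleanup; not Hadoop code. */
class TidyConf {
    private static final Logger LOG = Logger.getLogger(TidyConf.class.getName());
    private static final List<String> defaultResources = new ArrayList<>();
    private static final Map<TidyConf, Object> REGISTRY = new WeakHashMap<>();

    // Assumption: fixed per instance at construction time.
    private final boolean loadDefaults;
    private int reloadCount;

    TidyConf(boolean loadDefaults) {
        this.loadDefaults = loadDefaults;
        // Only instances that want defaults are tracked for later reloads.
        if (loadDefaults) {
            synchronized (TidyConf.class) { REGISTRY.put(this, null); }
        }
    }

    static synchronized void addDefaultResource(String name) {
        if (defaultResources.contains(name)) {
            LOG.fine("default resource already registered: " + name);
            return;
        }
        defaultResources.add(name);
        LOG.fine("added default resource " + name + ", reloading "
                + REGISTRY.size() + " registered instance(s)");
        for (TidyConf conf : REGISTRY.keySet()) {
            conf.reloadConfiguration();
        }
    }

    // Final, so subclasses cannot hook into reloads.
    final void reloadConfiguration() {
        synchronized (TidyConf.class) { reloadCount++; }
    }

    int reloadCount() {
        synchronized (TidyConf.class) { return reloadCount; }
    }

    static synchronized boolean isRegistered(TidyConf conf) {
        return REGISTRY.containsKey(conf);
    }
}
```

Because unregistered instances never appear in the map, the loop in addDefaultResource no longer needs to check the flag on each one.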

This would break my code, but that's OK. What I did was not something
I'd recommend to anyone else, and that class of mine is now marked as
@Deprecated in my own codebase, as it was more trouble than it was
worth. What was it trying to do? Get a live config from a Configuration
Management service, and retain that bonding to the CM infrastructure
even when cloned. This stops working once you start
serializing/deserializing them, so it's not worth the hassle.