I am not super familiar with OSGi. I have used it a little in the past,
but that was 5+ years ago. I am in favor of something that will fix the
CLASSPATH problems that we currently have and would allow for CLASSPATH
isolation between Hadoop itself and the applications that use Hadoop. If
OSGi can do this cleanly then I am +1 for moving to OSGi.
However, we are trying to maintain binary compatibility within major
version numbers, in preparation for rolling upgrades. Many of the things
you have suggested, like moving classes from one package to another and
doing some serious rework of Configuration, will break not only binary
compatibility but also API compatibility.
If we do go this route, just be aware that it is most likely something that
would have to force a major version bump, which right now means trunk (the
On 7/9/12 8:24 AM, "Guillaume Nodet" <[EMAIL PROTECTED]> wrote:
>I'm working with Jean-Baptiste to make hadoop work in OSGi.
>OSGi works with classloaders in a very specific way, which leads to several
>problems with hadoop.
>Let me quickly explain how OSGi works. In OSGi, you deploy bundles, which
>are jars with additional OSGi metadata. This metadata is used by the OSGi
>framework to create a classloader for the bundle. However, the
>classloaders are not organized in a tree like in a JEE environment, but
>rather in some kind of graph, where each classloader has limited visibility
>and limited exposure. This is controlled at the package level by
>specifying which packages are exported and which packages are imported by
>a given bundle. This has mainly two consequences:
> * OSGi does not support split packages well, where the same package is
>exported by two different bundles
> * a classloader does not have visibility on everything as in a usual
>classloader environment or even JEE-like env
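A minimal sketch of the split-package situation, as a toy model only: real OSGi frameworks resolve wiring from the Export-Package and Import-Package manifest headers, and SplitPackageCheck and findSplitPackages are hypothetical names, not OSGi APIs. The package lists are illustrative, apart from org.apache.hadoop.fs, which is the case named in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: bundle name -> packages it exports. Not an OSGi API.
final class SplitPackageCheck {

    // Returns every package exported by more than one bundle,
    // mapped to the names of the bundles that export it.
    static Map<String, List<String>> findSplitPackages(Map<String, List<String>> exportsByBundle) {
        Map<String, List<String>> exporters = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : exportsByBundle.entrySet()) {
            for (String pkg : e.getValue()) {
                exporters.computeIfAbsent(pkg, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Keep only the packages that have two or more exporters.
        exporters.values().removeIf(bundles -> bundles.size() < 2);
        return exporters;
    }

    public static void main(String[] args) {
        Map<String, List<String>> exports = new TreeMap<>();
        // Illustrative export lists (assumed, not read from real manifests).
        exports.put("hadoop-common", List.of("org.apache.hadoop.conf", "org.apache.hadoop.fs"));
        exports.put("hadoop-hdfs", List.of("org.apache.hadoop.fs", "org.apache.hadoop.hdfs"));
        System.out.println(findSplitPackages(exports));
    }
}
```

A check like this could run over the generated manifests at build time to flag packages that would need to move before each jar can become its own bundle.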
>The first problem arises, for example, with the org.apache.hadoop.fs package,
>which is split across the hadoop-common and hadoop-hdfs jars (the latter
>defines the Hdfs class). There may be other cases, but I haven't hit them yet. To
>solve this problem, it'd be better if such classes were moved into a
>The second problem is much more complicated. I think most of the
>classloading is done from Configuration. However, Configuration has an
>internal classloader which is set by the constructor to the thread context
>classloader (defaulting to the Configuration class' classloader) and new
>Configuration objects are created everywhere in the code.
>In addition, creating new Configuration objects forces the parsing of the
>configuration files several times.
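To make the classloader capture described above concrete, here is a simplified stand-in. ConfSketch is a hypothetical class, not the real org.apache.hadoop.conf.Configuration, and the empty URLClassLoader only plays the role of a bundle classloader. It shows that two objects built under different thread context classloaders end up resolving classes through different loaders.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Simplified stand-in for the behaviour described above; not Hadoop code.
final class ConfSketch {
    private final ClassLoader classLoader;

    ConfSketch() {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        // Fall back to the loader of this class when no context loader is set.
        this.classLoader = (tccl != null) ? tccl : ConfSketch.class.getClassLoader();
    }

    ClassLoader getClassLoader() {
        return classLoader;
    }

    // Class lookups go through whichever loader was captured at construction time.
    Class<?> getClassByName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, classLoader);
    }

    public static void main(String[] args) {
        Thread t = Thread.currentThread();
        ClassLoader original = t.getContextClassLoader();
        // Assumption: an empty URLClassLoader stands in for an OSGi bundle loader.
        ClassLoader bundleLoader = new URLClassLoader(new URL[0], original);

        ConfSketch before = new ConfSketch();
        t.setContextClassLoader(bundleLoader);
        try {
            ConfSketch inside = new ConfSketch();
            // The two instances captured different loaders.
            System.out.println(before.getClassLoader() == inside.getClassLoader()); // prints false
        } finally {
            t.setContextClassLoader(original);
        }
    }
}
```

Since the loader is fixed per instance, every "new Configuration()" buried in library code picks up whatever context classloader happens to be set on the calling thread, which is hard to control from an OSGi layer.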
>Also in OSGi, Configuration is better done through the standard OSGi
>ConfigurationAdmin service, so it would be nice to integrate the
>configuration into ConfigAdmin when running in OSGi.
>For the above reasons, I'd like to know what you would think of
>transforming the Configuration object into a real singleton, or at least
>replacing the "new Configuration()" calls spread everywhere with access
>to a singleton Configuration.getInstance().
>This would allow the hadoop osgi layer to manage the Configuration in a
>more osgi friendly way, allowing the use of a specific subclass which
>better manages the class loading in an OSGi environment and integrates with
>ConfigAdmin. This may also remove the need for keeping a registry of
>existing Configuration objects and having to update them when a default
>resource is added, for example.
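One way the proposed accessor could look, purely as a sketch: Hadoop's Configuration has no getInstance() or setInstance() today, and SharedConf, OsgiConf and loaderForUserClasses are invented names for illustration. The point is only that a single access point lets an integration layer install its own subclass.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical singleton holder; not part of Hadoop.
class SharedConf {
    private static final AtomicReference<SharedConf> INSTANCE = new AtomicReference<>();

    static SharedConf getInstance() {
        SharedConf current = INSTANCE.get();
        if (current == null) {
            // Install a default only if nobody installed one first.
            INSTANCE.compareAndSet(null, new SharedConf());
            current = INSTANCE.get();
        }
        return current;
    }

    // Lets an integration layer (for example an OSGi activator) install a subclass.
    static void setInstance(SharedConf conf) {
        INSTANCE.set(Objects.requireNonNull(conf));
    }

    // Default behaviour: resolve user classes through the thread context classloader.
    ClassLoader loaderForUserClasses() {
        return Thread.currentThread().getContextClassLoader();
    }

    public static void main(String[] args) {
        SharedConf.setInstance(new OsgiConf(SharedConf.class.getClassLoader()));
        System.out.println(SharedConf.getInstance() instanceof OsgiConf); // prints true
    }
}

// Hypothetical OSGi-aware subclass that always uses a fixed bundle classloader.
class OsgiConf extends SharedConf {
    private final ClassLoader bundleLoader;

    OsgiConf(ClassLoader bundleLoader) {
        this.bundleLoader = bundleLoader;
    }

    @Override
    ClassLoader loaderForUserClasses() {
        return bundleLoader;
    }
}
```

Callers would then write SharedConf.getInstance() where they currently construct a new object, which is exactly the kind of API change that would need a major version bump.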
>Some of the above problems have been addressed in some way in HADOOP-7977,
>but the fixes I've been working on were more related to hadoop 1.0.x
>branch, and do not apply cleanly to trunk.
>One last point: the two above problems are mainly due to the fact that I've
>been assuming that individual hadoop jars are transformed into native
>bundles. This would go away if we had a single bundle containing all
>the individual jars (as it was with hadoop-core-1.0.x, but having more