The problem with Configuration is that it is public, so changing it does
not just impact Hadoop. It also impacts all of the projects that use it,
either directly as part of the Map/Reduce APIs or for storing their own
configuration. Within Hadoop proper there are several places where it
cannot just be static. For Map/Reduce, a Configuration object is created
for each Map/Reduce job, so a client may have multiple different
instances of Configuration in flight at any point in time, one for each
job. HDFS also supports having multiple separate configurations in the
client simultaneously.
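To make the per-job point concrete, here is a toy Java sketch. MiniConf is a hypothetical stand-in for Configuration (a plain mutable map, not Hadoop's real class), and the property keys are made up for illustration. It only shows why a single shared instance cannot serve two jobs configured side by side in the same client.

```java
import java.util.HashMap;
import java.util.Map;

public class PerJobConfDemo {

    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration:
    // just a mutable String-to-String map, NOT Hadoop's real class.
    static final class MiniConf {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Hypothetical singleton variant: one instance shared by every job.
    static final MiniConf SHARED = new MiniConf();

    // Simulates a client configuring one job; returns the conf it wrote to.
    static MiniConf configureJob(MiniConf conf, String jobName, String queue) {
        conf.set("job.name", jobName);   // illustrative key names only
        conf.set("job.queue", queue);
        return conf;
    }

    public static void main(String[] args) {
        // One instance per job: each job keeps its own settings.
        MiniConf a = configureJob(new MiniConf(), "jobA", "etl");
        MiniConf b = configureJob(new MiniConf(), "jobB", "adhoc");
        System.out.println(a.get("job.queue") + " " + b.get("job.queue")); // etl adhoc

        // One shared instance: configuring jobB overwrites jobA's settings.
        MiniConf sa = configureJob(SHARED, "jobA", "etl");
        MiniConf sb = configureJob(SHARED, "jobB", "adhoc");
        System.out.println(sa.get("job.queue") + " " + sb.get("job.queue")); // adhoc adhoc
    }
}
```

With the shared instance, jobA's queue silently becomes "adhoc", which is exactly the kind of cross-job interference a client submitting several jobs would hit.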
For some processes, like the NameNode, DataNode and ResourceManager,
you may be able to get away with a single static configuration, but
from the client's perspective that may be difficult. I am not really
sure about the NodeManager, because it interacts with HDFS on behalf of
the end user, and I am not completely sure how Configuration fits into
that picture.
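The split-package problem described in the quoted OSGi explanation further down can be modeled with a small Java sketch. The Bundle class, the bundle names and the wiring rule are hypothetical simplifications, not the OSGi framework API: a real framework chooses among exporters using versions and other resolution rules, whereas this sketch just takes the first exporter it finds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitPackageDemo {

    // Toy model of an OSGi bundle: a name plus the packages it exports.
    // Not the real OSGi API; for illustration only.
    static final class Bundle {
        final String name;
        final Set<String> exports;

        Bundle(String name, Set<String> exports) {
            this.name = name;
            this.exports = exports;
        }
    }

    // Maps each package exported by more than one bundle (a split package)
    // to the names of the bundles exporting it, in bundle list order.
    static Map<String, List<String>> splitPackages(List<Bundle> bundles) {
        Map<String, List<String>> exporters = new LinkedHashMap<>();
        for (Bundle b : bundles) {
            for (String pkg : b.exports) {
                exporters.computeIfAbsent(pkg, k -> new ArrayList<>()).add(b.name);
            }
        }
        exporters.values().removeIf(names -> names.size() < 2);
        return exporters;
    }

    // Simplified wiring rule: an imported package is wired to exactly one
    // exporter (here the first match), so classes in the other bundle's
    // half of a split package stay invisible to the importer.
    static String wireImport(List<Bundle> bundles, String pkg) {
        for (Bundle b : bundles) {
            if (b.exports.contains(pkg)) {
                return b.name;
            }
        }
        return null; // unresolved import
    }

    public static void main(String[] args) {
        // Hypothetical bundle names; which real jars split the package is not modeled.
        List<Bundle> bundles = List.of(
                new Bundle("bundle.common", Set.of("org.apache.hadoop.fs", "org.apache.hadoop.conf")),
                new Bundle("bundle.hdfs", Set.of("org.apache.hadoop.fs", "org.apache.hadoop.hdfs")));
        System.out.println(splitPackages(bundles));
        System.out.println(wireImport(bundles, "org.apache.hadoop.fs"));
    }
}
```

Here org.apache.hadoop.fs is reported as split, and an importer of that package ends up wired to bundle.common only, so anything bundle.hdfs contributes to the same package cannot be loaded through that import.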
On 7/9/12 10:04 AM, "Guillaume Nodet" <[EMAIL PROTECTED]> wrote:
>Right, that would surely be incompatible. The initial work I did was on
>1.0.3, and those problems can be solved in a simpler (though less [...])
>way in that branch, mainly because there is a single jar which contains
>everything, so that causes fewer problems in OSGi.
>For trunk, is there any valid reason to create multiple configurations?
>Is the idea of a singleton something I could investigate working on?
>I'm not very familiar with Hadoop internals, so I may very well be missing
>some edge cases. If not, I can come up with a patch that would transform
>Configuration into a singleton, leading to more flexibility for OSGi and a
>performance improvement by avoiding re-parsing the XML configuration.
>On Mon, Jul 9, 2012 at 4:37 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
>> I am not super familiar with OSGi. I have used it a little in the past,
>> but that was 5+ years ago. I am in favor of something that will fix the
>> CLASSPATH problems that we currently have and would allow for CLASSPATH
>> isolation between Hadoop itself and the applications that use Hadoop.
>> If OSGi can do this cleanly, then I am +1 for moving to OSGi.
>> However, we are trying to maintain binary compatibility within major
>> version numbers, in preparation for rolling upgrades. Many of the changes
>> you have suggested, like moving classes from one package to another and
>> doing some serious rework to Configuration, will break not only binary
>> compatibility but also API compatibility.
>> If we do go this route, just be aware that it is most likely something
>> that would force a major version bump, which right now means trunk
>> (the 3.0 line).
>> --Bobby Evans
>> On 7/9/12 8:24 AM, "Guillaume Nodet" <[EMAIL PROTECTED]> wrote:
>> >I'm working with Jean-Baptiste to make Hadoop work in OSGi.
>> >OSGi works with classloaders in a very specific way, which leads to
>> >problems with Hadoop.
>> >Let me quickly explain how OSGi works. In OSGi, you deploy bundles, which
>> >are jars with additional OSGi metadata. This metadata is used by the
>> >framework to create a classloader for the bundle. However, the
>> >classloaders are not organized in a tree like in a JEE environment, but
>> >rather in some kind of graph, where each classloader has limited visibility
>> >and limited exposure. This is controlled at the package level by
>> >specifying which packages are exported and which packages are imported by
>> >a given bundle. This has two main consequences:
>> > * OSGi does not support split packages well, where the same package is
>> >exported by two different bundles
>> > * a classloader does not have visibility on everything as in a usual
>> >classloader environment or even a JEE-like environment
>> >The first problem arises, for example, with the org.apache.hadoop.fs