Hadoop >> mail # dev >> symlink support in Hadoop 2 GA

Re: symlink support in Hadoop 2 GA
Colin posted a summary of our phone call yesterday (attendees: myself,
Colin, Daryn, Nathan, Jason, Chris, Suresh, Sanjay) on HADOOP-9984:


Pasted here:
   - We discussed alternatives to
   but concluded that they weren't workable.
   - We agreed that doing the symlink resolution in each FileSystem
   subclass is what we ought to do in HADOOP-9984, in order to keep
   compatibility with out-of-tree filesystems.
   - We agreed to disable symlink resolution in Hadoop 2 GA. We will spend
   a few weeks ironing out all the bugs and enable it in Hadoop 2.3. However,
   we would like to make all backwards-incompatible API changes prior to
   Hadoop 2 GA.
   - We agreed that HADOOP-9972 <https://issues.apache.org/jira/browse/HADOOP-9972>
   (new symlink-aware API for globStatus) should get into Hadoop 2 GA.
   - We discussed the issue of returning resolved paths versus unresolved
   paths, but were unable to come to any conclusion. Everyone agreed that
   there would be serious performance problems if we returned unresolved
   paths, but some claimed that programs would break when encountering
   resolved paths.
There's also a new umbrella issue at HADOOP-10019 tracking ongoing
symlink changes.

On Thu, Oct 3, 2013 at 2:08 PM, Daryn Sharp wrote:

> I reluctantly agree that we should disable symlinks in 2.2 until we can
> sort out the compatibility issues.  I'm reluctant in the sense that it's a
> feature users have long wanted, and it's something we'd like to use from an
> administrative view.  However, I don't see all the issues being sorted out
> in the very near future.
> I filed some JIRAs today that have led me to believe that the current
> implementation of fs symlinks is irreparably flawed.  Adding optional
> primitives to filesystems to make them symlink capable is ok.  However,
> adding symlink resolution to individual filesystems is fundamentally
> broken.  It doesn't work for stacked filesystems (viewfs, chroots, filters,
> etc) because the resolution must occur at the highest level, not within an
> individual filesystem itself.  Otherwise the abstraction of the top-level
> filesystem is violated and all kinds of unexpected behavior like walking
> out of chroots becomes possible.
> Daryn
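Daryn's objection to resolving symlinks inside each filesystem can be illustrated with a toy model. The sketch below is hypothetical: `BaseFs`, `ChrootFs`, and the `file:`/`link:` string encoding are invented for illustration and are not Hadoop classes. It contrasts delegating resolution to the inner filesystem, where an absolute link target is interpreted against the inner filesystem's root and so walks out of the chroot, with resolving at the top-level (chrooted) filesystem, where each hop is re-mapped inside the chroot.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not Hadoop code) of where symlink resolution happens. */
public class SymlinkResolutionSketch {
    static final int MAX_HOPS = 8; // guard against symlink cycles

    /** Inner filesystem: absolute path -> "file:<data>" or "link:<absolute target>". */
    static class BaseFs {
        final Map<String, String> entries = new HashMap<>();

        /** Raw lookup primitive; does not follow links. */
        String getRaw(String path) {
            return entries.get(path);
        }

        /** Per-filesystem resolution: link targets are interpreted against this fs's root. */
        String readResolvingLocally(String path) {
            for (int hops = 0; hops < MAX_HOPS; hops++) {
                String e = getRaw(path);
                if (e == null) throw new IllegalArgumentException("no such path: " + path);
                if (e.startsWith("file:")) return e.substring("file:".length());
                path = e.substring("link:".length());
            }
            throw new IllegalStateException("too many symlink hops");
        }
    }

    /** Stacked filesystem that exposes only the subtree under `root`. */
    static class ChrootFs {
        final BaseFs base;
        final String root;

        ChrootFs(BaseFs base, String root) {
            this.base = base;
            this.root = root;
        }

        /** Delegates resolution to the inner filesystem (the approach Daryn objects to). */
        String readDelegating(String path) {
            return base.readResolvingLocally(root + path);
        }

        /** Resolves at the top level: every hop is re-mapped under `root`. */
        String readResolvingAtTop(String path) {
            for (int hops = 0; hops < MAX_HOPS; hops++) {
                String e = base.getRaw(root + path);
                if (e == null) throw new IllegalArgumentException("no such path: " + path);
                if (e.startsWith("file:")) return e.substring("file:".length());
                path = e.substring("link:".length());
            }
            throw new IllegalStateException("too many symlink hops");
        }
    }

    public static void main(String[] args) {
        BaseFs base = new BaseFs();
        base.entries.put("/secret", "file:outside data");
        base.entries.put("/jail/secret", "file:inside data");
        base.entries.put("/jail/link", "link:/secret");
        ChrootFs chroot = new ChrootFs(base, "/jail");
        System.out.println(chroot.readDelegating("/link"));     // walks out of the chroot
        System.out.println(chroot.readResolvingAtTop("/link")); // stays inside the chroot
    }
}
```

Running `main` prints `outside data` and then `inside data`: the same link, read through the same chrooted view, lands in a different place depending on which layer does the resolution.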
> On Oct 3, 2013, at 1:39 PM, Sanjay Radia wrote:
> > There are a number of issues (some minor, some more than minor).
> > GA is close and we are still in discussion on some of them; while I
> > believe we will close on these very, very shortly, a code change like
> > this so close to GA is dangerous.
> >
> > I suggest we do the following:
> > 1) Disable symlinks in 2.2 GA: throw an unsupported exception on
> >    createSymlink in both FileSystem and FileContext.
> > 2) Deal with isDir() in 2.2 GA in preparation for item 3, which comes
> >    after GA:
> >    a) Deprecate isDir().
> >    b) Add a new API that returns an enum (see FileContext).
> > 3) Fix symlinks in a future release, hopefully the very next one after
> >    2.2 GA:
> >    a) Change the stack to use the new API replacing isDir().
> >    b) Fix isDir() to do something smarter (we can detail this later, but
> >       there is a solution that has been discussed). This helps customer
> >       applications that call isDir().
> >    c) Remove isDir() in a future release, when customers have had
> >       sufficient time to migrate.
> >
> > sanjay
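Items 1 and 2 of Sanjay's proposal can be sketched as follows. This is a hypothetical illustration, not Hadoop's real `FileStatus`, `FileSystem`, or `FileContext` API: the `FileType` enum, the `getFileType()` method, the `symlinksEnabled` switch, and the simplified String-based `createSymlink` signature are all assumptions standing in for "a new API that returns an enum" and "throw unsupported exception on createSymlink".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of items 1 and 2 of the proposal; not Hadoop's real API. */
public class SymlinkApiSketch {

    /** Hypothetical enum standing in for "a new API that returns an enum". */
    enum FileType { FILE, DIRECTORY, SYMLINK }

    static class FileStatusSketch {
        private final FileType type;

        FileStatusSketch(FileType type) {
            this.type = type;
        }

        /** Item 2b: the new API can report all three cases. */
        FileType getFileType() {
            return type;
        }

        /** Item 2a: a boolean cannot express "symlink", so a link simply reports false here. */
        @Deprecated
        boolean isDir() {
            return type == FileType.DIRECTORY;
        }
    }

    static class FileSystemSketch {
        private final boolean symlinksEnabled; // hypothetical switch, off for 2.2 GA
        private final Map<String, String> links = new HashMap<>(); // link -> target

        FileSystemSketch(boolean symlinksEnabled) {
            this.symlinksEnabled = symlinksEnabled;
        }

        /** Item 1: refuse to create links while the feature is disabled. */
        void createSymlink(String target, String link) {
            if (!symlinksEnabled) {
                throw new UnsupportedOperationException("Symlinks not supported");
            }
            links.put(link, target);
        }

        boolean hasLink(String link) {
            return links.containsKey(link);
        }
    }

    public static void main(String[] args) {
        FileSystemSketch fs = new FileSystemSketch(false);
        try {
            fs.createSymlink("/data/target", "/data/link");
        } catch (UnsupportedOperationException e) {
            System.out.println("createSymlink rejected: " + e.getMessage());
        }

        // Callers migrate from the boolean isDir() to a switch over the enum.
        FileStatusSketch status = new FileStatusSketch(FileType.SYMLINK);
        switch (status.getFileType()) {
            case DIRECTORY: System.out.println("directory"); break;
            case SYMLINK:   System.out.println("symlink"); break;
            default:        System.out.println("file"); break;
        }
    }
}
```

Running `main` prints `createSymlink rejected: Symlinks not supported` and then `symlink`. Keeping the deprecated `isDir()` next to the enum lets existing callers compile unchanged while new code can tell symlinks apart, which is the migration path items 2 and 3 describe.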
> >
> > PS. J. Rottinghuis expressed a similar sentiment in a previous email in
> > this thread:
> >
> >
> >
> > On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:
> >
> >> I like symlink functionality, but in our migration to Hadoop 2.x this is
> >> a total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> >> a) Not uprev until symlink support is figured out up and down the stack,
> >> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x