Hadoop >> mail # dev >> symlink support in Hadoop 2 GA


Re: symlink support in Hadoop 2 GA
I encourage interested parties to read through HADOOP-9912 to get a feel
for the issues. There really is no way to add symlink support without
changing the behavior of existing APIs. Ultimately, anything that returns a
FileStatus is going to be different. Even if we default to resolving
symlinks, resolving can lead to FileNotFound or permission errors. Thus, we
have to choose whether to prune the bad links, show the bad links as
dangling, or throw an exception. None of these options is compatible with
existing behavior.
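
To make the three choices concrete, here is a minimal in-memory sketch. It is not Hadoop code: `Entry`, `Kind`, `Policy`, and `list` are hypothetical names invented for illustration, and only one level of link is followed. It only shows how the same listing produces three different results depending on the policy.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; none of these types are Hadoop classes.
class DanglingLinkPolicies {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    // The three choices: drop the link, show it as dangling, or fail.
    enum Policy { PRUNE, SHOW_DANGLING, THROW }

    static final class Entry {
        final String path;
        final Kind kind;
        final String target; // link target; null unless kind == SYMLINK

        Entry(String path, Kind kind, String target) {
            this.path = path;
            this.kind = kind;
            this.target = target;
        }
    }

    // Lists `raw`, resolving each symlink against `namespace` (path -> entry).
    // Only one level of link is followed, to keep the sketch short.
    static List<Entry> list(List<Entry> raw, Map<String, Entry> namespace, Policy policy)
            throws FileNotFoundException {
        List<Entry> out = new ArrayList<>();
        for (Entry e : raw) {
            if (e.kind != Kind.SYMLINK) {
                out.add(e);
                continue;
            }
            Entry resolved = namespace.get(e.target);
            if (resolved != null) {
                // Report the link under its own path, with the target's type.
                out.add(new Entry(e.path, resolved.kind, null));
            } else if (policy == Policy.SHOW_DANGLING) {
                out.add(e); // caller sees an entry that is neither file nor directory
            } else if (policy == Policy.THROW) {
                throw new FileNotFoundException(e.path + " -> " + e.target);
            }
            // Policy.PRUNE: silently drop the dangling link.
        }
        return out;
    }

    static List<Entry> sampleListing() {
        return Arrays.asList(
                new Entry("/data/a.txt", Kind.FILE, null),
                new Entry("/data/good", Kind.SYMLINK, "/data/a.txt"),
                new Entry("/data/bad", Kind.SYMLINK, "/data/missing"));
    }

    static Map<String, Entry> sampleNamespace() {
        Map<String, Entry> ns = new HashMap<>();
        ns.put("/data/a.txt", new Entry("/data/a.txt", Kind.FILE, null));
        return ns;
    }

    public static void main(String[] args) {
        List<Entry> raw = sampleListing();
        Map<String, Entry> ns = sampleNamespace();
        try {
            System.out.println("PRUNE: " + list(raw, ns, Policy.PRUNE).size() + " entries");
            System.out.println("SHOW_DANGLING: " + list(raw, ns, Policy.SHOW_DANGLING).size() + " entries");
            list(raw, ns, Policy.THROW);
        } catch (FileNotFoundException ex) {
            System.out.println("THROW: " + ex.getMessage());
        }
    }
}
```

In this sketch a caller that used to get one entry per name now gets fewer entries, an entry of an unexpected type, or an exception, which is the incompatibility in question.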

I'm really concerned about putting this in a minor release like 2.3 since
it has the potential to break a lot of user code. HADOOP-9912 is an example
from within our own ecosystem, but think of all the custom user code out
there written against FileSystem. 2.2 GA is basically our last chance to
make this kind of change before Hadoop 3.
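
As an illustration of the kind of user code at risk, the sketch below shows the pattern Colin describes in the quoted message: treating "not a file" as "directory". `Status` is a hypothetical stand-in that only borrows the `isFile`/`isDirectory`/`isSymlink` method names from FileStatus; it is not the Hadoop class, and the two `classify` helpers are invented for this example.

```java
// Hypothetical stand-in for a file status object; not the Hadoop FileStatus class.
class IsFileAssumption {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    static final class Status {
        final Kind kind;

        Status(Kind kind) {
            this.kind = kind;
        }

        boolean isFile() { return kind == Kind.FILE; }
        boolean isDirectory() { return kind == Kind.DIRECTORY; }
        boolean isSymlink() { return kind == Kind.SYMLINK; }
    }

    // Pre-symlink style: anything that is not a file is assumed to be a directory.
    static String classifyLegacy(Status s) {
        return s.isFile() ? "file" : "directory";
    }

    // Symlink-aware style: every kind is checked explicitly.
    static String classifyExplicit(Status s) {
        if (s.isFile()) {
            return "file";
        }
        if (s.isDirectory()) {
            return "directory";
        }
        if (s.isSymlink()) {
            return "symlink";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        Status link = new Status(Kind.SYMLINK);
        System.out.println("legacy:   " + classifyLegacy(link));   // misreports the link
        System.out.println("explicit: " + classifyExplicit(link));
    }
}
```

The legacy helper is not buggy in a world with only files and directories; it becomes wrong only once a third kind of entry can be returned.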

Thanks,
Andrew
On Tue, Sep 17, 2013 at 9:10 AM, Colin McCabe <[EMAIL PROTECTED]> wrote:

> The issue is not modifying existing APIs.  The issue is that code has
> been written that makes assumptions that are incompatible with the
> existence of things that are not files or directories.  For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory.  In the case of a symlink, this assumption is incorrect.
>
> Faced with this, we have considered making listStatus and globStatus
> fully resolve symlinks by default, and simply not list dangling
> symlinks. Code which is prepared to deal with symlinks can use newer
> versions of the listStatus and globStatus functions which do return
> symlinks as symlinks.
>
> We might consider having FileSystem#listStatus and
> FileSystem#globStatus fully resolve symlinks by default, and having
> FileContext#listStatus and FileContext.Util#globStatus do the
> opposite.  This seems like the maximally compatible solution that
> we're going to get.  I think this makes sense.
>
> The alternative is kicking the can down the road to Hadoop 3, and
> letting vendors of alternative (including some proprietary
> alternative) systems continue to claim that "Hadoop doesn't support
> symlinks yet" (with some justice).
>
> P.S.  I would be fine with putting this in 2.2 or 2.3 if that seems
> more appropriate.
>
> sincerely,
> Colin
>
> On Tue, Sep 17, 2013 at 8:23 AM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> > I agree that this is an important change. However, 2.2.0 GA is getting
> > ready to roll out in weeks. I am concerned that these changes will add
> > not only incompatible changes late in the game, but also possibly
> > instability. Java API incompatibility is something we have avoided for
> > the most part, and I am concerned that this is adding such
> > incompatibility in FileSystem APIs. We should find workarounds, possibly
> > by adding newer APIs and leaving existing APIs as is. If this can be
> > done, my vote is to enable this feature in 2.3. Even if it cannot be
> > done, I am concerned that this is coming quite late, and we should see
> > if we could allow some incompatible changes into 2.3 for this feature.
> >
> >
> > On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:
> >
> >> Hi all,
> >>
> >> I wanted to broadcast plans for putting the FileSystem symlinks work
> >> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I
> >> think it's pretty important we get it in since it's not a compatible
> >> change; if it misses the GA train, we're not going to have symlinks
> >> until the next major release.
> >>
> >> However, we're still dealing with ongoing issues revealed via testing.
> >> There's user-code out there that only handles files and directories and
> >> will barf when given a symlink (perhaps a dangling one!). See
> >> HADOOP-9912 for a nice example where globStatus returning symlinks broke
> >> Pig; some of us had a conference call to talk it through, and one
> >> definite conclusion