HDFS >> mail # dev >> Re: symlink support in Hadoop 2 GA


Steve Loughran 2013-09-18, 17:05
Re: symlink support in Hadoop 2 GA
It's an incompatible change. Existing APIs like listStatus and globStatus
need to be symlink aware now, which can break assumptions of user code.
We've had FileStatus#isSymlink() since the early days, but lots of user
code hasn't been updated to use it.
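
To make the concern concrete, here is a minimal sketch of what "symlink aware" means for code that walks a directory listing. It is an analogy written against java.nio.file so that it runs without a cluster; it is not Hadoop code. The class and method names (SymlinkAwareListing, classify) are made up for illustration. In Hadoop the equivalent check would be FileStatus#isSymlink() on each entry returned by listStatus.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class SymlinkAwareListing {

    /** Counts of each entry kind found directly under a directory. */
    static final class Counts {
        int files;
        int dirs;
        int symlinks;
    }

    // Classify each child without following links, so that a link to a
    // directory is reported as a link rather than as a directory.
    // Code that only asks "is this a directory?" and follows links would
    // silently treat the link as a directory and recurse into its target.
    static Counts classify(Path dir) throws IOException {
        Counts c = new Counts();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isSymbolicLink(child)) {
                    c.symlinks++;
                } else if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                    c.dirs++;
                } else {
                    c.files++;
                }
            }
        }
        return c;
    }
}
```

Callers written before links existed typically have only the last two branches, which is the kind of assumption that breaks.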

I think Eli's earlier email did a good job of laying out the current state
and our options. I hadn't realized this before: most of HADOOP-8040 is
already in branch-2.1-beta, but many of the subsequent changes are not
(e.g. HADOOP-9417, HADOOP-9817, HADOOP-9652). This means the current state
of symlink support in branch-2.1-beta is half-baked, which is why "do
nothing" is not a good option.

With that in mind, perhaps Eli's proposals (abbreviated here) make more
sense:

1) Delay 2.2 GA and put in some more effort to fix API issues like
HADOOP-9912 / HADOOP-9972. Undoubtedly, more issues will still fall out of
this post-GA, but we can do our best to fix these issues compatibly in 2.3.
2) Revert symlinks from branch-2.1-beta and leave it all for 2.3, but that
makes 2.3 a pretty big jump from GA. Since symlinks have already appeared
in the 2.1.0 release, it'd also technically make 2.2 a regression from
2.1.0.
3) Wait for 3.0, which I don't think anyone wants.

On Wed, Sep 18, 2013 at 10:05 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:

> the main change is whatever APIs are going to be provided (and implicitly:
> supported for a long time) to handle symlinks separately from directories
>
>
> On 18 September 2013 17:24, Eli Collins <[EMAIL PROTECTED]> wrote:
>
> > On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> >
> > > On 18 September 2013 12:53, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I'm reluctant to do this while delaying the release, because we
> > > > > are going to find problems all the way up the stack, which will
> > > > > require a choreographed set of changes. Given the grief of the
> > > > > protobuf update, I don't want to go near that just before the
> > > > > final release.
> > > > >
> > > >
> > > > Well, I would use the exact same argument used for protobuf (whose
> > > > only complication was getting protoc 2.5.0 onto the Jenkins boxes
> > > > and telling developers to do the same; other than that we didn't
> > > > hit any other issue, AFAIK) ...
> > > >
> > >
> > > protobuf was traumatic at build time, as I recall, because it was
> > > neither forwards nor backwards compatible. Those of us trying to
> > > build different branches had to choose which version to have on the
> > > path, or set up scripts to do the switching. HBase needed rebuilding,
> > > and so did other things. And I still have the pain of downloading and
> > > installing protoc on all Linux VMs I build up going forward, until
> > > apt-get and yum have protoc 2.5 artifacts.
> > >
> > > This means it was very painful for developers and added a lot of
> > > late-breaking pain, but it had one key feature that gave it an edge:
> > > it was immediately obvious where you had a problem, as things didn't
> > > compile or failed to classload with linkage problems. No latent bugs,
> > > unless protobuf 2.5 has them internally, for which we have to rely on
> > > Google's release testing to have found them.
> > >
> > > That is a lot simpler to regression test than adding any new feature
> > > to HDFS and seeing what breaks, as that is something that only
> > > surfaces out in the field. Which is why I think it's too late in the
> > > 2.1 release timetable to add symlinks. We've had a 2.1-beta out
> > > there, we've got feedback. Fix those problems that are show stoppers,
> > > but don't add more stuff. Which is precisely why I have not been
> > > pushing in any of my recent changes. I may seem ruthless arguing
> > > against symlinks, but I'm not being inconsistent with
> >