Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HDFS >> mail # dev >> Re: symlink support in Hadoop 2 GA


+
Steve Loughran 2013-09-18, 17:05
Copy link to this message
-
Re: symlink support in Hadoop 2 GA
It's an incompatible change. Existing APIs like listStatus and globStatus
need to be symlink aware now, which can break assumptions of user code.
We've had FileStatus#isSymlink() since the early days, but lots of user
code hasn't been updated to use it.

I think Eli's earlier email did a good job at laying out the current state
and our options. I didn't realize this before, but most of HADOOP-8040 is
already in branch-2.1-beta, but many of the subsequent changes are not
(e.g. HADOOP-9417, HADOOP-9817, HADOOP-9652). This means the current state
of symlink support in branch-2.1-beta is half-baked, which is why "do
nothing" is not a good option.

With that in mind, perhaps Eli's proposals (abbreviated here) make more
sense:

1) Delay 2.2 GA and put in some more effort to fix API issues like
HADOOP-9912 / HADOOP-9972. Undoubtedly, more issues will still fall out of
this post-GA, but we can do our best to fix these issues compatibly in 2.3.
2) Revert symlinks from branch-2.1-beta and leave it all for 2.3, but that
makes 2.3 a pretty big jump from GA. Since symlinks have already appeared
in the 2.1.0 release, it'd also technically make 2.2 a regression from
2.1.0.
3) Wait for 3.0, which I don't think anyone wants.
On Wed, Sep 18, 2013 at 10:05 AM, Steve Loughran <[EMAIL PROTECTED]>wrote:

> the main change is whatever APIs are going to be provided (and implicitly:
> supported for a long time) to handle symlinks separately from directories
>
>
> On 18 September 2013 17:24, Eli Collins <[EMAIL PROTECTED]> wrote:
>
> > On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran <[EMAIL PROTECTED]
> > >wrote:
> >
> > > On 18 September 2013 12:53, Alejandro Abdelnur <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > > On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran <
> > [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > > I'm reluctant for this as while delaying the release, because we
> are
> > > > going
> > > > > to find problems all the way up the stack -which will require a
> > > > > choreographed set of changes. Given the grief of the protbuf
> update,
> > I
> > > > > don't want to go near that just before the final release.
> > > > >
> > > >
> > > > Well, I would use the exact same argument used for protobuf (which
> only
> > > > complication was getting protoc 2.5.0 in the jenkins boxes and
> > > communicate
> > > > developers to do the same, other than that we didn't hit any other
> > issue
> > > > AFAIK) ...
> > > >
> > >
> > > protobuf was traumatic at build time, as I recall because it was
> neither
> > > forwards or backwards compatible. Those of us trying to build different
> > > branches had to choose which version to have on the path, or set up
> > scripts
> > > to do the switching. HBase needed rebuilding, so did other things. And
> I
> > > still have the pain of downloading and installing protoc on all Linux
> > VMs I
> > > build up going forward, until apt-get and yum have protoc 2.5
> artifacts.
> > >
> > > This means it was very painful for developer, added a lot of late
> > breaking
> > > pain to the developers, but it had one key feature that gave it an
> edge:
> > it
> > > was immediately obvious where you had a problem as things didn't
> compile
> > or
> > > classload without linkage problems. No latent bugs, unless protobuf 2.5
> > has
> > > them internally -for which we have to rely on google's release testing
> to
> > > have found.
> > >
> > > That is a lot simpler to regression test than adding any new feature to
> > > HDFS and seeing what breaks -as that is something that only surfaces
> out
> > in
> > > the field. Which is why I think it's too late in the 2.1 release
> > timetable
> > > to add symlinks. We've had a 2.1-beta out there, we've got feedback.
> Fix
> > > those problems that are show stoppers, but don't add more stuff. Which
> is
> > > precisely why I have not been pushing in any of my recent changes. I
> may
> > > seem ruthless arguing against symlinks -but I'm not being inconsistent
> > with
> >
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB