Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hadoop >> mail # dev >> symlink support in Hadoop 2 GA

Andrew Wang 2013-09-17, 01:49
Suresh Srinivas 2013-09-17, 15:23
Copy link to this message
Re: symlink support in Hadoop 2 GA
The issue is not modifying existing APIs.  The issue is that code has
been written that makes assumptions that are incompatible with the
existence of things that are not files or directories.  For example,
there is a lot of code out there that looks at FileStatus#isFile, and
if it returns false, assumes that what it is looking at is a
directory.  In the case of a symlink, this assumption is incorrect.

Faced with this, we have considered making the default behavior of
listStatus and globStatus to be fully resolving symlinks, and simply
not listing dangling symlinks. Code which is prepared to deal symlinks
can use newer versions of the listStatus and globStatus functions
which do return symlinks as symlinks.

We might consider defaulting FileSystem#listStatus and
FileSystem#globStatus to "fully resolving symlinks by default" and
defaulting FileContext#listStatus and FileContext#Util#globStatus to
the opposite.  This seems like the maximally compatible solution that
we're going to get.  I think this makes sense.

The alternative is kicking the can down the road to Hadoop 3, and
letting vendors of alternative (including some proprietary
alternative) systems continue to claim that "Hadoop doesn't support
symlinks yet" (with some justice).

P.S.  I would be fine with putting this in 2.2 or 2.3 if that seems
more appropriate.


On Tue, Sep 17, 2013 at 8:23 AM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> I agree that this is an important change. However, 2.2.0 GA is getting
> ready to rollout in weeks. I am concerned that these changes will add not
> only incompatible changes late in the game, but also possibly instability.
> Java API incompatibility is some thing we have avoided for the most part
> and I am concerned that this is adding such incompatibility in FileSystem
> APIs. We should find work arounds by adding possibly newer APIs and leaving
> existing APIs as is. If this can be done, my vote is to enable this feature
> in 2.3. Even if it cannot be done, I am concerned that this is coming quite
> late and we should see if could allow some incompatible changes into 2.3
> for this feature.
> On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang <[EMAIL PROTECTED]>wrote:
>> Hi all,
>> I wanted to broadcast plans for putting the FileSystem symlinks work
>> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
>> it's pretty important we get it in since it's not a compatible change; if
>> it misses the GA train, we're not going to have symlinks until the next
>> major release.
>> However, we're still dealing with ongoing issues revealed via testing.
>> There's user-code out there that only handles files and directories and
>> will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
>> for a nice example where globStatus returning symlinks broke Pig; some of
>> us had a conference call to talk it through, and one definite conclusion
>> was that this wasn't solvable in a generally compatible manner.
>> There are also still some gaps in symlink support right now. For example,
>> the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
>> resolution, and tooling like the FsShell and Distcp still need to be
>> updated as well.
>> So, there's definitely work to be done, but there are a lot of users
>> interested in the feature, and symlinks really should be in GA. Would
>> appreciate any thoughts/input on the matter.
>> Thanks,
>> Andrew
> --
> http://hortonworks.com/download/
> --
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
Andrew Wang 2013-09-17, 20:27
Colin McCabe 2013-09-17, 15:14
Eli Collins 2013-09-17, 22:05
Steve Loughran 2013-09-18, 09:29
Alejandro Abdelnur 2013-09-18, 11:53
Steve Loughran 2013-09-18, 12:45
Eli Collins 2013-09-18, 16:24
Arun C Murthy 2013-09-18, 16:48
J. Rottinghuis 2013-09-19, 00:11
Colin McCabe 2013-09-19, 17:40
Alejandro Abdelnur 2013-09-19, 11:12
sanjay Radia 2013-10-03, 18:39
Daryn Sharp 2013-10-03, 21:08
Andrew Wang 2013-10-04, 17:58
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB