Hadoop >> mail # dev >> symlink support in Hadoop 2 GA


Re: symlink support in Hadoop 2 GA
The issue is not modifying existing APIs.  The issue is that code has
been written that makes assumptions that are incompatible with the
existence of things that are not files or directories.  For example,
there is a lot of code out there that looks at FileStatus#isFile, and
if it returns false, assumes that what it is looking at is a
directory.  In the case of a symlink, this assumption is incorrect.
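A minimal, self-contained Java sketch of that failure mode. It does not depend on Hadoop: `Kind` and `Entry` are hypothetical stand-ins for `FileStatus`, which in Hadoop 2 offers `isFile()`, `isDirectory()` and `isSymlink()`. `naiveDescribe` mirrors the "not a file, so it must be a directory" assumption, and `safeDescribe` checks each case explicitly.

```java
import java.util.List;

public class SymlinkAssumption {
    // Hypothetical stand-in for the three kinds of entry a listing can return.
    enum Kind { FILE, DIRECTORY, SYMLINK }

    // Hypothetical, simplified analogue of FileStatus (not the Hadoop class).
    record Entry(String path, Kind kind) {
        boolean isFile() { return kind == Kind.FILE; }
        boolean isDirectory() { return kind == Kind.DIRECTORY; }
        boolean isSymlink() { return kind == Kind.SYMLINK; }
    }

    // The pattern described above: anything that is not a file is treated as a directory.
    static String naiveDescribe(Entry e) {
        return e.isFile() ? "file" : "directory"; // wrong for symlinks
    }

    // Checks every case explicitly, so a symlink is never mistaken for a directory.
    static String safeDescribe(Entry e) {
        if (e.isFile()) return "file";
        if (e.isDirectory()) return "directory";
        if (e.isSymlink()) return "symlink";
        throw new IllegalStateException("unknown entry type: " + e.path());
    }

    public static void main(String[] args) {
        List<Entry> listing = List.of(
            new Entry("/data/part-0000", Kind.FILE),
            new Entry("/data/logs", Kind.DIRECTORY),
            new Entry("/data/latest", Kind.SYMLINK));
        for (Entry e : listing) {
            System.out.println(e.path() + ": naive=" + naiveDescribe(e) + ", safe=" + safeDescribe(e));
        }
    }
}
```

Running `main` shows the naive check reporting `/data/latest` as a directory, while the explicit check reports it as a symlink.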

Faced with this, we have considered making listStatus and globStatus
fully resolve symlinks by default, and simply not list dangling
symlinks. Code which is prepared to deal with symlinks can use newer
versions of the listStatus and globStatus functions which do return
symlinks as symlinks.
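The "fully resolving" behaviour can be sketched over a toy in-memory namespace. Everything here is hypothetical (`Node`, `Listed`, `listResolved`, `listRaw`, the hop limit); it is not Hadoop's resolver. Two assumptions are worth flagging: a resolved entry keeps the link's own path but reports the target's kind, and a link chain longer than `MAX_HOPS` (for example a cycle) is treated like a dangling link and skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ResolvingListing {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    // A namespace entry; linkTarget is non-null only for symlinks.
    record Node(Kind kind, String linkTarget) {
        static Node file() { return new Node(Kind.FILE, null); }
        static Node dir() { return new Node(Kind.DIRECTORY, null); }
        static Node link(String target) { return new Node(Kind.SYMLINK, target); }
    }

    // What a listing hands back: the path as listed plus the kind reported to the caller.
    record Listed(String path, Kind kind) {}

    // Hypothetical cap on link hops so that a cycle cannot loop forever.
    static final int MAX_HOPS = 32;

    // Follows symlinks until a file or directory is reached.
    // Returns empty if the chain dangles (target missing) or exceeds MAX_HOPS.
    static Optional<Node> resolve(Map<String, Node> ns, String path) {
        Node n = ns.get(path);
        for (int hops = 0; n != null && n.kind() == Kind.SYMLINK; hops++) {
            if (hops >= MAX_HOPS) return Optional.empty();
            n = ns.get(n.linkTarget());
        }
        return Optional.ofNullable(n);
    }

    // Default-style listing: each symlink is reported as what it points to,
    // and dangling symlinks are silently left out.
    static List<Listed> listResolved(Map<String, Node> ns, List<String> children) {
        List<Listed> out = new ArrayList<>();
        for (String child : children) {
            resolve(ns, child).ifPresent(target -> out.add(new Listed(child, target.kind())));
        }
        return out;
    }

    // Link-aware listing: symlinks are returned as symlinks, dangling or not.
    static List<Listed> listRaw(Map<String, Node> ns, List<String> children) {
        List<Listed> out = new ArrayList<>();
        for (String child : children) {
            Node n = ns.get(child);
            if (n != null) out.add(new Listed(child, n.kind()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Node> ns = new LinkedHashMap<>();
        ns.put("/data/a", Node.file());
        ns.put("/data/logs", Node.dir());
        ns.put("/data/latest", Node.link("/data/logs"));
        ns.put("/data/broken", Node.link("/data/missing"));
        List<String> children = List.of("/data/a", "/data/logs", "/data/latest", "/data/broken");

        System.out.println(listResolved(ns, children));
        System.out.println(listRaw(ns, children));
    }
}
```

`listResolved` is what link-unaware code would see by default; `listRaw` corresponds to the newer, link-aware variants.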

We might consider making FileSystem#listStatus and
FileSystem#globStatus fully resolve symlinks by default, while
FileContext#listStatus and FileContext#Util#globStatus default to
the opposite.  This seems like the most compatible solution that
we're going to get.  I think this makes sense.
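The split defaults amount to one listing core behind two front ends that differ only in the default they pass. In this sketch `OldStyleApi` and `NewStyleApi` are hypothetical stand-ins for `FileSystem` and `FileContext`, and the boolean `resolveLinks` overload is an illustrative assumption, not an existing Hadoop signature.

```java
import java.util.List;

public class SplitDefaults {
    // Shared listing core; resolveLinks selects between the two behaviours.
    interface Lister {
        List<String> list(String dir, boolean resolveLinks);
    }

    // Hypothetical stand-in for FileSystem: resolves symlinks unless asked not to,
    // so existing callers never see a symlink.
    static final class OldStyleApi {
        private final Lister core;
        OldStyleApi(Lister core) { this.core = core; }
        List<String> listStatus(String dir) { return core.list(dir, true); }
        List<String> listStatus(String dir, boolean resolveLinks) { return core.list(dir, resolveLinks); }
    }

    // Hypothetical stand-in for FileContext: returns symlinks as symlinks by default.
    static final class NewStyleApi {
        private final Lister core;
        NewStyleApi(Lister core) { this.core = core; }
        List<String> listStatus(String dir) { return core.list(dir, false); }
        List<String> listStatus(String dir, boolean resolveLinks) { return core.list(dir, resolveLinks); }
    }

    public static void main(String[] args) {
        // Fake core that only reports which mode it was called in.
        Lister fake = (dir, resolve) -> List.of(dir + (resolve ? " (links resolved)" : " (links as links)"));
        System.out.println(new OldStyleApi(fake).listStatus("/data"));
        System.out.println(new NewStyleApi(fake).listStatus("/data"));
        System.out.println(new OldStyleApi(fake).listStatus("/data", false));
    }
}
```

The design choice is that existing `FileSystem` callers keep seeing only files and directories unless they opt in, while `FileContext` callers get the link-aware behaviour by default.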

The alternative is kicking the can down the road to Hadoop 3, and
letting vendors of alternative systems (including some proprietary
ones) continue to claim, with some justice, that "Hadoop doesn't
support symlinks yet".

P.S.  I would be fine with putting this in 2.2 or 2.3 if that seems
more appropriate.

sincerely,
Colin

On Tue, Sep 17, 2013 at 8:23 AM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> I agree that this is an important change. However, 2.2.0 GA is getting
> ready to roll out in weeks. I am concerned that these changes will add not
> only incompatible changes late in the game, but also possibly instability.
> Java API incompatibility is something we have avoided for the most part,
> and I am concerned that this adds such an incompatibility to the FileSystem
> APIs. We should find workarounds, possibly by adding newer APIs and leaving
> existing APIs as is. If this can be done, my vote is to enable this feature
> in 2.3. Even if it cannot be done, I am concerned that this is coming quite
> late, and we should see if we could allow some incompatible changes into 2.3
> for this feature.
>
>
> On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I wanted to broadcast plans for putting the FileSystem symlinks work
>> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
>> it's pretty important we get it in since it's not a compatible change; if
>> it misses the GA train, we're not going to have symlinks until the next
>> major release.
>>
>> However, we're still dealing with ongoing issues revealed via testing.
>> There's user code out there that only handles files and directories and
>> will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
>> for a nice example where globStatus returning symlinks broke Pig; some of
>> us had a conference call to talk it through, and one definite conclusion
>> was that this wasn't solvable in a generally compatible manner.
>>
>> There are also still some gaps in symlink support right now. For example,
>> the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
>> resolution, and tooling like the FsShell and Distcp still need to be
>> updated as well.
>>
>> So, there's definitely work to be done, but there are a lot of users
>> interested in the feature, and symlinks really should be in GA. Would
>> appreciate any thoughts/input on the matter.
>>
>> Thanks,
>> Andrew
>>
>
>
>
> --
> http://hortonworks.com/download/
>