Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop, mail # dev - symlink support in Hadoop 2 GA


Copy link to this message
-
Re: symlink support in Hadoop 2 GA
Colin McCabe 2013-09-19, 17:40
What we're trying to get to here is a consensus on whether
FileSystem#listStatus and FileSystem#globStatus should return symlinks
__as_symlinks__.  If 2.1-beta goes out with these semantics, I think
we are not going to be able to change them later.  That is what will
happen in the "do nothing" scenario.

Also see Jason Lowe's comment here:
https://issues.apache.org/jira/browse/HADOOP-9912?focusedCommentId=13772002&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13772002

Colin
On Wed, Sep 18, 2013 at 5:11 PM, J. Rottinghuis <[EMAIL PROTECTED]> wrote:
> However painful protobuf version changes are at build time for Hadoop
> developers, at runtime with multiple clusters and many Hadoop users this is
> a total nightmare.
> Even upgrading clusters from one protobuf version to the next is going to
> be very difficult. The same users will run jobs on, and/or read&write to
> multiple clusters. That means that they will have to fork their code, run
> multiple instances? Or in the very least they have to do an update to their
> applications. All in sync with Hadoop cluster changes. And these are not
> doable in a rolling fashion.
> All Hadoop and HBase clusters will all upgrade at the same time, or we'll
> have to have our users fork / roll multiple versions ?
> My point is that these things are much harder than just fix the (Jenkins)
> build and we're done. These changes are massively disruptive.
>
> There is a similar situation with symlinks. Having an API that lets users
> create symlinks is very problematic. Some users create symlinks and as Eli
> pointed out, somebody else (or automated process) tries to copy to / from
> another (Hadoop 1.x?) cluster over hftp. What will happen ?
> Having an API that people should not use is also a nightmare. We
> experienced this with append. For a while it was there, but users were "not
> allowed to use it" (or else there were large #'s of corrupt blocks). If
> there is an API to create a symlink, then some of our users are going to
> use it and others are going to trip over those symlinks. We already know
> that Pig does not work with symlinks yet, and as Steve pointed out, there
> is tons of other code out there that assumes that !isDir() means isFile().
>
> I like symlink functionality, but in our migration to Hadoop 2.x this is a
> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> a) Not uprev until symlink support is figured out up and down the stack,
> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> (equivalent). Or
> b) rip out the API altogether. Or
> c) change the implementation to throw an UnsupportedOperationException
> I'm not sure yet which of these I like least.
>
> Thanks,
>
> Joep
>
>
>
>
> On Wed, Sep 18, 2013 at 9:48 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
>>
>> On Sep 16, 2013, at 6:49 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:
>>
>> > Hi all,
>> >
>> > I wanted to broadcast plans for putting the FileSystem symlinks work
>> > (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I
>> think
>> > it's pretty important we get it in since it's not a compatible change; if
>> > it misses the GA train, we're not going to have symlinks until the next
>> > major release.
>>
>> Just catching up, is this an incompatible change, or not? The above reads
>> 'not an incompatible change'.
>>
>> Arun
>>
>> >
>> > However, we're still dealing with ongoing issues revealed via testing.
>> > There's user-code out there that only handles files and directories and
>> > will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
>> > for a nice example where globStatus returning symlinks broke Pig; some of
>> > us had a conference call to talk it through, and one definite conclusion
>> > was that this wasn't solvable in a generally compatible manner.
>> >
>> > There are also still some gaps in symlink support right now. For example,
>> > the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink