HDFS, mail # dev - relative symbolic links in HDFS


Charles Baker 2011-10-28, 16:46
Charles Baker 2011-10-28, 16:56
Eli Collins 2011-10-30, 02:02
Daryn Sharp 2011-10-31, 14:46
Charles Baker 2011-10-31, 16:27
Daryn Sharp 2011-10-31, 16:46
Eli Collins 2011-10-31, 18:45
Daryn Sharp 2011-10-31, 20:54

RE: relative symbolic links in HDFS
Charles Baker 2011-10-31, 21:19
I did a hasty test initially of getLinkTarget() but forgot to also use the
same for the input path to FileContext#createSymlink() so yeah, turns out it
does indeed work. Sorry about that. Looks like I won't need to modify
FileContext after all which is good :)

The rationale of keeping things consistent so as not to break compatibility
makes sense, it just isn't that intuitive coming at it from a 'fresh'
perspective. Was the original idea to return the symlink information in
getFileStatus() instead of having to access it via getFileLinkStatus()?
Maybe it's naive but it seems like you could just rename getFileLinkStatus()
to getFileStatus() and none would be the wiser...

Regardless, I do think it makes sense to have a convenience method to get the
raw path that was supplied at symlink creation. The first thing I tried was
Path#toString() so that I guess is pretty intuitive but I can't comment on
whether that would break compatibility.

Thanks!

-Chuck
-----Original Message-----
From: Eli Collins [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 31, 2011 11:45 AM
To: [EMAIL PROTECTED]
Subject: Re: relative symbolic links in HDFS

On Mon, Oct 31, 2011 at 9:27 AM, Charles Baker <[EMAIL PROTECTED]> wrote:
> Hey guys. Thanks for the replies. Fully qualified symbolic links are
> problematic in that when we wish to restore a directory structure containing
> symlinks from HDFS to local filesystem, the relativity is lost. For instance:
>
> /user/cbaker/foo/
>                link1 -> ../../cbaker
>
> The current behavior of getFileLinkStatus() results in a path for link1
> being:
>
> /user/cbaker
>
> Not:
>
> ../../cbaker
>
>
> Also, some symlinks may point to non-existent locations within HDFS which
> only have relevance to the local filesystem. This appears as though it could
> (though I haven't tested yet) result in an exception when the attempt is made
> to qualify it. If I get a chance, I'll try it out later today.
>
> FileContext.getLinkTarget() doesn't work for this case since it returns only
> the final component of the target, not the complete relative path.

Really?  FC#getLinkTarget should return the target verbatim, as
specified by the user when creating the link:

Eg see test testCreateLinkToDotDotPrefix:
 fc.createSymlink(new Path("../file"), link, false);
 ...
 assertEquals(new Path("../file"), fc.getLinkTarget(link));
> But even
> if it did return the relative path, it seems counter-intuitive to me. I agree
> with Daryn and expect the behavior of getFileLinkStatus() to return the
> symlink as is and not presume that I wanted it qualified. If I wanted a
> qualified path for a symlink, I would expect to call Path.makeQualified() to
> do so.

It does this because getFileStatus always returns fully qualified
paths in HDFS, and we don't want to make callers check the type and
care about the method that was used to obtain the FileStatus, eg to
know whether it contains a fully qualified path or not.

I think the original rationale for why FileStatus objects always
have fully qualified paths is so they can be passed around w/o callers
having to do further work to access them, ie we didn't want to disassociate
the path from the file system it exists on. Note that in Hadoop
"Paths" are actually URIs, vs file system paths (a subset of URIs).
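
To make the effect of qualification concrete, here is a minimal sketch that uses plain java.net.URI as a stand-in for a Hadoop Path. It is not Hadoop code: the helper name qualify and the namenode address are made up for illustration. It only shows how resolving the relative target from the example earlier in the thread against the directory holding the link produces an absolute URI, at which point the relative form is no longer recoverable from the result.

```java
import java.net.URI;

public class QualifySketch {
    // Hypothetical helper (not a Hadoop API): resolve a link target against
    // the URI of the directory that contains the link.
    static URI qualify(URI linkParentDir, String target) {
        return linkParentDir.resolve(target);
    }

    public static void main(String[] args) {
        // Assumed namenode address; the trailing slash marks a directory.
        URI parent = URI.create("hdfs://namenode.example:8020/user/cbaker/foo/");

        // The target as the user typed it when creating link1.
        String target = "../../cbaker";

        // After resolution the ".." segments are gone.
        System.out.println(qualify(parent, target));
    }
}
```

The resolved URI carries the scheme and authority, which is what lets it be handed around without reference to the file system it came from, but it no longer says whether the user originally wrote a relative or an absolute target.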

Regardless of the rationale, changing getFileStatus to return objects
w/o fully qualified paths would break compatibility with a lot of
existing programs. It would also hinder people porting to FileContext
which tries to be consistent with FileSystem.

Would a new method on FileStatus or Path that returns the unqualified
version of the path (ie w/o the scheme and authority, and w/o
resolving relative paths relative to the FileContext) work?  Ie the
FileStatus could return the contents of the HdfsFileStatus w/o making
it fully qualified.
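
As a rough illustration of what such an accessor might return, the sketch below again uses java.net.URI in place of a Hadoop Path. The helper name unqualified is hypothetical and is not an existing FileStatus or Path method; the namenode address is also assumed. It drops the scheme and authority and leaves a relative target untouched.

```java
import java.net.URI;

public class UnqualifiedPathSketch {
    // Hypothetical helper (not a Hadoop API): keep only the path component,
    // dropping scheme and authority, without resolving anything.
    static String unqualified(URI uri) {
        return uri.getPath();
    }

    public static void main(String[] args) {
        // A fully qualified URI, as a FileStatus would carry it.
        URI qualified = URI.create("hdfs://namenode.example:8020/user/cbaker");
        System.out.println(unqualified(qualified));

        // A relative target is returned as written, since nothing is resolved.
        URI relative = URI.create("../../cbaker");
        System.out.println(unqualified(relative));
    }
}
```

Note that stripping the scheme and authority from an already-qualified path does not bring back the relative form; for that the stored target has to be returned before qualification, which is what the HdfsFileStatus suggestion above amounts to.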

Thanks,
Eli

Eli Collins 2011-10-31, 22:41
Daryn Sharp 2011-11-01, 15:19
Eli Collins 2011-11-01, 16:30