HDFS dev mailing list: relative symbolic links in HDFS


Charles Baker 2011-10-28, 16:46
Charles Baker 2011-10-28, 16:56
Eli Collins 2011-10-30, 02:02
Daryn Sharp 2011-10-31, 14:46

RE: relative symbolic links in HDFS
Hey guys. Thanks for the replies. Fully qualified symbolic links are
problematic because when we restore a directory structure containing
symlinks from HDFS to the local filesystem, the relativity is lost. For
instance:

/user/cbaker/foo/
                link1 -> ../../cbaker

The current behavior of getFileLinkStatus() results in a path for link1
being:

/user/cbaker

Not:

../../cbaker
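
To make this concrete, here's a rough, untested sketch (the host and port
in the comments are made up):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileContext;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.Path;

  public class LinkStatusDemo {
    public static void main(String[] args) throws Exception {
      FileContext fc = FileContext.getFileContext(new Configuration());

      // Create link1 with a relative target, as in the layout above.
      fc.createSymlink(new Path("../../cbaker"),
                       new Path("/user/cbaker/foo/link1"), false);

      // getFileLinkStatus() does not follow the link, but the FileStatus
      // it returns carries a qualified target such as
      // hdfs://myhost:123/user/cbaker rather than the stored ../../cbaker.
      FileStatus stat =
          fc.getFileLinkStatus(new Path("/user/cbaker/foo/link1"));
      System.out.println(stat.getSymlink());
    }
  }
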
Also, some symlinks may point to locations that don't exist within HDFS and
are only relevant to the local filesystem. It appears as though qualifying
such a link could result in an exception (though I haven't tested that yet).
If I get a chance, I'll try it out later today.

FileContext.getLinkTarget() doesn't work for this case either, since it
returns only the final component of the target, not the complete relative
path. But even if it did return the relative path, the qualifying behavior
still seems counter-intuitive to me. I agree with Daryn: I expect
getFileLinkStatus() to return the symlink as-is, not to presume that I
wanted it qualified. If I wanted a qualified path for a symlink, I would
expect to call Path.makeQualified() myself.
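
Something like this, say (untested; fc is a FileContext as in the sketch
above, and the paths are just for illustration):

  // Qualify the raw target only when the caller asks for it.
  Path rawTarget = new Path("../../cbaker");   // as stored in the link
  Path qualified = rawTarget.makeQualified(
      fc.getDefaultFileSystem().getUri(),      // e.g. hdfs://myhost:123
      new Path("/user/cbaker/foo"));           // directory holding the link
  // rawTarget is still ../../cbaker; qualified resolves to something
  // like hdfs://myhost:123/user/cbaker.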

As for porting FsShell to FileContext, I've only modified it to support our
use case; I haven't gone to the extent of fully porting it. Though I'd love
to, unfortunately I'm too busy right now to contribute :(

Thanks!

-Chuck

-----Original Message-----
From: Daryn Sharp [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 31, 2011 7:46 AM
To: [EMAIL PROTECTED]
Subject: Re: relative symbolic links in HDFS

It's generally been a problem that filesystem operations mangle paths to be
something other than what the user provided.  FsShell has to go to some
(unnecessary, imho) lengths to independently track the user's given path so
the output paths will match what the user provided.  Not displaying the
user-given path makes it difficult/impossible for scripts to accurately parse
the output for the results of an operation on the given paths.

I like getLinkTarget returning the exact target, but I'd also like a
FileStatus to return the given path, both for a normal path and for a
symlink.  If the user needs a fully qualified path for an operation, my
opinion is that they should request it explicitly.
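
For example (illustrating the behavior I'd want, not what happens today;
the path is made up):

  FileStatus stat = fc.getFileLinkStatus(new Path("link1"));
  System.out.println(stat.getPath());  // would ideally print link1, as given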

Daryn
On Oct 29, 2011, at 9:02 PM, Eli Collins wrote:

> Hey Chuck,
>
> Why is it problematic for your use that the symlink is stored in
> FileStatus fully qualified - you'd like FileContext#getSymlink to
> return the same Path that you used as the target in createSymlink?
>
> The current behavior is so that getFileLinkStatus is consistent with
> getFileStatus(new Path("/some/file")), which returns a fully qualified
> path (e.g. hdfs://myhost:123/some/file). Note that you can use
> FileContext#getLinkTarget to return the path used when creating the
> link. Some more background is in the design doc:
> https://issues.apache.org/jira/secure/attachment/12434745/design-doc-v4.txt
>
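> For instance, something like this (an untested sketch; fc is a
> FileContext and the path is made up):
>
>   Path link = new Path("/user/cbaker/foo/link1");
>   // Fully qualified target, consistent with getFileStatus():
>   System.out.println(fc.getFileLinkStatus(link).getSymlink());
>   // Target path as given when the link was created:
>   System.out.println(fc.getLinkTarget(link));
>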
> There's a jira for porting FsShell to FileContext (HADOOP-6424), if
> you have a patch (even partial) feel free to post it to the jira.
> Note that since symlinks are not implemented in FileSystem, clients
> that use FileSystem to access paths with symlinks will fail.
>
> Btw when looking at the code you pointed out I noticed a bug in link
> resolution (HADOOP-7783), thanks!
>
> Thanks,
> Eli
>
>
> On Fri, Oct 28, 2011 at 9:46 AM, Charles Baker <[EMAIL PROTECTED]> wrote:
>> Hey guys. We are in the early stages of planning and evaluating a Hadoop
>> 'cold-storage' cluster for medium to long term storage of mixed data (small
>> to large files, zips, tar, etc...) and tons of symlinks. We do realize that
>> small files aren't ideal in HDFS, but it's for long-term storage and beats
>> the cost of more NetApps by potentially several hundred thousand dollars by
>> leveraging existing equipment. We are already successfully using Hadoop and
>> the MapReduce framework in a different project and have developed quite a
>> bit of in-house expertise when it comes to Hadoop.

Daryn Sharp 2011-10-31, 16:46
Eli Collins 2011-10-31, 18:45
Daryn Sharp 2011-10-31, 20:54
Charles Baker 2011-10-31, 21:19
Eli Collins 2011-10-31, 22:41
Daryn Sharp 2011-11-01, 15:19
Eli Collins 2011-11-01, 16:30