Re: What is the difference between URI, Home Directory, Working Directory in FileSystem.java or HDFS
Ling Kun 2013-04-19, 09:38
Dear Daryn Sharp,
Your reply has helped me a lot in reading the HDFS and FileSystem code.
On Thu, Apr 11, 2013 at 10:53 PM, Daryn Sharp <[EMAIL PROTECTED]> wrote:
> On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:
> > Dear all,
> > I am a little confused about the URI, Home Directory and Working
> > Directory in FileSystem.java or HDFS.
> > I have listed my understanding of these concepts below. Can someone please
> > tell me whether I am correct? Thanks.
> > The home directory: This is usually a directory for a specific Hadoop
> > user, so its path is user-specific. In HDFS, it is like
> > The URI: It is the root of the distributed filesystem. For HDFS, it
> > is just hdfs://NameNode:port/, and each file or directory in the distributed
> > filesystem is a file or subdirectory under this path.
> Generally correct. However, I'd strongly suggest avoiding the use of URIs
> directly. It's better to obtain your filesystems via
> path.getFileSystem(conf), which extracts the URI for the filesystem
> automatically. See below for the correct definition of a Path.
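As a rough illustration of what "extracting the URI for the filesystem" means, the stdlib-only Java sketch below keeps just the scheme and authority of a fully qualified path, which is the part that identifies the filesystem. It is not Hadoop's code: the class `FsRootSketch`, the method `filesystemRoot`, and the host `namenode:8020` are hypothetical. The real `path.getFileSystem(conf)` additionally falls back to the defaultFS for paths that carry no scheme, which this sketch simply rejects.

```java
import java.net.URI;

// Hypothetical sketch, not Hadoop's implementation: the scheme and authority
// of a qualified path identify which filesystem the path belongs to.
class FsRootSketch {
    // Returns "scheme://authority/" for a fully qualified path string.
    static String filesystemRoot(String qualifiedPath) {
        URI uri = URI.create(qualifiedPath);
        if (uri.getScheme() == null || uri.getRawAuthority() == null) {
            // The real API would use the defaultFS here; the sketch refuses.
            throw new IllegalArgumentException("not fully qualified: " + qualifiedPath);
        }
        return uri.getScheme() + "://" + uri.getRawAuthority() + "/";
    }

    public static void main(String[] args) {
        // "namenode:8020" is a placeholder host and port.
        System.out.println(filesystemRoot("hdfs://namenode:8020/user/alice/part-00000"));
        System.out.println(filesystemRoot("hdfs://namenode:8020/tmp/output"));
    }
}
```

Both calls print `hdfs://namenode:8020/`: the two paths differ only in their path portion, so they live on the same filesystem.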
> > The working directory: I am a little confused about this variable. At
> > any given time there exists only one instance of the filesystem class, and
> > the working directory is private state of that instance. While a job runs,
> > Hadoop switches among several directories (for example, the shared system
> > directory, the home directory, or the input/output directories), and the
> > working directory is updated each time it switches.
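One way to picture the working directory is as mutable state held by a filesystem object that only matters for relative paths. The toy Java class below models that with `java.net.URI.resolve`. `ToyFs`, its method names (chosen to echo `FileSystem.setWorkingDirectory`/`getWorkingDirectory`), the host, the `/user/alice` path, and the choice to start in the home directory are all assumptions for illustration, not Hadoop's implementation.

```java
import java.net.URI;

// Toy model, not Hadoop's FileSystem: the working directory is mutable
// per-instance state, and it only affects how relative paths are resolved.
class ToyFs {
    private URI workingDir;

    ToyFs(String homeDir) {
        // Assumption of this sketch: a new instance starts in the home directory.
        this.workingDir = asDirectory(homeDir);
    }

    void setWorkingDirectory(String dir) {
        // A relative argument is resolved against the current working directory.
        this.workingDir = asDirectory(resolve(dir));
    }

    String getWorkingDirectory() {
        return workingDir.toString();
    }

    // Relative paths are appended to the working directory; a path starting
    // with "/" keeps its own path but inherits the scheme and authority.
    String resolve(String path) {
        return workingDir.resolve(path).toString();
    }

    // URI.resolve drops the last segment of a base without a trailing slash,
    // so directories are stored with one.
    private static URI asDirectory(String dir) {
        return URI.create(dir.endsWith("/") ? dir : dir + "/");
    }

    public static void main(String[] args) {
        ToyFs fs = new ToyFs("hdfs://namenode:8020/user/alice");
        System.out.println(fs.resolve("input/part-0"));
        fs.setWorkingDirectory("/tmp/job1");
        System.out.println(fs.resolve("input/part-0"));
        System.out.println(fs.resolve("/user/alice/input/part-0"));
    }
}
```

In `main`, the relative path `input/part-0` lands under the home directory first and under `/tmp/job1` after the working directory changes, while the absolute path is unaffected by the working directory.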
> > Although I have looked through the related documentation, I am still a
> > little confused about the java.net.URI, java.io.File and
> > org.apache.hadoop.fs.Path classes. It seems a URI can be
> > hdfs://XXX/XXX/FILENAME, while a Path can only be a path without the
> > scheme, hostname and port. The File class is just an object
> > for a specific file.
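For the two JDK classes mentioned here, the difference can be seen directly: a `java.net.URI` carries a scheme, host, and port, while a `java.io.File` is only a local pathname. The short sketch below uses only standard JDK calls; the class name `UriVsFile` and the example paths are made up for illustration.

```java
import java.io.File;
import java.net.URI;

// Illustrative only: compares what java.net.URI and java.io.File can hold.
class UriVsFile {
    public static void main(String[] args) {
        // A URI can name a resource on a remote filesystem.
        URI uri = URI.create("hdfs://namenode:8020/user/alice/data.txt");
        System.out.println(uri.getScheme()); // hdfs
        System.out.println(uri.getHost());   // namenode
        System.out.println(uri.getPort());   // 8020
        System.out.println(uri.getPath());   // /user/alice/data.txt

        // A File is just a pathname on the local machine: no scheme, host, or port.
        File file = new File("/user/alice/data.txt");
        System.out.println(file.getName());  // data.txt
        // Converting a File to a URI always yields the local "file" scheme.
        System.out.println(file.toURI().getScheme()); // file
    }
}
```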
> Your understanding of Path is incorrect. Path is really just a veneer
> over a URI. A Path can be qualified with a scheme/authority, or just be
> absolute or relative. If a Path is not scheme-qualified, it uses the
> defaultFS. If the Path is not absolute, it's qualified against the working
> directory. Path provides some niceties, like not requiring percent-encoding
> in the path portion of the URI, and allows the use of glob characters and
> the quoting thereof.
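The two qualification rules in this reply (a path without a scheme uses the defaultFS; a path that is not absolute is qualified against the working directory) can be sketched with plain `java.net.URI`. This is a hypothetical stand-in, not `org.apache.hadoop.fs.Path`: the class `QualifySketch`, the method `qualify`, the defaultFS value `hdfs://namenode:8020`, and the working directory `/user/alice` are assumptions, and the real Path also handles percent-encoding and globs.

```java
import java.net.URI;

// Hypothetical sketch of the two rules, not Hadoop's Path code:
// no scheme -> use the default filesystem; not absolute -> resolve
// against the working directory.
class QualifySketch {
    static String qualify(String path, String defaultFs, String workingDir) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p.toString(); // already scheme-qualified: nothing to do
        }
        // Base = defaultFS + working directory. The trailing slash makes
        // URI.resolve treat the working directory as a directory.
        String dir = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        URI base = URI.create(defaultFs).resolve(dir);
        // Absolute paths keep their own path but take the default scheme and
        // authority; relative paths are appended to the working directory.
        return base.resolve(p).toString();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://namenode:8020"; // assumed fs.defaultFS value
        String workingDir = "/user/alice";         // assumed working directory
        System.out.println(qualify("file:///tmp/local.txt", defaultFs, workingDir));
        System.out.println(qualify("/tmp/output", defaultFs, workingDir));
        System.out.println(qualify("input/part-0", defaultFs, workingDir));
    }
}
```

The three calls in `main` print `file:///tmp/local.txt`, `hdfs://namenode:8020/tmp/output`, and `hdfs://namenode:8020/user/alice/input/part-0`. Unlike Path, this sketch hands raw strings to `URI.create`, so a path containing a space (for example `my file.txt`) would throw `IllegalArgumentException`; that is the percent-encoding burden Path removes.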
> I hope this helps!