Re: What is the difference between URI, Home Directory, Working Directory in FileSystem.java or HDFS
Daryn Sharp 2013-04-11, 14:53
On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:
> Dear all,
> I am a little confused about the URI, home directory, and working directory in FileSystem.java or HDFS.
> I have listed my understanding of these concepts; can someone please tell me whether I am correct? Thanks.
> The home directory: This is usually a directory for a specific Hadoop user, so its path is user-specific. In HDFS, it looks like hdfs://NameNode:port/user/USERNAME.
> The URI: Is this the root of the distributed filesystem? For HDFS, it is just hdfs://NameNode:port/, and each file or directory in the distributed filesystem is a file or subdirectory under this path.
Generally correct. However, I'd strongly suggest avoiding the use of URIs directly. It's better to obtain your filesystems via path.getFileSystem(conf) - it will extract the URI for the filesystem automatically. See below for the correct definition of a Path.
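To make "extract the URI for the filesystem" concrete, here is a rough sketch using only java.net.URI. It is an illustration of the idea, not Hadoop's actual code: the class name FsUriSketch, the method fileSystemUriOf, and the host namenode:8020 are all made up for the example. In real code you would just call path.getFileSystem(conf).

```java
import java.net.URI;

// Illustration only: mimics the idea with plain java.net.URI, not Hadoop classes.
final class FsUriSketch {
    // Keep the scheme and authority of a fully qualified path and reset the
    // path portion to the root, which identifies the filesystem it lives on.
    static URI fileSystemUriOf(URI qualifiedPath) {
        return qualifiedPath.resolve("/");
    }
}
```

With this sketch, fileSystemUriOf(URI.create("hdfs://namenode:8020/user/alice/input.txt")) gives hdfs://namenode:8020/, so two files on the same namenode map to the same filesystem URI.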
> The working directory: I am a little confused about this variable. At any given time there is only one instance of the filesystem class, and the working directory is private state of that FS. While a job is running, Hadoop switches among several directories (such as the shared system directory, the home directory, or the input/output directories), and the working directory is modified each time it switches.
> Although I have looked through the related documentation, I am still a little confused about the java.net.URI, java.io.File, and org.apache.hadoop.fs.Path classes. It seems a URI can be hdfs://XXX/XXX/FILENAME, while a Path can only be the path without the scheme, hostname, and port. The File class is just an object for a specific file.
Your understanding of Path is incorrect. Path is really just a veneer over a URI. A Path can be qualified with a scheme/authority, or just be absolute or relative. If a Path is not scheme qualified, it uses the defaultFS. If the Path is not absolute, it's qualified against the working directory. Path provides some niceties like not requiring percent encoding in the path portion of the URI, and allows use of glob chars and the quoting thereof.
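The qualification rules above can be sketched with plain java.net.URI. This is only an approximation to show the logic, not how Path is implemented: QualifySketch, qualify, DEFAULT_FS, and WORKING_DIR are hypothetical names, and the namenode address and user directory are assumed values standing in for your configured defaultFS and current working directory.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: approximates the qualification rules with java.net.URI.
final class QualifySketch {
    // Assumed stand-ins for the configured default filesystem and working directory.
    static final URI DEFAULT_FS = URI.create("hdfs://namenode:8020/");
    static final String WORKING_DIR = "/user/alice/";

    // scheme and authority may be null, meaning "not scheme qualified".
    static URI qualify(String scheme, String authority, String path) {
        // A relative path is resolved against the working directory.
        if (!path.startsWith("/")) {
            path = WORKING_DIR + path;
        }
        // A path without a scheme falls back to the default filesystem.
        if (scheme == null) {
            scheme = DEFAULT_FS.getScheme();
            authority = DEFAULT_FS.getAuthority();
        }
        try {
            // The multi-argument constructor percent-encodes characters that are
            // illegal in a URI path (a space, for example), so the caller can
            // pass the path unencoded.
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Under those assumptions, qualify(null, null, "data/part-0") yields hdfs://namenode:8020/user/alice/data/part-0, and qualify(null, null, "/tmp/my file.txt") yields hdfs://namenode:8020/tmp/my%20file.txt. Glob handling is not covered by this sketch.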
I hope this helps!