Hadoop, mail # dev - Hadoop 2.0 - org.apache.hadoop.fs.Hdfs vs. DistributedFileSystem?

Stephen Watt 2013-06-18, 18:15
Hi Folks

My understanding is that from Hadoop 2.0 onwards, AbstractFileSystem is now the strategic class to extend when writing Hadoop FileSystem plugins. This is a departure from previous versions, where one would extend the FileSystem class. This seems to be reinforced by the core-default.xml for Hadoop 2.0 in the Apache Hadoop documentation (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/core-default.xml), which shows fs.AbstractFileSystem.hdfs.impl being set to org.apache.hadoop.fs.Hdfs.
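To make the configuration point concrete, here is a minimal sketch of how a URI scheme is mapped to an implementation class name through a configuration key. It assumes a plain `Map` stands in for Hadoop's `Configuration`; the class `FsImplLookup` and its methods are hypothetical, not Hadoop APIs. It contrasts the `fs.AbstractFileSystem.<scheme>.impl` key (the FileContext/AbstractFileSystem path, as in core-default.xml) with the `fs.<scheme>.impl` key used on the FileSystem path. The `fs.hdfs.impl` entry below is an assumed explicit setting, not a claim about what Hadoop 2.0 ships by default.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Hadoop code) of resolving a filesystem
 * implementation class name from a URI scheme via configuration keys.
 */
public class FsImplLookup {

    // Key consulted on the FileContext/AbstractFileSystem path,
    // e.g. fs.AbstractFileSystem.hdfs.impl -> org.apache.hadoop.fs.Hdfs
    public static String afsKey(String scheme) {
        return "fs.AbstractFileSystem." + scheme + ".impl";
    }

    // Key consulted on the FileSystem path, e.g. fs.hdfs.impl
    public static String fsKey(String scheme) {
        return "fs." + scheme + ".impl";
    }

    // Returns the configured class name, or fails if the key is absent.
    static String resolve(Map<String, String> conf, String key) {
        String impl = conf.get(key);
        if (impl == null) {
            throw new IllegalArgumentException("No implementation configured for " + key);
        }
        return impl;
    }

    public static String resolveAfsImpl(Map<String, String> conf, URI uri) {
        return resolve(conf, afsKey(uri.getScheme()));
    }

    public static String resolveFsImpl(Map<String, String> conf, URI uri) {
        return resolve(conf, fsKey(uri.getScheme()));
    }

    // Illustrative settings only; a real cluster reads these from
    // core-default.xml / core-site.xml.
    public static Map<String, String> sampleConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.AbstractFileSystem.hdfs.impl", "org.apache.hadoop.fs.Hdfs");
        // Assumed explicit override for the FileSystem path.
        conf.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = sampleConf();
        URI uri = URI.create("hdfs://namenode:8020/user/steve");
        System.out.println(resolveAfsImpl(conf, uri)); // org.apache.hadoop.fs.Hdfs
        System.out.println(resolveFsImpl(conf, uri));  // org.apache.hadoop.hdfs.DistributedFileSystem
    }
}
```

The point of the sketch is only that the two APIs consult different keys for the same scheme, so an HDFS plugin can be registered for one path, the other, or both.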

Is my assertion correct? Do we have community consensus around this? In particular, beyond the Apache distribution, are the commercial distributions (Intel, Hortonworks, Cloudera, WanDisco, EMC Pivotal, etc.) using org.apache.hadoop.fs.Hdfs as their filesystem plugin for HDFS? And what does one lose by using the DistributedFileSystem class instead of the Hdfs class?

Steve Watt

----- Original Message -----
From: "Andrew Wang" <[EMAIL PROTECTED]>
Cc: "Milind Bhandarkar" <[EMAIL PROTECTED]>, "shv hadoop" <[EMAIL PROTECTED]>, "Steve Loughran" <[EMAIL PROTECTED]>, "Kun Ling" <[EMAIL PROTECTED]>, "Roman Shaposhnik" <[EMAIL PROTECTED]>, "Andrew Purtell" <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED], "Sanjay Radia" <[EMAIL PROTECTED]>
Sent: Friday, June 14, 2013 1:32:38 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Hey Steve,

I agree that it's confusing. FileSystem and FileContext are essentially two
parallel sets of interfaces for accessing filesystems in Hadoop.
FileContext splits the client-facing interface and shared code between
itself and AbstractFileSystem, while FileSystem is all-in-one. If you're
looking for the AFS equivalents of DistributedFileSystem and LocalFileSystem,
see Hdfs and LocalFs.
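The split described above can be pictured with a toy model. Everything in it (`LayeringSketch`, `AllInOneFs`, `ProviderSpi`, `ClientFacade`, and the in-memory classes) is hypothetical and heavily simplified; it mirrors only the shape of the design, not Hadoop's real signatures. `AllInOneFs` plays the role of FileSystem, where one class is both the client API and the plugin contract. `ClientFacade` plays the role of FileContext, owning shared logic such as working-directory resolution and delegating storage operations to a provider. `ProviderSpi` plays the role of AbstractFileSystem.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy model contrasting two filesystem API layerings. */
public class LayeringSketch {

    // Style 1: one abstract class is both client API and plugin contract
    // (analogous in spirit to FileSystem). Shared helpers live here too.
    public abstract static class AllInOneFs {
        protected abstract boolean rawCreate(String path);
        protected abstract boolean rawExists(String path);

        public boolean create(String path) { return rawCreate(normalize(path)); }
        public boolean exists(String path) { return rawExists(normalize(path)); }

        // Shared logic inherited by every implementation.
        protected static String normalize(String path) {
            return path.startsWith("/") ? path : "/" + path;
        }
    }

    // Style 2a: provider contract only (analogous in spirit to AbstractFileSystem).
    public abstract static class ProviderSpi {
        public abstract boolean create(String absPath);
        public abstract boolean exists(String absPath);
    }

    // Style 2b: client-facing facade (analogous in spirit to FileContext).
    // Shared logic (working-directory resolution) lives here, not in providers.
    public static final class ClientFacade {
        private final ProviderSpi provider;
        private final String workingDir;

        public ClientFacade(ProviderSpi provider, String workingDir) {
            this.provider = provider;
            this.workingDir = workingDir;
        }

        private String resolve(String path) {
            return path.startsWith("/") ? path : workingDir + "/" + path;
        }

        public boolean create(String path) { return provider.create(resolve(path)); }
        public boolean exists(String path) { return provider.exists(resolve(path)); }
    }

    // In-memory provider for demonstration; create returns false if the path exists.
    public static final class InMemoryProvider extends ProviderSpi {
        private final Set<String> files = new HashSet<>();
        @Override public boolean create(String absPath) { return files.add(absPath); }
        @Override public boolean exists(String absPath) { return files.contains(absPath); }
    }

    // In-memory all-in-one implementation for demonstration.
    public static final class InMemoryAllInOne extends AllInOneFs {
        private final Set<String> files = new HashSet<>();
        @Override protected boolean rawCreate(String path) { return files.add(path); }
        @Override protected boolean rawExists(String path) { return files.contains(path); }
    }

    public static void main(String[] args) {
        ClientFacade fc = new ClientFacade(new InMemoryProvider(), "/user/steve");
        fc.create("data.txt");
        System.out.println(fc.exists("/user/steve/data.txt")); // true

        AllInOneFs fs = new InMemoryAllInOne();
        fs.create("tmp/x");
        System.out.println(fs.exists("/tmp/x")); // true
    }
}
```

In the second style a plugin author implements only the provider contract while callers go through the facade, which matches the observation in this thread that AFS is not itself a client-facing API.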

Realistically, FileSystem isn't going to be deprecated and removed any time
soon. There are lots of 3rd-party FileSystem implementations, and most apps
today use FileSystem (including many HDFS internals, like trash and the

When I read the wiki page, I figured that the mention of AFS was
essentially a typo, since everyone has been steaming ahead with FileSystem.
Standardizing FileSystem makes total sense to me; I just wanted to confirm
that plan.

On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt <[EMAIL PROTECTED]> wrote:

> This is a good point, Andrew. The hangout was actually the first time I'd
> heard about the AbstractFileSystem class. I've been doing some further
> analysis of the Hadoop 2.0 source, and the Hadoop 2.0 implementations of
> the DistributedFileSystem and LocalFileSystem classes extend the FileSystem
> class, not AbstractFileSystem. I would imagine that if the plan for Hadoop
> 2.0 were to build FileSystem implementations on AbstractFileSystem, those
> two would use it, so I'm a bit confused. Perhaps I'm looking in the wrong
> place? Sanjay (or anyone else), could you clarify this for us?
> Regards
> Steve Watt
> ----- Original Message -----
> From: "Andrew Wang" <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], "shv hadoop" <[EMAIL PROTECTED]>,
> Sent: Monday, June 10, 2013 5:14:16 PM
> Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop
> FileSystems + Workshop
> Thanks for the summary, Steve; very useful.
> I'm wondering a bit about the point on testing AbstractFileSystem rather
> than FileSystem. While their HDFS implementations (Hdfs and
> DistributedFileSystem) both wrap DFSClient, the two classes expose pretty
> different APIs. Furthermore, AFS is not actually a client-facing API;
> clients interact with an AFS through FileContext.
> I ask because I did some work trying to unify the symlink tests for both
> FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things