Stephen Watt 2013-06-18, 18:15
Re: Hadoop 2.0 - org.apache.hadoop.fs.Hdfs vs. DistributedFileSystem?
Eli Collins 2013-06-18, 18:38
That's correct, see HADOOP-6223 for the history. However, per Andrew
I don't think it's realistic to expect people to migrate off
FileSystem for a while (I filed HADOOP-6446 well over three years
The unfortunate consequence of the earlier decision to have parallel
interfaces rather than transition one over time means people
effectively need to end up implementing multiple backends - one that
gets used by clients of FileSystem, and one for clients of
FileContext. Implementing in only one place significantly limits
adoption of the feature or file system because they can't be
effectively adopted in practice unless they're available to old and
new clients (for example, this is why symlinks are getting backported
to FileSystem from FileContext).
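
To make the "multiple backends" point concrete, here is a rough sketch of how the two HDFS implementations map to separate configuration keys, one per client API. The fs.AbstractFileSystem.hdfs.impl entry is the one quoted from core-default.xml below; the fs.hdfs.impl entry follows the conventional fs.<scheme>.impl naming and is shown here as an assumption for illustration, since a given release may not set it explicitly. Treat this as an illustrative fragment, not a copy of any shipped file.

```xml
<!-- Illustrative core-site.xml fragment; not copied from a shipped config. -->
<configuration>
  <!-- Backend used by clients of the FileSystem API. -->
  <!-- Assumption: shown via the conventional fs.<scheme>.impl key. -->
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
  <!-- Backend used by clients of the FileContext API. -->
  <property>
    <name>fs.AbstractFileSystem.hdfs.impl</name>
    <value>org.apache.hadoop.fs.Hdfs</value>
  </property>
</configuration>
```

A third-party file system that wants to be reachable from both old and new clients would need an analogous pair of classes and entries under its own URI scheme.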
On Tue, Jun 18, 2013 at 11:15 AM, Stephen Watt <[EMAIL PROTECTED]> wrote:
> Hi Folks
> My understanding is that from Hadoop 2.0 onwards the AbstractFileSystem is now the strategic class to extend for writing Hadoop FileSystem plugins. This is a departure from previous versions where one would extend the FileSystem class. This seems to be reinforced by the core-default.xml for Hadoop 2.0 in the Apache docs (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/core-default.xml), which shows fs.AbstractFileSystem.hdfs.impl being set to org.apache.hadoop.fs.Hdfs.
> Is my assertion correct? Do we have community consensus around this? That is, beyond the Apache distro, are the commercial distros (Intel, Hortonworks, Cloudera, WanDisco, EMC Pivotal, etc.) using org.apache.hadoop.fs.Hdfs as their filesystem plugin for HDFS? What does one lose by using the DistributedFileSystem class instead of the Hdfs class?
> Steve Watt
> ----- Original Message -----
> From: "Andrew Wang" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: "Milind Bhandarkar" <[EMAIL PROTECTED]>, "shv hadoop" <[EMAIL PROTECTED]>, "Steve Loughran" <[EMAIL PROTECTED]>, "Kun Ling" <[EMAIL PROTECTED]>, "Roman Shaposhnik" <[EMAIL PROTECTED]>, "Andrew Purtell" <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED], "Sanjay Radia" <[EMAIL PROTECTED]>
> Sent: Friday, June 14, 2013 1:32:38 PM
> Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
> Hey Steve,
> I agree that it's confusing. FileSystem and FileContext are essentially two
> parallel sets of interfaces for accessing filesystems in Hadoop.
> FileContext splits the interface and shared code with AbstractFileSystem,
> while FileSystem is all-in-one. If you're looking for the AFS equivalents
> to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs.
> Realistically, FileSystem isn't going to be deprecated and removed any time
> soon. There are lots of 3rd-party FileSystem implementations, and most apps
> today use FileSystem (including many HDFS internals, like trash and the
> When I read the wiki page, I figured that the mention of AFS was
> essentially a typo, since everyone's been steaming ahead with FileSystem.
> Standardizing FileSystem makes total sense to me, I just wanted to confirm
> that plan.
> On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt <[EMAIL PROTECTED]> wrote:
>> This is a good point Andrew. The hangout was actually the first time I'd
>> heard about the AbstractFileSystem class. I've been doing some further
>> analysis on the source in Hadoop 2.0 and when I look at the Hadoop 2.0
>> implementations of the DistributedFileSystem and LocalFileSystem classes, they
>> extend the FileSystem class and not AbstractFileSystem. I would imagine if
>> the plan for Hadoop 2.0 is to build FileSystem implementations using the
>> AbstractFileSystem, then those two would use it, so I'm a bit confused.
>> Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you
>> clarify this for us?
>> Steve Watt
>> ----- Original Message -----
>> From: "Andrew Wang" <[EMAIL PROTECTED]>