Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common
Stephen Watt 2013-05-23, 21:52
Thanks for responding Harsh.
I agree. Hadoop Common does do a good job of maintaining a stable and public FileSystem and FileContext API. The pro for maintaining client libraries outside of Hadoop Common is that the release owner of the library has much more autonomy and agility in maintaining the library. From the glusterfs plugin perspective, I concur with this. In contrast, if my library were managed inside of Hadoop Common, I'd have to spend the time to earn committer status to have an equivalent amount of autonomy and agility, which is overkill for someone who just wants to maintain 400 lines of code.
I ruminated a bit about one con, which is that because the plugin doesn't get shipped with Hadoop Common, it might be harder for the Hadoop user community to find out about it and obtain it. However, if you consider the LZO codec, the fact that it's not bundled certainly doesn't hamper its adoption.
You mentioned testing. I don't think regression across Hadoop releases is as big an issue because (based on my understanding) you really just have two FileSystem interfaces (abstract classes) to worry about WRT compliance, namely the FileSystem interface as reflected in Hadoop 1.0 and the FileSystem interface as reflected in Hadoop 2.0. However, this is a broader topic that I also want to discuss, so I'll tee it up in a separate thread.
----- Original Message -----
From: "Harsh J" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, May 23, 2013 1:37:30 PM
Subject: Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common
I think we do a fairly good job of maintaining a stable and public FileSystem
and FileContext API for third-party plugins to exist outside of Apache
Hadoop but still be able to work well across versions.
The question of testing pops up though, specifically that of testing against
trunk to catch regressions across various implementations, but it'd be a lot of
work for us to also maintain glusterfs dependencies and mechanisms as part
We do provide trunk build snapshot artifacts publicly for downstream
projects to test against, which I think may help cover the continuous
testing concerns, if there are those.
Right now, I don't think the S3 FS we maintain really works all that well.
I also recall, per recent conversations on the lists, that AMZN has started
shipping their own library for a better implementation rather than
perfecting the implementation we have here (correct me if I am wrong but I
think the changes were not all contributed back). I see some work going on
for OpenStack's Swift, for which I think Steve also raised a similar
discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't recall
if the conversation proceeded at the time.
What's your perspective as the releaser though? Would you not find
maintaining this outside easier, especially in terms of maintaining your
code for quicker releases, for both bug fixes and features - also given
that you can CI it against Apache Hadoop trunk at the same time?
On Thu, May 23, 2013 at 11:47 PM, Stephen Watt <[EMAIL PROTECTED]> wrote:
> (Resending - I think the first time I sent this out it got lost within all
> the ByLaws voting)
> Hi Folks
> My name is Steve Watt and I am presently working on enabling glusterfs to
> be used as a Hadoop FileSystem. Most of the work thus far has involved
> developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
> point where the plugin is becoming stable and I've been trying to
> understand where the right place is to host/manage/version it.
> Steve Loughran was kind enough to point out a few past threads in the
> community (such as
> that show a project disposition to move away from Hadoop Common containing
> client code (plugins) for 3rd party FileSystems. This makes sense and
> allows the filesystem plugin developer more autonomy as well as reduces
> Hadoop Common's dependence on 3rd Party libraries.