Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common
Colin McCabe 2013-05-24, 00:28
You might try looking at what KosmoFS (KFS) did. They have some code in
org/apache/hadoop/fs which calls their own Java shim.
This way, the shim code in hadoop-common gets updated whenever FileSystem
changes, but there is no requirement to install KFS before building Hadoop.
You might also try asking Steve Loughran, since he did some great work
recently to try to nail down the exact semantics of FileSystem and
FileContext and improve the related unit tests (see HADOOP-9258 and related
On Thu, May 23, 2013 at 2:52 PM, Stephen Watt <[EMAIL PROTECTED]> wrote:
> Thanks for responding Harsh.
> I agree. Hadoop Common does do a good job of maintaining a stable and
> public FileSystem and FileContext API. The pro for maintaining client libraries
> outside of Hadoop Common is that the release owner of the library has much
> more autonomy and agility in maintaining the library. From the glusterfs
> plugin perspective, I concur with this. In contrast, if my library was
> managed inside of Hadoop Common, I'd have to spend the time to earn
> committer status to have an equivalent amount of autonomy and agility,
> which is overkill for someone just wanting to maintain 400 lines of code.
> I ruminated a bit about one con: because it doesn't get
> shipped with Hadoop Common, it might be harder for the Hadoop User
> community to find out about it and obtain it. However, if you consider the
> LZO codec, the fact that it's not bundled certainly doesn't hamper its
> You mentioned testing. I don't think regression across Hadoop releases is
> as big of an issue as (based on my understanding) you really just have two
> FileSystem interfaces (abstract class) to worry about WRT compliance,
> namely the FileSystem interface reflected for Hadoop 1.0 and the FileSystem
> interface reflected for Hadoop 2.0. However, this is a broader topic that I
> also want to discuss so I'll tee it up in a separate thread.
> Steve Watt
> ----- Original Message -----
> From: "Harsh J" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, May 23, 2013 1:37:30 PM
> Subject: Re: [DISCUSS] - Committing client code to 3rd Party FileSystems
> within Hadoop Common
> I think we do a fairly good job of maintaining a stable and public FileSystem
> and FileContext API for third-party plugins to exist outside of Apache
> Hadoop but still be able to work well across versions.
> The question of testing comes up though, specifically that of testing against
> trunk to catch regressions across various implementations, but it'd be a lot of
> work for us to also maintain glusterfs dependencies and mechanisms as part
> of trunk.
> We do provide trunk build snapshot artifacts publicly for downstream
> projects to test against, which I think may help cover the continuous
> testing concerns, if there are those.
> Right now, I don't think the S3 FS we maintain really works all that well.
> I also recall, per recent conversations on the lists, that AMZN has started
> shipping their own library for a better implementation rather than
> perfecting the implementation we have here (correct me if I am wrong but I
> think the changes were not all contributed back). I see some work going on
> for OpenStack's Swift, for which I think Steve also raised a similar
> discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't know
> if the conversation proceeded at the time.
> What's your perspective as the releaser though? Would you not find
> maintaining this outside easier, especially in terms of maintaining your
> code for quicker releases, for both bug fixes and features - also given
> that you can CI it against Apache Hadoop trunk at the same time?
> On Thu, May 23, 2013 at 11:47 PM, Stephen Watt <[EMAIL PROTECTED]> wrote:
> > (Resending - I think the first time I sent this out it got lost within
> > the ByLaws voting)
> > Hi Folks
> > My name is Steve Watt and I am presently working on enabling glusterfs to