
Hadoop >> mail # general >> where do side-projects go in trunk now that contrib/ is gone?

Re: where do side-projects go in trunk now that contrib/ is gone?

I like the idea of testing all FileSystems for expected behavior; in HttpFS we
are already doing something along these lines, testing HttpFS against HDFS and
LocalFS, and also testing 2 WebHDFS clients.

Regarding where these 'extensions' would go, we could have something like
share/hadoop/common/filesystem-ext/s3, and whoever wants to use s3 would
have to symlink those JARs into common/lib. Or we could have a way to select
which extension JARs to pick up via a HADOOP_COMMON_FS_EXT env var. I guess
the BigTop guys could help define this magic.
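The two mechanisms floated above can be sketched like this. The directory layout and the HADOOP_COMMON_FS_EXT variable come straight from the mail, but neither is an actual Hadoop feature, so the paths below are a throwaway demo tree rather than a real install:

```shell
# Demo tree standing in for a Hadoop install (illustrative paths only)
DEMO=/tmp/hadoop-fs-ext-demo
mkdir -p "$DEMO/share/hadoop/common/filesystem-ext/s3" \
         "$DEMO/share/hadoop/common/lib"
touch "$DEMO/share/hadoop/common/filesystem-ext/s3/hadoop-s3.jar"

# Option 1: whoever wants s3 symlinks its extension JARs into common/lib,
# where the stock classpath-building scripts already look
ln -sf "$DEMO"/share/hadoop/common/filesystem-ext/s3/*.jar \
       "$DEMO/share/hadoop/common/lib/"

# Option 2: an env var naming which extension sets the launcher scripts
# should add to the classpath (hypothetical variable, per the mail)
export HADOOP_COMMON_FS_EXT="s3"
ls "$DEMO/share/hadoop/common/lib"
```

Option 1 needs no script changes; option 2 would need the launcher scripts (or BigTop packaging) to translate the variable into classpath entries.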
On Wed, Feb 13, 2013 at 1:44 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:

> On 12 February 2013 22:09, Eli Collins <[EMAIL PROTECTED]> wrote:
> > I agree that the current place isn't a good one, for both the reasons
> > you mention on the jira (and because the people maintaining this code
> > don't primarily work on Hadoop). IMO the SwiftFS driver should live in
> > the swift source tree (as part of open stack).
> >
> If they could be persuaded to move beyond .py, it'd be tempting, because
> the FileSystem API is nominally stable.
> However, one thing I have noticed during this work is how underspecified
> the behaviour of FileSystem is. That's not an issue for HDFS, which gets
> stressed rigorously during the hdfs and mapred test runs, but it does
> matter for the rest.
> There are a lot of assumptions ("files != directories", "mv / anything"
> fails) and things that aren't tested: does "mv self self" return true if
> self is a file and false if it's a directory? What exception should be
> raised if readFully goes past the end of a file? (and the answer is?)
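One of those untested cases can be probed directly. The sketch below uses java.io.File as a stand-in for the local filesystem, not the org.apache.hadoop.fs.FileSystem API the mail is about; the class and method names are illustrative:

```java
import java.io.File;
import java.io.IOException;

// Probe the "mv self self" case from the mail: what does renaming a path
// onto itself return for a file vs. a directory on the local filesystem?
// (On POSIX systems rename(2) on two paths naming the same file is defined
// to succeed and do nothing, so both probes typically return true there;
// HDFS and the blobstores are free to differ, which is the point.)
public class RenameSelfProbe {

    // Result of renaming a fresh temp file onto itself.
    static boolean fileRenameSelf() {
        try {
            File f = File.createTempFile("probe", ".txt");
            try { return f.renameTo(f); } finally { f.delete(); }
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    // Result of renaming a fresh temp directory onto itself.
    static boolean dirRenameSelf() {
        File d = new File(System.getProperty("java.io.tmpdir"),
                          "probe-dir-" + System.nanoTime());
        if (!d.mkdir()) throw new RuntimeException("mkdir failed");
        try { return d.renameTo(d); } finally { d.delete(); }
    }

    public static void main(String[] args) {
        System.out.println("file mv self self -> " + fileRenameSelf());
        System.out.println("dir  mv self self -> " + dirRenameSelf());
    }
}
```

Running the same probe against each FileSystem implementation is exactly the kind of cross-FS contract check the thread is arguing for.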
> We even make an implicit assumption that file operations are consistent:
> you get back what you wrote, which turns out to be an assumption not
> guaranteed by any of the blobstores in all circumstances.
> HADOOP-9258, HADOOP-9119 tighten the spec a bit, but if you look at what
> I've been doing for Swift testing, I've created a set of test suites, one
> per operation "ls", "read", "rename", with tests for scale, directory depth
> and width on my todo list:
> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
> Then I want to extract those into tests that can be applied to all
> filesystems (say in o.a.h.fs.contract), with some per-FS metadata file
> providing details on what the FS supports (rename, append, case
> sensitivity, MAX_PATH, ...), so that we've got better test coverage that
> can be applied to all the filesystems in the hadoop codebase (and, being
> JUnit4, you can skip tests in-code by throwing
> AssumptionViolatedException; these get reported as skips).
> It's this expanded test coverage that will be the tightest coupling to
> hadoop.
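The per-FS metadata idea above can be sketched with a plain properties file per filesystem; a JUnit4 test would then call assumeTrue(...) on it, which throws AssumptionViolatedException and reports the test as skipped. All names here (ContractOptions, the supports.* keys) are illustrative, not the actual Hadoop contract-test API:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Per-FS metadata file, modeled as a properties text: each FileSystem
// declares which operations it supports, and contract tests consult this
// before running (absent keys default to unsupported).
public class ContractOptions {
    private final Properties props = new Properties();

    public ContractOptions(String propertiesText) {
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for a StringReader
        }
    }

    public boolean supports(String feature) {
        return Boolean.parseBoolean(
            props.getProperty("supports." + feature, "false"));
    }

    public static void main(String[] args) {
        // A blobstore-ish FS: rename but no append, case-sensitive paths
        ContractOptions swift = new ContractOptions(
            "supports.rename=true\n" +
            "supports.append=false\n" +
            "supports.case-sensitivity=true\n");

        if (!swift.supports("append")) {
            // in JUnit4: Assume.assumeTrue(swift.supports("append"));
            System.out.println("SKIP: append tests (FS declares no append)");
        }
    }
}
```

In a real contract suite the skip would be the framework's job, so a filesystem that doesn't support an operation shows up as skipped tests rather than failures.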
> >
> > I'm not -1 on it living in-tree, it's just not my 1st choice. If you
> > want to create a top-level directory for 3rd party (read non-local,
> > non-hdfs file systems) file systems - go for it. It would be an
> > improvement on the current situation (o.a.h.fs.ftp also brings in
> > dependencies that most people don't need).  I don't think we need to
> > come up with a new top-level "kitchen sink" directory to handle all
> > Hadoop extensions, there are a few well-defined extension points that
> > can likely be handled independently so logically grouping them
> > separately makes sense to me (and perhaps we'll decide some extensions
> > are better in-tree and some not).
> >
> Makes sense. That I will do in a JIRA.