Hadoop general mailing list: where do side-projects go in trunk now that contrib/ is gone?


Steve Loughran 2013-02-11, 21:20
Eli Collins 2013-02-11, 21:36
Steve Loughran 2013-02-12, 08:55
Eli Collins 2013-02-12, 21:35
Steve Loughran 2013-02-12, 21:51
Eli Collins 2013-02-12, 22:09
Steve Loughran 2013-02-13, 09:44
Re: where do side-projects go in trunk now that contrib/ is gone?
Steve,

I like the idea of testing all FS for expected behavior. In HttpFS we are
already doing something along these lines, testing HttpFS against HDFS and
LocalFS, and also testing two WebHDFS clients.

Regarding where these 'extensions' would go, we could have something
like share/hadoop/common/filesystem-ext/s3, and whoever wants to use S3
would have to symlink those JARs into common/lib. Alternatively, we could
have a HADOOP_COMMON_FS_EXT environment variable that selects which
extension JARs to pick up. I guess the Bigtop guys could help define this
magic.
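The HADOOP_COMMON_FS_EXT variable is only a proposal in this thread, not an existing Hadoop setting. A minimal Java sketch of how a launcher might resolve it, assuming the variable holds a comma-separated list of extension names (e.g. "s3,swift") and that each name maps to a directory under share/hadoop/common/filesystem-ext/ as suggested above. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: resolve filesystem-extension JARs selected by a
 * HADOOP_COMMON_FS_EXT environment variable (a proposal, not a real Hadoop
 * setting). Assumes a comma-separated list of extension names.
 */
class FsExtensionResolver {
    // Layout suggested in the thread: share/hadoop/common/filesystem-ext/<name>
    static final String EXT_ROOT = "share/hadoop/common/filesystem-ext";

    /** Map the variable's value to one directory per named extension. */
    static List<Path> extensionDirs(Path hadoopHome, String envValue) {
        List<Path> dirs = new ArrayList<>();
        if (envValue == null || envValue.trim().isEmpty()) {
            return dirs; // nothing selected: no extensions on the classpath
        }
        for (String name : envValue.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(hadoopHome.resolve(EXT_ROOT).resolve(trimmed));
            }
        }
        return dirs;
    }

    /** List the *.jar files in each selected directory, skipping missing ones. */
    static List<Path> extensionJars(Path hadoopHome, String envValue) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (Path dir : extensionDirs(hadoopHome, envValue)) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar);
                }
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        Path home = Paths.get(System.getenv().getOrDefault("HADOOP_HOME", "."));
        String selected = System.getenv("HADOOP_COMMON_FS_EXT");
        for (Path jar : extensionJars(home, selected)) {
            System.out.println(jar); // a launcher would append these to the classpath
        }
    }
}
```

Keeping the name-to-directory mapping separate from the directory listing makes the selection logic testable without touching the disk; a launcher built this way would add the resolved JARs to the classpath instead of requiring symlinks into common/lib.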
On Wed, Feb 13, 2013 at 1:44 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:

> On 12 February 2013 22:09, Eli Collins <[EMAIL PROTECTED]> wrote:
>
> > I agree that the current place isn't a good one, for both the reasons
> > you mention on the JIRA (and because the people maintaining this code
> > don't primarily work on Hadoop). IMO the SwiftFS driver should live in
> > the Swift source tree (as part of OpenStack).
> >
>
> If they could be persuaded to move beyond .py, it'd be tempting, because
> the FileSystem API is nominally stable.
>
> However, one thing I have noticed during this work is how underspecified
> the behaviour of FileSystem is. That's not an issue for HDFS, which gets
> stressed rigorously during the HDFS and MapReduce test runs, but it does
> matter for the rest.
>
> There are a lot of assumptions ("files != directories", "mv / anything"
> fails), and things that aren't tested: "mv self self" returns true if self
> is a file and false if it is a directory; which exception to raise if
> readFully goes past the end of a file (and the answer is?).
>
> We even make an implicit assumption that file operations are consistent:
> you get back what you wrote. That turns out not to be guaranteed by any of
> the blobstores in all circumstances.
>
> HADOOP-9258 and HADOOP-9119 tighten the spec a bit, but if you look at what
> I've been doing for Swift testing, you'll see I've created a set of test
> suites, one per operation ("ls", "read", "rename"), with tests for scale,
> directory depth and width on my to-do list:
>
>
> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
>
>
> Then I want to extract those into tests that can be applied to all
> filesystems (say in o.a.h.fs.contract), with a per-FS metadata file
> providing details on what the FS supports (rename, append, case
> sensitivity, MAX_PATH, ...). That gives us better test coverage, and
> coverage that can be applied to all the filesystems in the Hadoop codebase.
> Being JUnit 4, you can skip tests in code by throwing
> AssumptionViolatedException; these get reported as skips.
>
> It's this expanded test coverage that will be the tightest coupling to
> Hadoop.
>
> >
> > I'm not -1 on it living in-tree; it's just not my first choice. If you
> > want to create a top-level directory for third-party file systems (read:
> > non-local, non-HDFS), go for it. It would be an improvement on the
> > current situation (o.a.h.fs.ftp also brings in dependencies that most
> > people don't need). I don't think we need to come up with a new
> > top-level "kitchen sink" directory to handle all Hadoop extensions.
> > There are a few well-defined extension points that can likely be
> > handled independently, so logically grouping them separately makes
> > sense to me (and perhaps we'll decide some extensions are better
> > in-tree and some not).
> >
>
> Makes sense. I will do that in a JIRA.
>
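Steve's idea of per-filesystem metadata plus skippable contract tests can be sketched without any Hadoop dependency. Everything below is hypothetical: MiniFs stands in for o.a.h.fs.FileSystem, the "supports-rename" key is an invented metadata entry, and SkipException plays the role that AssumptionViolatedException plays in JUnit 4 (where a test would normally call org.junit.Assume.assumeTrue). The expected results for renaming a path onto itself encode the behaviour Steve describes (true for a file, false for a directory), not an agreed specification.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/** One contract check, run against any MiniFs with its declared options. */
class RenameContract {
    static String run(MiniFs fs, ContractOptions opts) {
        try {
            opts.assumeSupported("rename"); // skip, don't fail, if undeclared
            fs.create("/a.txt", new byte[] {1, 2, 3});
            fs.mkdir("/dir");
            if (!fs.rename("/a.txt", "/a.txt")) return "FAIL: rename(file, itself) should return true";
            if (!fs.isFile("/a.txt")) return "FAIL: file missing after self-rename";
            if (fs.rename("/dir", "/dir")) return "FAIL: rename(dir, itself) should return false";
            return "PASS";
        } catch (SkipException e) {
            return "SKIP: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(new InMemoryFs(), new ContractOptions("supports-rename=true")));
        System.out.println(run(new InMemoryFs(), new ContractOptions("supports-rename=false")));
    }
}

/** Hypothetical minimal filesystem interface standing in for o.a.h.fs.FileSystem. */
interface MiniFs {
    void mkdir(String path);
    void create(String path, byte[] data);
    boolean rename(String src, String dst);
    boolean isFile(String path);
}

/** Marks a check as skipped, like JUnit 4's AssumptionViolatedException. */
class SkipException extends RuntimeException {
    SkipException(String message) { super(message); }
}

/** Per-filesystem capability metadata in java.util.Properties format. */
class ContractOptions {
    private final Properties props = new Properties();

    ContractOptions(String text) {
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean supports(String feature) {
        // Undeclared features are treated as unsupported.
        return Boolean.parseBoolean(props.getProperty("supports-" + feature, "false"));
    }

    void assumeSupported(String feature) {
        if (!supports(feature)) {
            throw new SkipException("filesystem does not declare support for " + feature);
        }
    }
}

/** Toy in-memory filesystem, just enough to exercise the check above. */
class InMemoryFs implements MiniFs {
    private final Map<String, byte[]> files = new HashMap<>();
    private final Set<String> dirs = new HashSet<>();

    public void mkdir(String path) { dirs.add(path); }
    public void create(String path, byte[] data) { files.put(path, data.clone()); }
    public boolean isFile(String path) { return files.containsKey(path); }

    public boolean rename(String src, String dst) {
        if (src.equals(dst)) {
            return files.containsKey(src); // true for a file, false for a directory
        }
        if (files.containsKey(src) && !files.containsKey(dst) && !dirs.contains(dst)) {
            files.put(dst, files.remove(src));
            return true;
        }
        return false; // directory renames are not modelled in this sketch
    }
}
```

The first run reports PASS and the second reports a skip rather than a failure, which is the distinction the metadata file is meant to provide: a filesystem that honestly declares it lacks a feature is not penalised for it.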

--
Alejandro
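Steve's point about the implicit "you get back what you wrote" assumption can be turned into a probe. A minimal sketch, assuming a hypothetical BlobStore interface (not a real client API) and a toy LaggingStore that hides a newly written object for a fixed number of reads. It models only delayed visibility of a new key, not stale reads after an overwrite, and a probe against a real remote store would also pause between attempts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ConsistencyProbe {
    /**
     * Write data, then read it back up to maxAttempts times. Returns the
     * 1-based attempt at which the bytes matched, or -1 if they never did.
     * A store with read-after-write consistency should always return 1.
     */
    static int readAfterWrite(BlobStore store, String key, byte[] data, int maxAttempts) {
        store.put(key, data);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            byte[] got = store.get(key);
            if (got != null && Arrays.equals(got, data)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAfterWrite(new LaggingStore(0), "k", data, 5));  // prints 1
        System.out.println(readAfterWrite(new LaggingStore(2), "k", data, 5));  // prints 3
        System.out.println(readAfterWrite(new LaggingStore(10), "k", data, 5)); // prints -1
    }
}

/** Hypothetical minimal blob-store interface. */
interface BlobStore {
    void put(String key, byte[] data);
    byte[] get(String key); // null if the key is not (yet) visible
}

/** Toy store whose writes become visible only after `lag` reads of that key. */
class LaggingStore implements BlobStore {
    private final int lag;
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> pendingReads = new HashMap<>();

    LaggingStore(int lag) { this.lag = lag; }

    public void put(String key, byte[] value) {
        data.put(key, value.clone());
        pendingReads.put(key, lag);
    }

    public byte[] get(String key) {
        int remaining = pendingReads.getOrDefault(key, 0);
        if (remaining > 0) {
            pendingReads.put(key, remaining - 1);
            return null; // simulate a read that does not yet see the write
        }
        byte[] value = data.get(key);
        return value == null ? null : value.clone();
    }
}
```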
Steve Loughran 2013-02-14, 14:05
Eric Baldeschwieler 2013-03-01, 05:02
Steve Loughran 2013-03-08, 14:43
Alejandro Abdelnur 2013-03-08, 16:15
Steve Loughran 2013-03-08, 16:57
Alejandro Abdelnur 2013-03-08, 17:07
Alejandro Abdelnur 2013-03-08, 18:47
Steve Loughran 2013-03-09, 11:36
Alejandro Abdelnur 2013-03-11, 19:15