Hadoop general mailing list

Re: where do side-projects go in trunk now that contrib/ is gone?
> We are already there with the S3 and Azure blobstores, as well as the FTP
> filesystem.

I think this is not correct, and we should plan on moving them out.

This is independent of the effort to straighten up the FS spec, which I
think is great.


On Fri, Mar 8, 2013 at 8:57 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:

> On 8 March 2013 16:15, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> > Jumping a bit late into the discussion.
> >
> Yes. I started it in common-dev first, in the "where does contrib stuff go
> now" thread, then moved it to general, where the conclusion was "except for
> special cases like FS clients, it isn't".
> Now I'm trying to lay down the location for FS stuff, both for OpenStack
> and to handle some proposed dependency changes for s3n://
> > I'd argue that unless those filesystems are part of Hadoop, their clients
> > should not be distributed/built by Hadoop.
> >
> > An analogy to this is not wanting YARN to be the home for AM
> > implementations.
> >
> > A key concern is testability and maintainability.
> >
> We are already there with the S3 and Azure blobstores, as well as the FTP
> filesystem.
> The testability is straightforward for blobstores precisely because all you
> need is some credentials and cluster time; there's no requirement to have
> some specific filesystem to hand. For any other filesystem, it is very much
> in the vendor's hands to do their own testing, especially if the "it's a
> replacement for HDFS" assertion is made.
> If you look at HADOOP-9361, you can see that I've been defining more
> rigorously than before what our FS expectations are, with HADOOP-9371
> spelling it out: what happens when you try to readFully() past the end of a
> file, or call getBlockLocations("/")? HDFS has specific behaviour here, and
> downstream code depends on some of it (e.g. getBlockLocations() behaviour on
> directories).
> https://issues.apache.org/jira/secure/attachment/12572328/HadoopFilesystemContract.pdf
> So far, my initially blobstore-specific tests for the functional parts of
> the specification (not the consistency, concurrency and atomicity parts) are
> on GitHub:
> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
> I've also added more tests to the existing FS contract test, and in doing
> so showed that s3 and s3n have some data-loss risks which need to be fixed.
> That's an argument in favour of having the (testable, low-maintenance-cost)
> filesystems somewhere where any of us is free to fix them.
> While we refine that spec, I want to take those per-operation tests
> from the SwiftFS support, make them retargetable at other filesystems, and
> slowly apply them to all the distributed filesystems. Your colleague Andrew
> Wang is helping there by abstracting FileSystem and FileContext away, so we
> can test both.
> > Still, I see Bigtop as the integration point and the means of making those
> > jars available to a setup.
> >
> I plan to put integration tests in Bigtop: the tests that try to run Pig
> with arbitrary source and dest filesystems, the same for Hive, plus some
> scale tests. Can we upload an 8GB file? What do you get back? Can I create
> more than 65536 entries in a single directory, and what happens to
> ls/performance?
> To summarise, then:
>    1. Blobstores, the FTP filesystem, etc. could gradually move to
>    hadoop-common/hadoop-filesystem-clients.
>    2. A stricter specification of compliance, for the benefit of everyone:
>    us, other FS implementors and users of FS APIs.
>    3. Lots of new functional tests for compliance: abstract ones in
>    hadoop-common; FS-specific ones in hadoop-filesystem-clients.
>    4. Integration & scale tests in Bigtop.
>    5. Anyone writing a "Hadoop compatible FS" can grab the functional and
>    integration tests and see what breaks, then fix their code.
>    6. The combination of (Java API files, specification doc, functional
>    tests, HDFS implementation) defines the expected behavior of a filesystem.
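Points 3 and 5 of the summary describe abstract functional tests living in hadoop-common, with each filesystem client supplying only an FS-specific subclass, so that any implementor can run them and see what breaks. Below is a minimal, self-contained Java sketch of that pattern, using the readFully()-past-EOF question from HADOOP-9371 as the example check. It is not Hadoop code: ContractFs is a hypothetical stand-in for FileSystem/FileContext, InMemoryFs and ZeroPaddingFs are toy implementations, and the rule that reading past EOF must raise EOFException is an assumption made for illustration rather than a quote from the specification.

```java
import java.io.EOFException;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's FileSystem/FileContext; NOT a real Hadoop API.
interface ContractFs {
    void create(String path, byte[] data) throws IOException;
    // Positioned read: fill the whole buffer starting at `position`, or fail.
    void readFully(String path, long position, byte[] buffer) throws IOException;
}

// Filesystem-agnostic functional tests; each FS client only supplies createFs().
abstract class AbstractReadContractTest {
    protected abstract ContractFs createFs();

    // Reading right up to the last byte must succeed and return the right data.
    void testReadFullyToEnd() throws IOException {
        ContractFs fs = createFs();
        fs.create("/test/file", new byte[] {1, 2, 3, 4});
        byte[] buf = new byte[2];
        fs.readFully("/test/file", 2, buf);
        if (buf[0] != 3 || buf[1] != 4) {
            throw new AssertionError("readFully() at offset 2 returned wrong bytes");
        }
    }

    // ASSUMED contract: reading past EOF raises EOFException, not short/padded data.
    void testReadFullyPastEof() throws IOException {
        ContractFs fs = createFs();
        fs.create("/test/file", new byte[] {1, 2, 3, 4});
        try {
            fs.readFully("/test/file", 3, new byte[2]);
        } catch (EOFException expected) {
            return;
        }
        throw new AssertionError("readFully() past EOF did not throw EOFException");
    }

    // Runs every check; returns the first failure message, or null if all pass.
    String runAll() {
        try {
            testReadFullyToEnd();
            testReadFullyPastEof();
            return null;
        } catch (AssertionError | IOException e) {
            return e.getMessage();
        }
    }
}

// Toy in-memory filesystem that follows the assumed contract.
class InMemoryFs implements ContractFs {
    private final Map<String, byte[]> files = new HashMap<>();

    public void create(String path, byte[] data) {
        files.put(path, data.clone());
    }

    protected byte[] lookup(String path) throws FileNotFoundException {
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return data;
    }

    public void readFully(String path, long position, byte[] buffer) throws IOException {
        byte[] data = lookup(path);
        if (position < 0 || position + buffer.length > data.length) {
            throw new EOFException("cannot read " + buffer.length + " bytes at " + position);
        }
        System.arraycopy(data, (int) position, buffer, 0, buffer.length);
    }
}

// Toy "buggy" client that silently zero-pads reads past EOF instead of failing.
class ZeroPaddingFs extends InMemoryFs {
    @Override
    public void readFully(String path, long position, byte[] buffer) throws IOException {
        byte[] data = lookup(path);
        int available = (int) Math.max(0, Math.min(buffer.length, data.length - position));
        if (available > 0) {
            System.arraycopy(data, (int) position, buffer, 0, available);
        }
        Arrays.fill(buffer, available, buffer.length, (byte) 0);
    }
}

// FS-specific subclasses only say which filesystem to run the shared tests against.
class InMemoryReadContractTest extends AbstractReadContractTest {
    protected ContractFs createFs() { return new InMemoryFs(); }
}

class ZeroPaddingReadContractTest extends AbstractReadContractTest {
    protected ContractFs createFs() { return new ZeroPaddingFs(); }
}
```

With this sketch, new InMemoryReadContractTest().runAll() returns null (all checks pass), while the ZeroPaddingFs suite returns the "readFully() past EOF did not throw EOFException" message. The shared test class owns every assertion, so bringing a new filesystem under test means writing only a one-line createFs() override.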