Hadoop >> mail # general >> where do side-projects go in trunk now that contrib/ is gone?

Re: where do side-projects go in trunk now that contrib/ is gone?
I was chatting offline with Roman about this; his points are:

1* segregation of the FS impls into different modules makes sense
2* it should be OK if they have mock services for unit tests
3* Bigtop could do real integration testing
4* by doing this, the different FileSystem impls would be there out of the box

If we go down this path, I'm OK with it.
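Point 2 above -mock services for unit tests- could look roughly like the following sketch. BlobStoreClient and InMemoryBlobStore are hypothetical names chosen for illustration, not existing Hadoop classes; the idea is just that an FS client module can be exercised against an in-memory fake with no credentials or network.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: unit-testing an FS adapter against an in-memory mock of the
// remote store. All names here are illustrative, not Hadoop API.
public class MockStoreSketch {

    /** The subset of remote-store operations the FS adapter would need. */
    interface BlobStoreClient {
        void put(String key, byte[] data);
        byte[] get(String key);
        boolean exists(String key);
    }

    /** In-memory mock: a HashMap pretending to be the remote service. */
    static class InMemoryBlobStore implements BlobStoreClient {
        private final Map<String, byte[]> blobs = new HashMap<>();
        public void put(String key, byte[] data) { blobs.put(key, data.clone()); }
        public byte[] get(String key) { return blobs.get(key); }
        public boolean exists(String key) { return blobs.containsKey(key); }
    }

    /** A unit test can run create/open round-trips entirely in memory. */
    public static boolean roundTripWorks() {
        BlobStoreClient store = new InMemoryBlobStore();
        store.put("/user/test/part-0000", "hello".getBytes());
        return store.exists("/user/test/part-0000")
            && new String(store.get("/user/test/part-0000")).equals("hello");
    }

    public static void main(String[] args) {
        System.out.println("mock round trip: " + roundTripWorks());
    }
}
```

The real integration runs against the live service would then be Bigtop's job, per point 3.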

On Fri, Mar 8, 2013 at 9:07 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:

> > We are already there with the S3 and Azure blobstores, as well as the FTP
> > filesystem.
> I think this is not correct and we should plan to move them out.
> This is independent of the effort to straighten up the FS spec, which I
> think is great.
> Thx
> On Fri, Mar 8, 2013 at 8:57 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> On 8 March 2013 16:15, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>> > jumping a bit late into the discussion.
>> >
>> yes. I started it in common-dev first, in the "where does contrib stuff go
>> now" thread, moved to general, where the conclusion was "except for special
>> cases like FS clients, it isn't".
>> Now I'm trying to lay down the location for FS stuff, both for OpenStack,
>> and to handle some proposed dependency changes for s3n://
>> > I'd argue that unless those filesystems are part of Hadoop, their
>> > clients should not be distributed/built by Hadoop.
>> >
>> > an analogy to this is not wanting YARN to be the home for AM
>> > implementations.
>> >
>> > a key concern is testability and maintainability.
>> >
>> We are already there with the S3 and Azure blobstores, as well as the FTP
>> filesystem.
>> The testability is straightforward for blobstores precisely because all
>> you need is some credentials and cluster time; there's no requirement to
>> have some specific filesystem to hand. Any of those are very much in the
>> vendors' hands to do their own testing, especially if the "it's a
>> replacement for HDFS" assertion is made.
>> If you look at HADOOP-9361 you can see that I've been defining more
>> rigorously than before what our FS expectations are, with HADOOP-9371
>> spelling out "what happens when you try to readFully() past the end of a
>> file, or call getBlockLocations("/")?" HDFS has answers here, and
>> downstream code depends on some things (e.g. getBlockLocations()
>> behaviour on directories):
>> https://issues.apache.org/jira/secure/attachment/12572328/HadoopFilesystemContract.pdf
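The readFully()-past-EOF expectation mentioned above can be pinned down by a test. A minimal, self-contained sketch, using java.io.DataInputStream purely as a stand-in for FSDataInputStream (the real contract tests would of course run against the Hadoop FileSystem API):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Sketch: pinning down what "readFully() past the end of a file" should do.
// The expectation being tested is that readFully raises EOFException rather
// than silently returning short data.
public class ReadFullyContractSketch {

    /** Returns true if readFully past EOF throws EOFException. */
    public static boolean readFullyPastEofThrows() throws IOException {
        byte[] data = new byte[16];                    // a 16-byte "file"
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(data));
        byte[] buf = new byte[32];                     // ask for 32 bytes
        try {
            in.readFully(buf);                         // more than the file holds
            return false;                              // short read went unnoticed
        } catch (EOFException expected) {
            return true;                               // contract upheld
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("readFully past EOF throws EOFException: "
                + readFullyPastEofThrows());
    }
}
```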
>> So far my initially blobstore-specific tests for the functional parts of
>> the specification (not the consistency, concurrency, or atomicity parts)
>> are on GitHub:
>> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
>> I've also added more tests to the existing FS contract test, and in doing
>> so showed that s3 and s3n have some data-loss risks which need to be fixed
>> -that's an argument in favour of having the (testable, low-maintenance
>> cost) filesystems somewhere where any of us is free to fix them.
>> While we refine that spec, I want to take those per-operation tests from
>> the SwiftFS support, make them retargetable at other filesystems, and
>> slowly apply them to all the distributed filesystems. Your colleague
>> Andrew Wang is helping there by abstracting FileSystem and FileContext
>> away, so we can test both.
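The retargeting idea -shared per-operation test logic plus a per-filesystem factory hook- might be sketched like this. MiniFs and the class names are illustrative stand-ins, not the actual FileSystem/FileContext abstraction being built:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of retargetable contract tests: the test logic lives in an
// abstract base, and each filesystem supplies itself via a factory hook.
public class RetargetableContractSketch {

    /** Toy filesystem interface, a stand-in for the real abstraction. */
    interface MiniFs {
        void create(String path, byte[] data);
        byte[] open(String path);
    }

    /** Shared contract test: create-then-open must return what was written. */
    static abstract class AbstractFsContract {
        abstract MiniFs createFs();            // each FS impl overrides this

        boolean testCreateThenOpen() {
            MiniFs fs = createFs();
            fs.create("/contract/f1", new byte[]{1, 2, 3});
            byte[] back = fs.open("/contract/f1");
            return back != null && back.length == 3 && back[2] == 3;
        }
    }

    /** Retarget the same test at an in-memory implementation. */
    static class InMemoryFsContract extends AbstractFsContract {
        MiniFs createFs() {
            Map<String, byte[]> files = new HashMap<>();
            return new MiniFs() {
                public void create(String p, byte[] d) { files.put(p, d.clone()); }
                public byte[] open(String p) { return files.get(p); }
            };
        }
    }

    public static boolean run() {
        return new InMemoryFsContract().testCreateThenOpen();
    }

    public static void main(String[] args) {
        System.out.println("contract passes: " + run());
    }
}
```

Pointing the same suite at HDFS, S3, or Swift then only requires a new createFs() subclass, not duplicated test logic.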
>> > still, I see Bigtop as the integration point and the means of making
>> > those jars available to a setup.
>> >
>> >
>> I plan to put in integration tests -the tests that try to run Pig with
>> arbitrary source and dest filesystems, same for Hive- plus some scale
>> tests: can we upload an 8GB file? What do you get back? Can I create
>> > 65536 entries in a single directory, and what happens to ls /
>> performance?
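A scale test of the many-entries-per-directory kind could start from something like this sketch against the local filesystem. The entry count is kept small here; a real run would push toward the 65536 mark and measure listing time as well as correctness:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: create N entries in one directory and check a listing sees them
// all. This uses java.nio.file against the local FS purely for illustration.
public class DirScaleSketch {

    /** Creates n files in a temp directory and returns the listed count
     *  (or -1 on I/O failure). */
    public static long createAndCount(int n) {
        try {
            Path dir = Files.createTempDirectory("dirscale");
            for (int i = 0; i < n; i++) {
                Files.createFile(dir.resolve("entry-" + i));
            }
            long count = 0;
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
                for (Path ignored : ds) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("listed entries: " + createAndCount(512));
    }
}
```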
>> To summarise then:
>>    1. blobstores, the FTP filesystem &c could gradually move to a
>>    hadoop-common/hadoop-filesystem-clients
>>    2. A stricter specification of compliance, for the benefit of everyone