NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hadoop >> mail # general >> where do side-projects go in trunk now that contrib/ is gone?


Steve Loughran 2013-02-11, 21:20
Eli Collins 2013-02-11, 21:36
Steve Loughran 2013-02-12, 08:55
Eli Collins 2013-02-12, 21:35
Steve Loughran 2013-02-12, 21:51
Eli Collins 2013-02-12, 22:09
Steve Loughran 2013-02-13, 09:44
Alejandro Abdelnur 2013-02-13, 20:07
Steve Loughran 2013-02-14, 14:05
Eric Baldeschwieler 2013-03-01, 05:02
Steve Loughran 2013-03-08, 14:43
Alejandro Abdelnur 2013-03-08, 16:15
Steve Loughran 2013-03-08, 16:57
Alejandro Abdelnur 2013-03-08, 17:07

Re: where do side-projects go in trunk now that contrib/ is gone?
I was chatting offline with Roman about this; his points are:

1* segregation of the FS impls into different modules makes sense
2* it should be OK if they have mock services for unit tests
3* bigtop could do real integration testing
4* by doing this, the different FileSystem impls would be there out of the box
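Point 2* can be illustrated with a minimal sketch: a filesystem-client module ships an in-memory mock of its remote service, so its unit tests need no credentials, network access or cluster time, leaving real-store testing to bigtop. All names here (`BlobService`, `MockBlobService`, `BlobClient`) are hypothetical; a real client would sit behind `org.apache.hadoop.fs.FileSystem`. Emulating rename as copy-then-delete is an assumption about a typical blobstore client, and the boolean result is modelled on `FileSystem.rename(Path, Path)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical remote API that a blobstore client module talks to.
interface BlobService {
    void put(String key, byte[] data);
    Optional<byte[]> get(String key);
    boolean delete(String key);
}

// In-memory mock used by the module's unit tests in place of the real service.
class MockBlobService implements BlobService {
    private final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public void put(String key, byte[] data) {
        blobs.put(key, data.clone()); // defensive copy of the caller's buffer
    }

    @Override
    public Optional<byte[]> get(String key) {
        byte[] data = blobs.get(key);
        return data == null ? Optional.empty() : Optional.of(data.clone());
    }

    @Override
    public boolean delete(String key) {
        return blobs.remove(key) != null;
    }
}

// Hypothetical client logic under test.
class BlobClient {
    private final BlobService service;

    BlobClient(BlobService service) {
        this.service = service;
    }

    // Emulates rename as copy-then-delete (not atomic); returns false if the
    // source is missing, like the boolean result of FileSystem.rename().
    boolean rename(String src, String dst) {
        Optional<byte[]> data = service.get(src);
        if (!data.isPresent()) {
            return false;
        }
        service.put(dst, data.get());
        return service.delete(src);
    }
}
```

Because `BlobClient` only depends on the `BlobService` interface, the same client code can later be pointed at the real service in integration tests without change.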

If we go down this path, I'm OK with it.

Thoughts?
On Fri, Mar 8, 2013 at 9:07 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:

>
> > We are already there with the S3 and Azure blobstores, as well as the FTP
> > filesystem
>
> I think this is not correct, and we should plan on moving them out.
>
> This is independent of the effort to straighten up the FS spec, which I
> think is great.
>
> Thx
>
> On Fri, Mar 8, 2013 at 8:57 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>
>> On 8 March 2013 16:15, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>>
>> > jumping a bit late into the discussion.
>> >
>> Yes. I started it in common-dev first, in the "where does contrib stuff go
>> now" thread, then moved it to general, where the conclusion was "except for
>> special cases like FS clients, it isn't".
>>
>> Now I'm trying to lay down the location for FS stuff, both for OpenStack
>> and to handle some proposed dependency changes for s3n://
>>
>>
>> > I'd argue that unless those filesystems are part of Hadoop, their clients
>> > should not be distributed/built by Hadoop.
>> >
>> > an analogy to this is not wanting Yarn to be the home for AM
>> > implementations.
>> >
>> > a key concern is testability and maintainability.
>> >
>>
>> We are already there with the S3 and Azure blobstores, as well as the FTP
>> filesystem
>>
>> The testability is straightforward for blobstores precisely because all you
>> need is some credentials and cluster time; there's no requirement to have
>> some specific filesystem to hand. Testing any of those is very much in the
>> vendors' hands, especially if the "it's a replacement for HDFS" assertion
>> is made.
>>
>> If you look at HADOOP-9361 you can see that I've been defining, more
>> rigorously than before, what our FS expectations are, with HADOOP-9371
>> spelling out things like "what happens when you try to readFully() past the
>> end of a file, or call getBlockLocations("/")?". HDFS has specific behaviour
>> here, and downstream code depends on some of it (e.g. getBlockLocations()
>> behaviour on directories).
>>
>> https://issues.apache.org/jira/secure/attachment/12572328/HadoopFilesystemContract.pdf
>>
>> So far, my (initially blobstore-specific) tests for the functional parts of
>> the specification (not the consistency, concurrency and atomicity parts) are
>> on GitHub:
>>
>> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
>>
>>
>> I've also added more tests to the existing FS contract test, and in doing
>> so showed that s3 and s3n have some data-loss risks which need to be fixed.
>> That's an argument in favour of having the (testable, low-maintenance-cost)
>> filesystems somewhere where any of us is free to fix them.
>>
>> While we refine that spec, I want to take those per-operation tests from
>> the SwiftFS support, make them retargetable at other filesystems, and
>> slowly apply them to all the distributed filesystems. Your colleague Andrew
>> Wang is helping there by abstracting FileSystem and FileContext away, so we
>> can test both.
>>
>> > still, I see bigtop as the integration point and the means of making those
>> > jars available to a setup.
>> >
>> >
>> I plan to put integration tests there: the tests that try to run Pig with
>> arbitrary source and dest filesystems, the same for Hive, plus some scale
>> tests. Can we upload an 8GB file? What do you get back? Can I create more
>> than 65536 entries in a single directory, and what happens to ls / performance?
>>
>> To summarise, then:
>>
>>    1. blobstores, FTPFileSystem etc. could gradually move to a
>>    hadoop-common/hadoop-filesystem-clients module
>>    2. a stricter specification of compliance, for the benefit of everyone

Alejandro
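Steve's plan of taking the per-operation tests from the SwiftFS work, making them retargetable, and running each check against every filesystem (and against both `FileSystem` and `FileContext`) could look roughly like this sketch. `ContractTarget`, `InMemoryTarget`, `LenientTarget` and `ReadPastEofContract` are hypothetical names; real adapters would wrap `org.apache.hadoop.fs.FileSystem` or `FileContext`. The expectation that a read past the end of a file fails with `java.io.EOFException` is an assumption borrowed from the `java.io.DataInput.readFully()` contract, not something stated in the thread.

```java
import java.io.EOFException;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical adapter: one implementation per client API or per store.
interface ContractTarget {
    String name();

    void create(String path, byte[] data) throws IOException;

    // Positioned read that must fill buf completely, starting at pos.
    void readFully(String path, long pos, byte[] buf) throws IOException;
}

// Strict in-memory target: reading past EOF raises EOFException.
class InMemoryTarget implements ContractTarget {
    private final Map<String, byte[]> files = new HashMap<>();

    public String name() { return "in-memory"; }

    public void create(String path, byte[] data) { files.put(path, data.clone()); }

    public void readFully(String path, long pos, byte[] buf) throws IOException {
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        // Rejects negative offsets and reads that would run past the end.
        if (pos < 0 || pos + buf.length > data.length) {
            throw new EOFException("read past end of " + path);
        }
        System.arraycopy(data, (int) pos, buf, 0, buf.length);
    }
}

// Deliberately lax target: zero-fills instead of failing, the kind of
// silent divergence a shared contract test should catch.
class LenientTarget implements ContractTarget {
    private final Map<String, byte[]> files = new HashMap<>();

    public String name() { return "lenient"; }

    public void create(String path, byte[] data) { files.put(path, data.clone()); }

    public void readFully(String path, long pos, byte[] buf) {
        byte[] data = files.getOrDefault(path, new byte[0]);
        int available = (int) Math.max(0, Math.min(buf.length, data.length - pos));
        if (available > 0) {
            System.arraycopy(data, (int) pos, buf, 0, available);
        }
        Arrays.fill(buf, available, buf.length, (byte) 0);
    }
}

// One per-operation contract check, written once and run against every target.
class ReadPastEofContract {
    // Returns a description of each target that violates the contract.
    static List<String> violations(List<ContractTarget> targets) {
        List<String> failed = new ArrayList<>();
        for (ContractTarget target : targets) {
            try {
                target.create("/test/file", new byte[16]);
                // Ask for 16 bytes starting at offset 8: 8 bytes too many.
                target.readFully("/test/file", 8, new byte[16]);
                failed.add(target.name()); // no exception: contract violated
            } catch (EOFException expected) {
                // Contract satisfied.
            } catch (IOException unexpected) {
                failed.add(target.name() + ": " + unexpected);
            }
        }
        return failed;
    }
}
```

With this shape, supporting another store means writing one adapter rather than copying the test suite, and the same check runs unchanged against a `FileSystem` adapter and a `FileContext` adapter.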
Steve Loughran 2013-03-09, 11:36
Alejandro Abdelnur 2013-03-11, 19:15