HBase >> mail # dev >> [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal

Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
To add to what Todd said, I actually worked with those guys for the
last 3 years and have used Accumulo in production. It's true that it
would have been better if they had been able to contribute to HBase
rather than go on their own, but it's not easy to contribute to open
source, either officially or unofficially, when you work at NSA. I
think there is precedent for competing and/or "duplicate" Apache
projects: Avro/Thrift and HBase/Cassandra come to mind. I'm mostly
interested in this project setting a precedent for other work at NSA
to be developed as open source.


On Fri, Sep 2, 2011 at 3:09 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> Hey folks,
> <wearing my Todd hat and not my Cloudera hat!>
> I've been in touch with this team for the last 18 months or so.
> They're good people, smart, and have a healthy respect for HBase and
> our team. Though they haven't contributed code or participated on the
> lists, I can vouch that they do follow our development and generally
> do understand HBase as well as what makes their system different. In
> the context of the incubator proposal, they're trying to explain why
> their system is different than HBase, and not trying to knock our
> project. They do borrow our ideas, and in the future we'll be able to
> borrow some of theirs. Iterator trees, for example, are distinct from
> coprocessors and have some really nice capabilities which I'm looking
> forward to adapting into HBase.
> There are a couple things to keep in mind about the story here:
> - they first evaluated HBase 3 years ago. HBase at that point was not
> usable for their application - I think several of us here remember the
> state of HBase at the time and might have made the same decision. So,
> they started their own project with an internal team of 5-6 people.
> - contributing to open source from within the NSA is not easy, for
> obvious reasons. They've jumped through many hoops to open source
> this, and we should be thankful for that. Now that they're out in open
> source land, I think we'll see them collaborating with us much more
> openly.
> I for one look forward to working with these folks, and maybe merging
> the projects some time down the road as the feature lists converge.
> -Todd
> On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling <[EMAIL PROTECTED]> wrote:
>> Some comments on the proposal and differentiation vs HBase:
>> Access Labels:
>> The proposal claims that this is "unlikely to be adopted [in HBase]".  This
>> is completely untrue.  This has been discussed many times in the past in
>> relation to our security implementation.  It's just been deferred at the
>> moment due to a need to focus on the initial implementation.  But it's
>> certainly viewed as a potentially important feature for a future iteration.
>> Contributions always welcome!
>> see HBASE-3435: Provide per-column-qualifier and per-key-value security for
>> HBASE-3025
>> Iterators:
>> What do these provide that RegionObservers don't?  I'm speculating since the
>> proposal provides little in the way of details, but if these are "unlikely
>> to be adopted" it's only because coprocessors already offer more extensive
>> functionality.
>> "Flexibility" aka online schema changes and locality groups
>> Locality groups seem to be the only meaningful differentiation in this
>> entire comparison.
>> Testing
>> Performance under "some configurations and conditions" and unsubstantiated
>> "greater data integrity" is not meaningful differentiation.
>> Apache Brand
>> Claims a relationship with HBase.  Is there overlapping code or is this just
>> the duplication of functionality?  There's no community relationship that
>> I'm aware of.  I haven't seen any of the proposed committers on the HBase
>> user and dev lists to this point, so that doesn't set much of a precedent
>> for community interaction.
>> Overall I see no meaningful differentiation vs HBase as an existing project,
>> no past attempts to interact with the most relevant Apache community, and

Joseph Echeverria
Cloudera, Inc.