HBase, mail # dev - [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal


Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Gary Helmling 2011-09-02, 18:40
Some comments on the proposal and differentiation vs HBase:

Access Labels:

The proposal claims that this is "unlikely to be adopted [in HBase]".  This
is completely untrue.  This has been discussed many times in the past in
relation to our security implementation.  It's just been deferred at the
moment due to a need to focus on the initial implementation.  But it's
certainly viewed as a potentially important feature for a future iteration.
Contributions always welcome!

see HBASE-3435: Provide per-column-qualifier and per-key-value security for
HBASE-3025

Iterators:

What do these provide that RegionObservers don't?  I'm speculating since the
proposal provides little in the way of details, but if these are "unlikely
to be adopted" it's only because coprocessors already offer more extensive
functionality.

"Flexibility" aka online schema changes and locality groups

Locality groups seem to be the only meaningful differentiation in this
entire comparison.

Testing

Performance under "some configurations and conditions" and unsubstantiated
"greater data integrity" is not meaningful differentiation.

Apache Brand

Claims a relationship with HBase.  Is there overlapping code or is this just
the duplication of functionality?  There's no community relationship that
I'm aware of.  I haven't seen any of the proposed committers on the HBase
user and dev lists to this point, so that doesn't set much of a precedent
for community interaction.

Overall I see no meaningful differentiation vs HBase as an existing project,
no past attempts to interact with the most relevant Apache community, and
only an, until now, private "community" of government users.  I think it's
great that they want to open source this.  I don't want to discourage that
-- go for it!  But I don't see what the benefit is of ASF incubating this.
I only see the potential for community fragmentation and market confusion
over such closely similar projects.

Gary

On Fri, Sep 2, 2011 at 11:06 AM, Stack <[EMAIL PROTECTED]> wrote:

> See here for the incubator proposal:
> http://wiki.apache.org/incubator/AccumuloProposal
>
> Reactions probably better belong over on the incubator mailing list
> but I thought a discussion here first might be useful developing a
> stance.
>
> Initial reaction, not having seen the code, is that it seems to be close to
> HBase; so close, they call HBase out explicitly in their proposal.
>
> The cell based 'access labels' seem like a matter of adding
> an extra field to KV and their Iterators seem like a specialization on
> Coprocessors.  The ability to add column families on the fly seems too
> minor a difference to call out especially if online schema edits are
> now (soon) supported.  They talk of locality group like functionality
> too -- that could be a significant difference.  We would have to see the
> code but at first blush, differences look small.
>
> Yet another BT implementation further divides this contended space.
> If there were to be an effort integrating HBase into Accumulo or vice
> versa, it's likely to distract significantly from project forward motion (if
> the Accumulo fellows were interested in integrating the two projects,
> I'd have thought they'd have tried to talk to us before this so that's
> probably not their intent).
>
> On the other hand, if their once-secret project is out in the open, we can
> steal the Apache-licensed good bits and....
>
> What do folks think?
>
> St.Ack
>