[DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Stack 2011-09-02, 18:06

See here for the incubator proposal:
http://wiki.apache.org/incubator/AccumuloProposal

Reactions probably belong over on the incubator mailing list, but I thought a discussion here first might be useful for developing a stance.

Initial reaction, not having seen the code, is that it seems to be close to HBase; so close, they call HBase out explicitly in their proposal.

The cell-based 'access labels' seem like a matter of adding an extra field to KV, and their Iterators seem like a specialization of Coprocessors. The ability to add column families on the fly seems too minor a difference to call out, especially if online schema edits are now (soon) supported. They talk of locality-group-like functionality too -- that could be a significant difference. We would have to see the code, but at first blush, differences look small.

Yet another BT implementation further divides this contended space. If there were to be an effort integrating HBase into Accumulo or vice versa, it's likely to distract significantly from project forward motion. (If the Accumulo fellows were interested in integrating the two projects, I'd have thought they'd have tried to talk to us before this, so that's probably not their intent.)

On the other hand, if their once-secret project is out in the open, we can steal the Apache-licensed good bits and....

What do folks think?

St.Ack
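To make the "extra field on KV" remark in the message above concrete, here is a minimal Python sketch. It is not HBase or Accumulo code: the `KeyValue` class, its `label` field, and the `visible` helper are hypothetical names invented for this illustration, and treating a label as a single token (with the empty string meaning "unrestricted") is an assumed simplification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator


@dataclass(frozen=True)
class KeyValue:
    """Hypothetical cell: the usual coordinates plus one extra label field."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: bytes
    label: str = ""  # assumption: empty label means the cell is unrestricted


def visible(cells: Iterable[KeyValue],
            authorizations: FrozenSet[str]) -> Iterator[KeyValue]:
    """Yield only the cells whose label the reader is authorized to see."""
    for kv in cells:
        if kv.label == "" or kv.label in authorizations:
            yield kv


cells = [
    KeyValue("r1", "cf", "a", 1, b"x"),            # unlabeled cell
    KeyValue("r1", "cf", "b", 1, b"y", "secret"),  # labeled cell
]
```

A reader holding no authorizations would see only the unlabeled cell; a reader holding `"secret"` would see both. Whether the real feature is this simple is exactly what the thread says cannot be judged without seeing the code.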
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Doug Meil 2011-09-02, 18:33

They apparently have a very strong relationship with HBase... ;-)

"Apache Brand
Our interest in releasing this code as an Apache incubator project is due to its strong relationship with other Apache projects, i.e. Hadoop, Zookeeper, and HBase."

It's disappointing that these guys watched HBase develop in public over the last few years and they still went ahead with their own thing in private despite the "strong relationship". Not the first time this has happened in software, and won't be the last. (sigh)
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Gary Helmling 2011-09-02, 18:40

Some comments on the proposal and differentiation vs HBase:

Access Labels:

The proposal claims that this is "unlikely to be adopted [in HBase]". This is completely untrue. This has been discussed many times in the past in relation to our security implementation. It's just been deferred at the moment due to a need to focus on the initial implementation. But it's certainly viewed as a potentially important feature for a future iteration. Contributions always welcome!

See HBASE-3435: Provide per-column-qualifier and per-key-value security for HBASE-3025

Iterators:

What do these provide that RegionObservers don't? I'm speculating since the proposal provides little in the way of details, but if these are "unlikely to be adopted" it's only because coprocessors already offer more extensive functionality.

"Flexibility" aka online schema changes and locality groups

Locality groups seem to be the only meaningful differentiation in this entire comparison.

Testing

Performance under "some configurations and conditions" and unsubstantiated "greater data integrity" is not meaningful differentiation.

Apache Brand

Claims a relationship with HBase. Is there overlapping code or is this just the duplication of functionality? There's no community relationship that I'm aware of. I haven't seen any of the proposed committers on the HBase user and dev lists to this point, so that doesn't set much of a precedent for community interaction.

Overall I see no meaningful differentiation vs HBase as an existing project, no past attempts to interact with the most relevant Apache community, and only a, until now, private "community" of government users. I think it's great that they want to open source this. I don't want to discourage that -- go for it! But I don't see what the benefit is of ASF incubating this. I only see the potential for community fragmentation and market confusion over such closely similar projects.

Gary
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Todd Lipcon 2011-09-02, 19:09

Hey folks,

<wearing my Todd hat and not my Cloudera hat!>

I've been in touch with this team for the last 18 months or so. They're good people, smart, and have a healthy respect for HBase and our team. Though they haven't contributed code or participated on the lists, I can vouch that they do follow our development and generally do understand HBase as well as what makes their system different. In the context of the incubator proposal, they're trying to explain why their system is different from HBase, and not trying to knock our project. They do borrow our ideas, and in the future we'll be able to borrow some of theirs. Iterator trees, for example, are distinct from coprocessors and have some really nice capabilities which I'm looking forward to adapting into HBase.

There are a couple of things to keep in mind about the story here:
- they first evaluated HBase 3 years ago. HBase at that point was not usable for their application - I think several of us here remember the state of HBase at the time and might have made the same decision. So, they started their own project with an internal team of 5-6 people.
- contributing to open source from within the NSA is not easy, for obvious reasons. They've jumped through many hoops to open source this, and we should be thankful for that. Now that they're out in open source land, I think we'll see them collaborating with us much more openly.

I for one look forward to working with these folks, and maybe merging the projects some time down the road as the feature lists converge.

-Todd

Todd Lipcon
Software Engineer, Cloudera
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Joey Echeverria 2011-09-02, 19:30

To add to what Todd said, I actually worked with those guys for the last 3 years and have used Accumulo in production. It's true that it would have been better if they had been able to contribute to HBase rather than go on their own, but it's not easy to contribute to open source, either officially or unofficially, when you work at NSA. I think there is precedent for competing and/or "duplicate" Apache projects; Avro/Thrift and HBase/Cassandra come to mind. I'm mostly interested in this project setting a precedent for other work at NSA to be developed as open source.

-Joey

Joseph Echeverria
Cloudera, Inc.
443.305.9434
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Ted Yu 2011-09-02, 19:37

Thanks for the update, Joey.

Could someone close to NSA share what has changed recently that makes contributing to open source easier?
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Joey Echeverria 2011-09-02, 19:40

It's not really easier; they've been working on getting this released for 2.5 years or more. What I think will make it easier is having more of a precedent. In the government, it's always easier to say no than yes. Showing that it can be done, and done successfully, will push them to develop a consistent process. Hopefully in the future it will take less than 2.5 years to go public :)

-Joey

Joseph Echeverria
Cloudera, Inc.
443.305.9434
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-03, 02:06

> I think there is precedent for competing and/or "duplicate" Apache
> projects; Avro/Thrift and HBase/Cassandra come to mind.

That argument isn't helping you make your case.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Gary Helmling 2011-09-02, 19:55

> I've been in touch with this team for the last 18 months or so.
> They're good people, smart, and have a healthy respect for HBase and
> our team. Though they haven't contributed code or participated on the
> lists, I can vouch that they do follow our development and generally
> do understand HBase as well as what makes their system different. In
> the context of the incubator proposal, they're trying to explain why
> their system is different than HBase, and not trying to knock our
> project.

Sure, I'm not trying to knock the people or the project. I am trying to point out what I view as some inaccuracies in the comparison vs. HBase. But I haven't seen the code, so I can only speculate.

> There are a couple things to keep in mind about the story here:
> - they first evaluated HBase 3 years ago. HBase at that point was not
> usable for their application - I think several of us here remember the
> state of HBase at the time and might have made the same decision. So,
> they started their own project with an internal team of 5-6 people.

I can sympathize, but I don't see how that's relevant to the discussion of whether it's a good idea to incubate this project now.

> - contributing to open source from within the NSA is not easy, for
> obvious reasons. They've jumped through many hoops to open source
> this, and we should be thankful for that. Now that they're out in open
> source land, I think we'll see them collaborating with us much more
> openly.

Again I can sympathize, and if I didn't make it clear before, I applaud the efforts to open source this project! But the question at hand is not whether it's good to open source the project, but whether it's appropriate for incubation at the ASF. And honestly, I don't think where the project is coming from should play into that decision.

I'm simply pointing out a lack of community involvement to date. Maybe the involvement is different with the other Apache projects cited, or maybe that will change during incubation. But to the extent that collaboration is encouraged, I think the lack of involvement to date is relevant.

Gary
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-03, 02:13
> I'm simply pointing out a lack of community involvement to date.
I would only add to this that the incubation proposal makes a controversial statement regarding existing involvement with the HBase community. It may be technically true if a certain company with involvement in HBase has also been interacting with "Accumulo", but it is disingenuous to claim that the "community" has been involved here.

It looks like strictly a one way street: They have been able to observe or borrow the fruits of our labor for years, and now at a suitable point wish to incubate at the ASF to compete with our project for community. That is not "community involvement". That is leeching.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Stack 2011-09-03, 05:17
I'm pissed off my tax dollars were being used to duplicate the effort we expend each day up here in hbaselandia!

One of us needs to bottle some of the above as a response over in incubator. My thought is that we'd be doing all involved a service getting them to tighten up the proposal. I'd think the response would run something like: we welcome the code drop, we appreciate the monumental effort it must have taken making an NSA sponsored project open source, and we wish them luck getting folks to run software written by the NSA (joke!), but we have some 'feedback' on the proposal as written, in particular around its references to our project.

I think Gary's list, citing HBase issues and code for the things the proposal says we are unlikely to do, would be good to include in the response, and that we'd question their claiming a strong relationship with hbase when it seems the relationship was one way only.

I think Ted's question on how NSA will do OSS going forward, when it's been a problem up to now, is an interesting one, but I'd expect that it'll likely be asked by those over in incubator since it is so obvious.

What ye reckon? I can write up a response but maybe Gary, since I'd mostly be recasting your first response above, you want to do it?

St.Ack
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bernd Fondermann 2011-09-03, 07:00
On Saturday, September 3, 2011, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> I'm simply pointing out a lack of community involvement to date.
>
> I would only add to this that the incubation proposal makes a controversial statement regarding existing involvement with the HBase community. It may be technically true if a certain company with involvement in HBase has also been interacting with "Accumulo", but is disingenuous to claim that the "community" has been involved here.
>
> It looks like strictly a one way street: They have been able to observe or borrow the fruits of our labor for years, and now at a suitable point wish to incubate at the ASF to compete with our project for community. That is not "community involvement". That is leeching.

Are you saying that the proposal is actually some kind of HBase fork?

And isn't this 'competition' already happening between all the BT and Dynamo implementations?

I fail to see anything bad happening here.

Bernd
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Ryan Rawson 2011-09-03, 07:17
My understanding is that the ASF is about community, not code. So what is the goal for Accumulo? Build a community. How much would it intersect with the HBase community? Sounds like a lot. Does it still make sense to incubate it then?

To the point earlier that ASF has hosted multiple competitors of various core projects, notably httpd: I had a look, and there are exactly 2 projects that serve HTTP exclusively:
Apache HTTPD
Apache Traffic Server

But these 2 are complementary. Although some features kind of overlap (mod_proxy, for example), they don't really compete directly.

So, would the ASF allow incubation of a web server product, for example nginx (which is a direct httpd competitor)? If the answer is "no, either work with the httpd community or go elsewhere", then surely Accumulo should have the same treatment?

-ryan
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bernd Fondermann 2011-09-03, 07:55
On Saturday, September 3, 2011, Ryan Rawson <[EMAIL PROTECTED]> wrote:
> My understanding is that the ASF is about community, not code. So what
> is the goal for Accumulo? Build a community. How much would it
> intersect with the HBase community? Sounds like a lot. Does it still
> make sense to incubate it then?
>
> To the point earlier that ASF has hosted multiple competitors of
> various core projects, notably httpd: I had a look, and there are exactly 2
> projects that serve HTTP exclusively:
> Apache HTTPD
> Apache Traffic Server
>
> But these 2 are complementary. Although some features kind of overlap
> (mod_proxy, for example), they don't really compete directly.

You omitted Tomcat, which is all about HTTP. They are not the same, yet they compete for web server users, i.e. community.

> So, would the ASF allow incubation of a web server product, for
> example nginx (which is a direct httpd competitor)?

Yes, why not?
You are assuming:
- competition is bad
- community is indivisible

Neither holds, IMHO.
For example there are people active in H, HBase and Cassandra at the same time, even bringing them more together.
We have at least one person who is committer for Tomcat and TrafficServer and HTTP.

I really wouldn't worry too much.

Bernd
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Ryan Rawson 2011-09-03, 09:19
Note that even though you list Tomcat as a 'competitor' to Apache, one would never really choose between one or the other for a given task. But people might easily choose Accumulo over HBase for the exact same task. And they would probably not choose both.

Just thinking out loud. I of course want HBase to be highly successful for personal reasons. So just wandering through the ways in which this might or might not be beneficial to HBase.
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Doug Meil 2011-09-03, 13:40
That's a good point. But the fact that Apache HTTPD is implemented in C and Tomcat in Java is, I think, material. While they compete for web server users, they have very different implementations. If ASF had two active Java webserver communities and codebases (e.g., "Tomcat" and "Timdog") I think that it would be a bit confusing.

On 9/3/11 3:55 AM, "Bernd Fondermann" <[EMAIL PROTECTED]> wrote:

>You omitted Tomcat which is all about HTTP. They are not the same, yet
>compete for web server users, i.e. Community.
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Stack 2011-09-03, 20:21
On Sat, Sep 3, 2011 at 12:55 AM, Bernd Fondermann <[EMAIL PROTECTED]> wrote:
>> So, would the ASF allow incubation of a web server product, for
>> example nginx (which is a direct httpd competitor)?
>
> Yes, why not?
> You are assuming:
> - competition is bad
> - community is indivisible

Can we move the 'competition is good', etc. discussion to another thread, preferably off the dev list?

I'd suggest we refocus this thread on how to respond to the Accumulo proposal (or whether to respond at all), since that's what we 'know'. I think it'd be useful correcting at least the 'unlikely tos' with pointers to committed code.

Code overlap, if any, can be addressed when the code drop happens.

St.Ack
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-04, 10:22
> I think it'd be useful correcting at least the 'unlikely tos' with
> pointers to committed code.

I would agree.

Trend Micro has offered to the HBase community, for several months now, a version of HBase with Secure Hadoop integration and functioning table- and column-family ACLs:

http://github.com/trendmicro/hbase/tree/security

We run a 0.90-ish version of this in production.

This is implemented as a coprocessor, but it requires the coprocessor framework and pluggable RPC (and the secure RPC engine) to be included in core. So far we have successfully upstreamed the coprocessor framework into 0.92. Inclusion of pluggable RPC (and the secure RPC engine) into 0.92 could happen, but my understanding is that the PMC consensus is that pluggable RPC, and the Maven build refactoring it requires, would delay the release unacceptably. So this has been "accepted" in principle for 0.94. I put 'accepted' in quotes because I can't say there is any commitment to do this.

I think we can respond that a secure HBase would be a reality if our contributions are accepted in full. It would be a stronger response if the immediate next release (almost ready) were to have fully functioning security features. On the other hand, I think the Accumulo proposal is not inaccurate if our contributions will not be accepted for some reason.

So we should have this discussion before responding.

> Code overlap, if any, can be addressed when the code drop happens.

I for one am eagerly awaiting the code. Only after this will there be enough information on hand to address some of the questions we have.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Stack 2011-09-04, 22:34
On Sun, Sep 4, 2011 at 3:22 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> This is implemented as a coprocessor, but it requires the coprocessor framework and pluggable RPC (and the secure RPC engine) to be included in core. So far we have successfully upstreamed the coprocessor framework into 0.92. Inclusion of pluggable RPC (and the secure RPC engine) into 0.92 could happen, but my understanding is that the PMC consensus is that pluggable RPC, and the Maven build refactoring it requires, would delay the release unacceptably. So this has been "accepted" in principle for 0.94. I put 'accepted' in quotes because I can't say there is any commitment to do this.

I do not know of any such PMC consensus. It's not been discussed nor voted on up on private (You are on the PMC so you'd know).

> So we should have this discussion before responding.

I'm not that interested in discussing because, as I see it, there is nothing to discuss. We need the security you fellas have been working on.

Seems like committing it will disrupt the build and src tree layout. Gary was holding off till we branched but 0.92 branching is taking too long.

+ Let's branch this Friday, or next?
+ And/or run a vote on whether we should commit security now, before we branch, or after

St.Ack
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Gary Helmling 2011-09-06, 19:02
> Seems like committing it will disrupt the build and src tree layout.
> Gary was holding off till we branched but 0.92 branching is taking too
> long.
>
> + Let's branch this Friday, or next?
> + And/or run a vote on whether we should commit security now, before
> we branch, or after

This is getting off topic for the current thread, so I'll open a new thread to take a vote on converting trunk back into Maven modules. This is what would be necessary to integrate the various security bits.

The last discussion we had on this was on the dev list at the end of May/beginning of June:
http://search-hadoop.com/m/iXZmd2aZwBE1

I agreed as much as anyone that we should hold off until after branching 0.92 in order to avoid the disruption of moving the entire source tree around. So I have been holding off on this at my own discretion, and any delay sits mostly with me.

Of course, that was three months ago and we still haven't branched. In hindsight, if we had been aware how long the 0.92 process would go on, I think the thread might have reached a different conclusion. In any case, I think it warrants another discussion.

--gh
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Duane Moore 2011-09-06, 16:21
Hello all,
I've been a lurker on the HBase list for a year or so, and our company has also been working with the Accumulo implementation during the same time frame. I'd like to respond to Stack's suggestion to focus on the technical merits of the proposal. Since I have some info on the pre-open sourced version of Accumulo, I'd like to share some of our evaluation of the software, primarily from a client perspective (vs. implementation details like logging to NFS vs HDFS).

First, I share many of the same concerns of folks who were frustrated that this project seems to duplicate the effort of the open source (particularly HBase) community. However, I will second what Todd and Joey said and reiterate that contributing to open source is not easy for a government contractor, and especially not easy for U.S. government employees. My personal preference for a long while has been to migrate our Accumulo implementation to HBase, but as with any project there are often non-technical considerations in doing so.

Below are some notes we took last year on the differences between Accumulo and HBase, with additional notes from me inline. Much of this mirrors what is in the current Accumulo proposal.

-----
- Column Families
In HBase you must specify all column families up front as part of the table schema declaration when creating a table.
Accumulo does not have this restriction; you do not declare column families when you create a table. When you insert a new row into the table you can just provide a new column family.
** Note: sounds like, from what Stack said, this is close to being OBE?

- Aggregation
Accumulo offers the ability to specify an aggregator for an individual column family or column. This allows you to keep a row count, or a summation of numerical values that may be stored in a particular column. It would appear the function has to operate on the subset of values stored for that column in the table at a particular time, since it keeps the aggregate value in memory. So this may not be able to handle certain aggregation functions like 'median', for instance. But functions like sum, max, min, mean, and count should all be supportable.
I could not find a comparable feature within HBase, but HBase does offer an atomic function called incrementColumnValue on the HTable class, which it appears can be leveraged to provide aggregation behavior.

- Column Visibility
This is the feature in Accumulo that allows tagging of the data at the column level, which would primarily be used for classification markings (in our scenario).
If we were to implement the same type of column visibility in HBase that Accumulo supports, we would potentially have several options:
-Try to implement column visibility as a patch to HBase. Would be fun, but may be a lot of work.
-Since the value of a particular column (cell, actually) is simply a byte array, we could utilize a standard technique of encoding the visibility level/classification in the column value itself.
-Since the number of columns is not pre-defined, adopt a convention whereby each column "foo" gets an additional column added by our infrastructure called "foo_visibility".
** Note: We have a requirement to use PKI (digital certificates) for authentication in our service stack. The relationship between PKI and the Kerberos currently used for Secure HBase is interesting; not quite sure how the two would fit together in practice.

- Retrieving Data
Accumulo uses a Scanner object for all retrieval operations; a Scanner is instantiated by retrieving it from the Connector object. When retrieving all values for a particular row, the _individual cells are each returned as a new entry_ by the Scanner iterator.
In HBase, you can use a Scan object (org.apache.hadoop.hbase.client.Scan) or you can use a Get object, which allows you to retrieve a single row at a time. In either case, the org.apache.hadoop.hbase.client.Result class is returned, representing all of the requested data for that particular row.

In HBase, to set constraints on a query, you set an org.apache.hadoop.hbase.filter.Filter object on the Scan object. Multiple Filters may be set by using the FilterList object.
In Accumulo, you call the setScanIterators() method on the Scanner object, which enables the appropriate iterators for use on the server before returning data.
** Note: the primary difference here is in the use of server-side iterators, which Andy has correctly pointed out could be implemented via the coprocessor framework. We did some initial investigation into coprocessors to see if we could implement this equivalent functionality, but since we'd been directed to use Accumulo, we didn't have much bandwidth to address this (also, coprocessors were in their infancy at the time).

Hope that helps. Bottom line is that I believe that the features in Accumulo can and ought to be merged into HBase at some point (assuming the technical merits hold up). Looking forward to contributing to that conversation.

Thanks,
Duane
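The "foo_visibility" companion-column convention from the Column Visibility notes can be sketched in a few lines. This is only an illustration under loud assumptions: a row is modeled as a plain Python dict, each cell carries a single label rather than any richer expression, unlabeled cells are treated as visible, and the function names are invented here. It uses neither the HBase nor the Accumulo client API.

```python
VIS_SUFFIX = "_visibility"  # hypothetical naming convention, taken from the note above


def put_with_visibility(row, column, value, label):
    """Store a value and its companion '<column>_visibility' cell in a dict-based row."""
    row[column] = value
    row[column + VIS_SUFFIX] = label


def visible_cells(row, authorizations):
    """Return only the data cells whose label is in the caller's authorization set.

    Assumption: a cell with no companion label is treated as visible to everyone.
    """
    result = {}
    for column, value in row.items():
        if column.endswith(VIS_SUFFIX):
            continue  # companion cells are metadata, never returned as data
        label = row.get(column + VIS_SUFFIX)
        if label is None or label in authorizations:
            result[column] = value
    return result


row = {}
put_with_visibility(row, "name", b"alice", "public")
put_with_visibility(row, "ssn", b"123-45-6789", "secret")
print(visible_cells(row, {"public"}))
```

Note that filtering like this happens wherever the function runs. Done on the client, it is a convention rather than an enforcement mechanism, since the unfiltered cells have already left the server. A data column whose own name ends in "_visibility" would also collide with the convention.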
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Ted Dunning 2011-09-06, 16:31
HBase offers co-processors which should be able to do this.

And median *can* be accumulated in a small amount of memory. It is a little trickier than mean, but still doable.

On Tue, Sep 6, 2011 at 11:21 AM, Duane Moore <[EMAIL PROTECTED]> wrote:

> - Aggregation
> Accumulo offers the ability to specify an aggregator for an individual
> column family or column. This allows you to keep a row count, or summation
> of numerical values that may be stored in a particular column. It would
> appear the function has to operate on the subset of values stored for that
> column in the table at a particular time since it keeps the aggregate
> value in memory. So this may not be able to handle certain aggregation
> functions like 'median' for instance. But functions like sum, max, min,
> mean, and count should all be supportable.
> I could not find a comparable feature within HBase, but HBase does offer
> an atomic function called incrementColumnValue on the HTable class which
> appears can be leveraged to provide aggregation behavior.
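One way to read the point about median is to keep a fixed-size histogram instead of the raw values. The sketch below is not taken from the thread, HBase, or Accumulo. It assumes the values fall in a known range [lo, hi), accepts an answer accurate to roughly one bucket width, and uses invented class and method names. Other approaches, such as streaming quantile estimators, would also fit the description.

```python
class HistogramMedian:
    """Approximate median in O(buckets) memory, assuming values lie in [lo, hi)."""

    def __init__(self, lo, hi, buckets=100):
        self.lo = lo
        self.hi = hi
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets
        self.total = 0

    def add(self, x):
        # Out-of-range values are clamped into the first or last bucket.
        index = int((x - self.lo) / self.width)
        index = max(0, min(len(self.counts) - 1, index))
        self.counts[index] += 1
        self.total += 1

    def merge(self, other):
        # Partial histograms with identical bounds can be summed bucket by bucket,
        # which is what makes this usable for per-server partial aggregates.
        if (self.lo, self.hi, len(self.counts)) != (other.lo, other.hi, len(other.counts)):
            raise ValueError("histograms must share bounds and bucket count")
        for i, count in enumerate(other.counts):
            self.counts[i] += count
        self.total += other.total

    def median(self):
        if self.total == 0:
            return None
        target = (self.total + 1) // 2  # rank of the lower median
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return self.lo + (i + 0.5) * self.width  # midpoint of that bucket


h = HistogramMedian(0, 100, buckets=100)
for v in range(100):
    h.add(v)
print(h.median())
```

Unlike a running sum or count, the state here is a small array rather than a single number, which is one sense in which median is "a little trickier than mean". The merge step is the part that matters when partial results come from different servers.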
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Stack 2011-09-06, 16:58
Thanks for the below Duane. Helps.
See below. On Tue, Sep 6, 2011 at 9:21 AM, Duane Moore <[EMAIL PROTECTED]> wrote: ... > - Column Families > In HBase you must specify all column families up front as part of the > table schema declaration when creating a table. > Accumulo does not have this restriction, you do not declare column > families when you create a table. When you insert a new row into the table > you can just provide a new column family. > ** Note: sounds like from what Stack said, this is close to being OBE? > I'm about to get an Order of the British Empire (http://en.wikipedia.org/wiki/Order_of_the_British_Empire)!!! Yeah, I think Overtaken By Events seems about right. Having the client free form add column families seems like a bad idea to me. There should be some friction since there are physical impliciations each time a new CF is added. But then Accumulo has a form of locality groups so it seems to me that this freeform adding of column families is just something that follows on from their having locatlity groups (I wonder how you do locality group editing in Accumulo? Do you have to take the table offline?) > - Aggregation > Accumulo offers the ability to specify an aggregator for an individual > column family or column. This allows you to keep a row count, or summation > of numerical values that may be stored in a particular column. It would > appear the function has to operate on the subset of values stored for that > column in the table at a particular time since it keeps the aggregate > value in memory. So this may not be able to handle certain aggregation > functions like 'median' for instance. But functions like sum, max, min, > mean, and count should all be supportable. > I could not find a comparable feature within HBase, but HBase does offer > an atomic function called incremementColumnValue on the HTable class which > appears can be leveraged to provide aggregation behavior. 
> Yeah, we have ICVs and you can aggregate outside of HBase in the client but it sounds like the above is a subset of https://issues.apache.org/jira/browse/HBASE-1512, committed to TRUNK? > - Column Visibility > This is the feature in Accumulo that allows tagging of the data at the > column level, which would primarily be used for classification markings > (in our scenario). > If we were to implement the same type of column visibility in HBase that > Accumulo supports, we would have potentially several options: > -Try to implement column visibility as a patch to HBase. Would be fun, but > may be a lot of work. > -Since the value of a particular column (cell, actually) is simply a byte > array, we could utilize a standard technique of encoding the visibility > level/classification in the column value itself. > -Since the number of columns is not pre-defined, adopt a convention > whereby each column "foo" gets an additional column added by our > infrastructure called "foo_visibility". > ** Note: We have a requirement to use PKI (digital certificates) for > authentication in our service stack. The relationship between PKI and > Kerberos currently used for Secure HBase is interesting; not quite sure > how the two would fit together in practice. > We'd entertain #1 (Gary above cites an issue where he ruminates on what would be involved: https://issues.apache.org/jira/browse/HBASE-3435). I don't get why this has to be in the KV rather than as a version of #2 (but hey, I'm slow). #3 sounds a little messy. #4 sounds like the proper way to get per user auth. I'd be interested in helping out getting that to work. > -Retrieving Data > Accumulo uses a Scanner object for all retrieval operations, which are > instantiated by retrieving a Scanner from the Connector object. When > retrieving all values for a particular row, the _individual cells are > returned as a new entry_ returned by the Scanner iterator. 
> In HBase, you can use a Scan object (org.apache.hadoop.hbase.client.Scan) > or you can use a Get object, which allows you to retrieve a single row at Yeah, sounds like it. Thanks for the helpful note Duane. Good stuff, St.Ack +
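The second and third options in Duane's column visibility list can be sketched without touching any HBase or Accumulo API. What follows is a minimal sketch only: a plain dict stands in for a row, and the 2-byte length prefix, the `_visibility` suffix constant and all function names are hypothetical choices made for illustration, not anything either project defines.

```python
import struct
from typing import Dict, Set, Tuple

# Option 2 (assumed encoding): prefix the cell value with a
# length-delimited visibility label.
def encode_cell(label: str, value: bytes) -> bytes:
    raw = label.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw + value


def decode_cell(cell: bytes) -> Tuple[str, bytes]:
    (n,) = struct.unpack(">H", cell[:2])
    return cell[2:2 + n].decode("utf-8"), cell[2 + n:]


# Option 3 (assumed convention): every column "foo" gets a sibling
# column "foo_visibility" that holds its label.
VIS_SUFFIX = "_visibility"


def put_with_sibling(row: Dict[str, bytes], column: str,
                     value: bytes, label: str) -> None:
    row[column] = value
    row[column + VIS_SUFFIX] = label.encode("utf-8")


def visible_columns(row: Dict[str, bytes],
                    allowed: Set[str]) -> Dict[str, bytes]:
    """Return only the data columns whose label is in `allowed`.

    Columns without a sibling label are treated as visible here;
    that default is an assumption of this sketch.
    """
    out = {}
    for col, val in row.items():
        if col.endswith(VIS_SUFFIX):
            continue  # skip the label columns themselves
        label = row.get(col + VIS_SUFFIX)
        if label is None or label.decode("utf-8") in allowed:
            out[col] = val
    return out
```

Both variants filter in the client, so neither by itself stops a client from reading the raw cell. Option 3 also doubles the number of columns per row, which may be part of why it reads as "a little messy".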
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-09, 18:50

> From: Duane Moore <[EMAIL PROTECTED]>

> I will second what Todd and Joey
> said and reiterate that contributing to open source is not easy for a
> government contractor, and especially not easy for U.S. government
> employees.

This is true as a general statement I'm sure.

However, my former life was as an engineer in a DARPA shop with a TS clearance. During that time I worked on both closed/classified systems and projects such as TrustedBSD (http://www.trustedbsd.org/). Choosing to develop an internal alternative rather than work with the HBase project was a decision of convenience by someone.

While all appreciate this eventual open sourcing on some level, the outcome is hardly optimal, and does not favor in my opinion the existing open source community here (HBase) in the short term, and any long term favor is going to require work by that community.

> My personal preference for a long while has been to migrate
> our Accumulo implementation to HBase, but as with any project there are
> often non-technical considerations for doing so.

I can only hope that open source communities in general will apply a penalty for taking the easy way out for such non-technical considerations. We do not have to act as beggars. Presumably this open sourcing was not done out of charity -- I would be quite surprised, maybe shocked. If government (or contractors) want to leverage open source communities for some benefit, the least we can do is insist on respectful terms.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

----- Original Message -----
> From: Duane Moore <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc:
> Sent: Tuesday, September 6, 2011 9:21 AM
> Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
>
> Hello all,
>
> I've been a lurker on the HBase list for a year or so and our company has
> also been working with the Accumulo implementation during the same time
> frame. I'd like to respond to Stack's suggestion to focus on the
> technical merits of the proposal. Since I have some info on the pre-open
> sourced version of Accumulo, I'd like to share some of our evaluation of
> the software, primarily from a client perspective (vs. implementation
> details like logging to NFS vs HDFS).
>
> First, I share many of the same concerns of folks who were frustrated that
> this project seems to duplicate the effort of the open source
> (particularly HBase) community. However, I will second what Todd and Joey
> said and reiterate that contributing to open source is not easy for a
> government contractor, and especially not easy for U.S. government
> employees. My personal preference for a long while has been to migrate
> our Accumulo implementation to HBase, but as with any project there are
> often non-technical considerations for doing so.
>
> Below are some notes we took last year on the differences between Accumulo
> and HBase, with additional notes from me inline. Much of this mirrors
> what is in the current Accumulo proposal.
>
> -----
>
> - Column Families
> In HBase you must specify all column families up front as part of the
> table schema declaration when creating a table.
> Accumulo does not have this restriction, you do not declare column
> families when you create a table. When you insert a new row into the table
> you can just provide a new column family.
> ** Note: sounds like from what Stack said, this is close to being OBE?
>
>
> - Aggregation
> Accumulo offers the ability to specify an aggregator for an individual
> column family or column. This allows you to keep a row count, or summation
> of numerical values that may be stored in a particular column. It would
> appear the function has to operate on the subset of values stored for that
> column in the table at a particular time since it keeps the aggregate
> value in memory. So this may not be able to handle certain aggregation
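The aggregation note quoted at the end of the message above separates sum, max, min, mean and count from median. One way to read that split is that the first group can be maintained with a constant amount of state and merged across batches, while median cannot. A minimal sketch under that reading follows; the `Agg` class and `aggregate` helper are hypothetical names for illustration and are not Accumulo's aggregator interface.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Agg:
    """Constant-size running state for count, sum, min, max and mean."""
    count: int = 0
    total: float = 0.0
    lo: Optional[float] = None
    hi: Optional[float] = None

    def add(self, v: float) -> None:
        # Fold one value into the running state.
        self.count += 1
        self.total += v
        self.lo = v if self.lo is None else min(self.lo, v)
        self.hi = v if self.hi is None else max(self.hi, v)

    def merge(self, other: "Agg") -> "Agg":
        # Combine two partial states without revisiting the raw values.
        los = [x for x in (self.lo, other.lo) if x is not None]
        his = [x for x in (self.hi, other.hi) if x is not None]
        return Agg(
            count=self.count + other.count,
            total=self.total + other.total,
            lo=min(los) if los else None,
            hi=max(his) if his else None,
        )

    @property
    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None


def aggregate(values: Iterable[float]) -> Agg:
    agg = Agg()
    for v in values:
        agg.add(v)
    return agg
```

Merging the state for `[1, 2, 3]` with the state for `[10]` gives the same result as aggregating `[1, 2, 3, 10]` in one pass. Median does not merge this way: the two batches have medians 2 and 10, while the combined median is 2.5, which cannot be derived from 2 and 10 alone.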
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bradford Stephens 2011-09-09, 19:24
Accumulo seems mostly like features we can roll into HBase. Decline.
On Fri, Sep 9, 2011 at 2:50 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote: >> From: Duane Moore <[EMAIL PROTECTED]> > >> I will second what Todd and Joey >> said and reiterate that contributing to open source is not easy for a >> government contractor, and especially not easy for U.S. government >> employees. > > > This is true as a general statement I'm sure. > > However, my former life was as an engineer in a DARPA shop with a TS clearance. During that time I worked on both closed/classified systems and projects such as TrustedBSD (http://www.trustedbsd.org/). Choosing to develop an internal alternative rather than work with the HBase project was a decision of convenience by someone. > > While all appreciate this eventual open sourcing on some level, the outcome is hardly optimal, and does not favor in my opinion the existing open source community here (HBase) in the short term, and any long term favor is going to require work by that community. > >> My personal preference for a long while has been to migrate >> our Accumulo implementation to HBase, but as with any project there are >> often non-technical considerations for doing so. > > > I can only hope that open source communities in general will apply a penalty for taking the easy way out for such non-technical considerations. We do not have to act as beggars. Presumably this open sourcing was not done out of charity -- I would be quite surprised, maybe shocked. If government (or contractors) want to leverage open source communities for some benefit, the least we can do is insist on respectful terms. > > Best regards, > > > - Andy > > Problems worthy of attack prove their worth by hitting back. 
- Piet Hein (via Tom White) > > > ----- Original Message ----- >> From: Duane Moore <[EMAIL PROTECTED]> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> >> Cc: >> Sent: Tuesday, September 6, 2011 9:21 AM >> Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal >> >> Hello all, >> >> I've been a lurker on the HBase list for a year or so and our company has >> also been working with the Accumulo implementation during the same time >> frame. I'd like to respond to Stack's suggestion to focus on the >> technical merits of the proposal. Since I have some info on the pre-open >> sourced version of Accumulo, I'd like to share some of our evaluation of >> the software, primarily from a client perspective (vs. implementation >> details like logging to NFS vs HDFS). >> >> First, I share many of the same concerns of folks who were frustrated that >> this project seems to duplicate the effort of the open source >> (particularly HBase) community. However, I will second what Todd and Joey >> said and reiterate that contributing to open source is not easy for a >> government contractor, and especially not easy for U.S. government >> employees. My personal preference for a long while has been to migrate >> our Accumulo implementation to HBase, but as with any project there are >> often non-technical considerations for doing so. >> >> Below are some notes we took last year on the differences between Accumulo >> and HBase, with additional notes from me inline. Much of this mirrors >> what is in the current Accumulo proposal. >> >> ----- >> >> - Column Families >> In HBase you must specify all column families up front as part of the >> table schema declaration when creating a table. >> Accumulo does not have this restriction, you do not declare column >> families when you create a table. When you insert a new row into the table >> you can just provide a new column family. 
>> ** Note: sounds like from what Stack said, this is close to being OBE? >> >> >> - Aggregation >> Accumulo offers the ability to specify an aggregator for an individual >> column family or column. This allows you to keep a row count, or summation >> of numerical values that may be stored in a particular column. It would Bradford Stephens, Founder, Drawn to Scale http://drawntoscale.com (530) 763-DATA http://www.drawntoscale.com -- Spire, the scalable database with real-time queries and fulltext search. http://www.roadtofailure.com -- The Fringes of Scalability, Startups and Computer Science +
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Amandeep Khurana 2011-09-09, 20:28

Accepting Accumulo into the incubator would be a good encouragement for the folks at NSA to work more with open source software and engage with the communities and set a good example for future projects. That, in my mind, seems to be the strongest reason for letting the project in. However, I don't see how that helps HBase or ASF in the long run.

It is true that it'll take time and effort to combine the projects right now, but that might be a hit worth taking in exchange for combined development efforts from here on, as compared to having two completely independent projects and looking at a merge later on. I don't see how a merge later on will be any easier than right now. The decision obviously comes down to how much effort the developers on both projects are willing to put into it right now or later on.

Having said that, I think the HBase community at large needs to get an insight into Accumulo's implementation to gauge how different the two projects are in terms of the implementation details and code. Trying to come to a conclusion without doing that might not give us the best solution.

I'm excited about the fact that we have an alternate implementation but that's just the engineer in me. The HBase user in me is worried about the confusion an almost ditto alternate project will create.

Just my $0.02.

-ak

On Fri, Sep 9, 2011 at 1:50 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> > From: Duane Moore <[EMAIL PROTECTED]>
>
> > I will second what Todd and Joey
> > said and reiterate that contributing to open source is not easy for a
> > government contractor, and especially not easy for U.S. government
> > employees.
>
>
> This is true as a general statement I'm sure.
>
> However, my former life was as an engineer in a DARPA shop with a TS
> clearance. During that time I worked on both closed/classified systems and
> projects such as TrustedBSD (http://www.trustedbsd.org/). Choosing to
> develop an internal alternative rather than work with the HBase project was
> a decision of convenience by someone.
> > While all appreciate this eventual open sourcing on some level, the outcome > is hardly optimal, and does not favor in my opinion the existing open source > community here (HBase) in the short term, and any long term favor is going > to require work by that community. > > > My personal preference for a long while has been to migrate > > our Accumulo implementation to HBase, but as with any project there are > > often non-technical considerations for doing so. > > > I can only hope that open source communities in general will apply a > penalty for taking the easy way out for such non-technical considerations. > We do not have to act as beggars. Presumably this open sourcing was not done > out of charity -- I would be quite surprised, maybe shocked. If government > (or contractors) want to leverage open source communities for some benefit, > the least we can do is insist on respectful terms. > > Best regards, > > > - Andy > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > (via Tom White) > > > ----- Original Message ----- > > From: Duane Moore <[EMAIL PROTECTED]> > > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> > > Cc: > > Sent: Tuesday, September 6, 2011 9:21 AM > > Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up > on Apache Incubator as a proposal > > > > Hello all, > > > > I've been a lurker on the HBase list for a year or so and our company has > > also been working with the Accumulo implementation during the same time > > frame. I'd like to respond to Stack's suggestion to focus on the > > technical merits of the proposal. Since I have some info on the pre-open > > sourced version of Accumulo, I'd like to share some of our evaluation of > > the software, primarily from a client perspective (vs. implementation > > details like logging to NFS vs HDFS). > > > > First, I share many of the same concerns of folks who were frustrated > that > > this project seems to duplicate the effort of the open source +
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-03, 10:11

> From: Bernd Fondermann <[EMAIL PROTECTED]>

> And, isn't this 'competition' already happening between all the BT and Dynamo implementations?

Yes there is, and there are clearly major architectural and implementation differences that make it worthwhile to promote such competition. Nobody disputes (that I know of) that one of BT or Dynamo is not going to be a good fit for use case X, but will be for use case Y, and so on.

From what I have heard -- and of course we are fumbling around in the dark a bit here waiting for secret code yet to be released -- there is much less distinction here between the code base proposed for incubation and HBase. I hear a rumor it borrows some HBase code directly.

> I fail to see anything bad happening here.

See below...

> are you saying that the proposal is actually some kind of HBase fork?

This question can be answered by a detailed review of both code bases side by side. Let us call this a concern, not an assertion that anything bad is happening here. I at least do not have enough information on hand to say one way or another.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

>________________________________
>From: Bernd Fondermann <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Andrew Purtell <[EMAIL PROTECTED]>
>Sent: Saturday, September 3, 2011 3:00 PM
>Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
>
>On Saturday, September 3, 2011, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>> I'm simply pointing out a lack of community involvement to date.
>>
>> I would only add to this that the incubation proposal makes a controversial statement regarding existing involvement with the HBase community. It may be technically true if a certain company with involvement in HBase has also been interacting with "Accumulo", but is disingenuous to claim that the "community" has been involved here.
>>
>> It looks like strictly a one way street: They have been able to observe or borrow the fruits of our labor for years, and now at a suitable point wish to incubate at the ASF to compete with our project for community. That is not "community involvement". That is leeching.
>
>are you saying that the proposal is actually some kind of HBase fork?
>
>And, isn't this 'competition' already happening between all the BT and Dynamo implementations?
>
>I fail to see anything bad happening here.
>
> Bernd
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bernd Fondermann 2011-09-02, 20:01

On Friday, September 2, 2011, Gary Helmling <[EMAIL PROTECTED]> wrote:
>
> Claims a relationship with HBase. Is there overlapping code or is this just
> the duplication of functionality? There's no community relationship that
> I'm aware of. I haven't seen any of the proposed committers on the HBase
> user and dev lists to this point, so that doesn't set much of a precedent
> for community interaction.
>
>
> Overall I see no meaningful differentiation vs HBase as an existing project,
> no past attempts to interact with the most relevant Apache community, and
> only an, until now, private "community" of government users. I think it's
> great that they want to open source this. I don't want to discourage that
> -- go for it! But I don't see what the benefit is of ASF incubating this.
> I only see the potential for community fragmentation and market confusion
> over such closely similar projects.

Over the years, many "competing" projects went through incubation or were developed in different projects. There are at least 5 HTTP servers, two WS-* stacks, three build tools, there is Cassandra, HBase and CouchDB. No project can claim to dominate a particular technical domain. Maybe a bit surprisingly, this evolution of projects has fostered innovation and contributed to the ASF's versatility.

The only thing you can really do is write code that rocks, build an open community and put out great releases.

 Bernd
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Mathias Herberts 2011-09-02, 20:24

Maybe a BOF at Hadoop World could clarify some things, provided some of those folks attend the conference.

On Fri, Sep 2, 2011 at 22:01, Bernd Fondermann <[EMAIL PROTECTED]> wrote:
> On Friday, September 2, 2011, Gary Helmling <[EMAIL PROTECTED]> wrote:
>
>>
>> Claims a relationship with HBase. Is there overlapping code or is this just
>> the duplication of functionality? There's no community relationship that
>> I'm aware of. I haven't seen any of the proposed committers on the HBase
>> user and dev lists to this point, so that doesn't set much of a precedent
>> for community interaction.
>>
>>
>> Overall I see no meaningful differentiation vs HBase as an existing project,
>> no past attempts to interact with the most relevant Apache community, and
>> only an, until now, private "community" of government users. I think it's
>> great that they want to open source this. I don't want to discourage that
>> -- go for it! But I don't see what the benefit is of ASF incubating this.
>> I only see the potential for community fragmentation and market confusion
>> over such closely similar projects.
>
> Over the years, many "competing" projects went through incubation or were
> developed in different projects. There are at least 5 HTTP servers, two
> WS-* stacks, three build tools, there is Cassandra, HBase and CouchDB. No
> project can claim to dominate a particular technical domain. Maybe a bit
> surprisingly, this evolution of projects has fostered innovation and
> contributed to the ASF's versatility.
>
> The only thing you can really do is write code that rocks, build an open
> community and put out great releases.
>
> Bernd
>
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Doug Meil 2011-09-02, 20:29

Just for conversation, if somebody approached ASF incubator with an idea for a framework called "ExtendedSleep" that claimed to be a Java relational-object mapping framework (instead of the crusty object-relational approach that the Hibernate mapping framework provides), would anybody take it seriously?

Bernd, to your point the best defense is a good offense (i.e., write code that rocks), and I agree.

On 9/2/11 4:01 PM, "Bernd Fondermann" <[EMAIL PROTECTED]> wrote:

>On Friday, September 2, 2011, Gary Helmling <[EMAIL PROTECTED]> wrote:
>
>>
>> Claims a relationship with HBase. Is there overlapping code or is this just
>> the duplication of functionality? There's no community relationship that
>> I'm aware of. I haven't seen any of the proposed committers on the HBase
>> user and dev lists to this point, so that doesn't set much of a precedent
>> for community interaction.
>>
>>
>> Overall I see no meaningful differentiation vs HBase as an existing project,
>> no past attempts to interact with the most relevant Apache community, and
>> only an, until now, private "community" of government users. I think it's
>> great that they want to open source this. I don't want to discourage that
>> -- go for it! But I don't see what the benefit is of ASF incubating this.
>> I only see the potential for community fragmentation and market confusion
>> over such closely similar projects.
>
>Over the years, many "competing" projects went through incubation or were
>developed in different projects. There are at least 5 HTTP servers, two
>WS-* stacks, three build tools, there is Cassandra, HBase and CouchDB. No
>project can claim to dominate a particular technical domain. Maybe a bit
>surprisingly, this evolution of projects has fostered innovation and
>contributed to the ASF's versatility.
>
>The only thing you can really do is write code that rocks, build an open
>community and put out great releases.
>
> Bernd
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bernd Fondermann 2011-09-03, 06:49

On Friday, September 2, 2011, Doug Meil <[EMAIL PROTECTED]> wrote:
>
> Just for conversation, if somebody approached ASF incubator with an idea
> for a framework called "ExtendedSleep" that claimed to be a Java
> relational-object mapping framework (instead of the crusty
> object-relational approach that the Hibernate mapping framework provides),
> would anybody take it seriously?

Yes, since at least 1 person at the Incubator takes everything seriously. These Incubator people love new code and projects, regardless of what they do.

 Bernd
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Eric Charles 2011-09-03, 19:46

On 02/09/11 13:01, Bernd Fondermann wrote:
>
> The only thing you can really do is write code that rocks, build an open
> community and put out great releases.
>

+1

However, I will add: 'competition is good, dispersion is no good'

It's true that similar projects already exist within Apache. An example is the recent concern of Hama about the Giraph project over the Pregel functionality. But Hama was still incubating and had not completed its Pregel implementation. Another example is Whirr and DeltaCloud, but they are implemented in different technologies.

Here, we already have a Java implementation of BigTable and I wonder what we'll gain with another one? Sounds to me like incubating a Tomcat equivalent, but with e.g. automatic authorization on HTTP requests (an example I just made up): not much sense...

If finally incubated, my *wishful thinking* is that both projects can come to share 'common components'. This is to me an argument for the incubation that could lead to a common-bigtable sub-project :)

But honestly, my thinking is that we are losing even more focus in an already very puzzling NoSQL world. Good for the consultants, bad for the users.

I wonder if all this thread should be moved to the incubator mailing list. After all, this is where the decision will occur.

Thx.
Eric

> Bernd
>
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Stack 2011-09-03, 20:23

On Sat, Sep 3, 2011 at 12:46 PM, Eric Charles <[EMAIL PROTECTED]> wrote:
> I wonder if all this thread should be moved to the incubator mailing list.
> After all, this is where the decision will occur.
>

Agree. Would be good to get Accumulo fellows in on the conversation too.

St.Ack
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bill de hÓra 2011-09-03, 23:16

On 02/09/11 19:06, Stack wrote:
> What do folks think?

Not putting the log into hdfs seems like a good idea.

Bill
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Stack 2011-09-04, 02:54

On Sat, Sep 3, 2011 at 4:16 PM, Bill de hÓra <[EMAIL PROTECTED]> wrote:
> On 02/09/11 19:06, Stack wrote:
>>
>> What do folks think?
>
> Not putting the log into hdfs seems like a good idea.
>

Why do you think that, Bill?

Yours,
St.Ack
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Mathias Herberts 2011-09-04, 06:43

On Sep 4, 2011 1:39 AM, "Bill de hÓra" <[EMAIL PROTECTED]> wrote:
>
> On 02/09/11 19:06, Stack wrote:
>>
>> What do folks think?
>
>
> Not putting the log into hdfs seems like a good idea.

I was somehow thinking the opposite as it makes irrecoverable machine failures much more problematic. What makes you say it's a good idea?
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Ryan Rawson 2011-09-04, 06:49

We thought about it earlier, but needing a single machine to come back up before we could restore didn't seem like a good idea.

-ryan

On Sat, Sep 3, 2011 at 11:43 PM, Mathias Herberts <[EMAIL PROTECTED]> wrote:
> On Sep 4, 2011 1:39 AM, "Bill de hÓra" <[EMAIL PROTECTED]> wrote:
>>
>> On 02/09/11 19:06, Stack wrote:
>>>
>>> What do folks think?
>>
>>
>> Not putting the log into hdfs seems like a good idea.
>
> I was somehow thinking the opposite as it makes irrecoverable machine
> failures much more problematic. What makes you say it's a good idea?
>
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bill 2011-09-05, 21:06

On 04/09/11 07:43, Mathias Herberts wrote:
> On Sep 4, 2011 1:39 AM, "Bill de hÓra"<[EMAIL PROTECTED]> wrote:
>>
>> On 02/09/11 19:06, Stack wrote:
>>>
>>> What do folks think?
>>
>>
>> Not putting the log into hdfs seems like a good idea.
>
> I was somehow thinking the opposite as it makes irrecoverable machine
> failures much more problematic. What makes you say it's a good idea?
>

Allows more control over the write path, specifically sequential I/O and crash recovery. Granted the commit needs to be replicated, but you need that regardless. Thinking a bit more it might not square with the regionserver model anyway, plus the Accumulo proposal mentions a service rather than a local disk. The WAL seems to be hardened up these days anyway, making things like https://issues.apache.org/jira/browse/HBASE-4107 more of an edge case.

Bill
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Joey Echeverria 2011-09-05, 21:35

The Accumulo implementation of the WAL is a separate set of daemons. When you write to the WAL, you send your transactions to three of the logging servers. When you do a recovery, I believe one of the three servers that has the WAL for the down server copies it to HDFS and then a MapReduce job splits the log and re-inserts the recovered data. You should have the same survivability that you get with HDFS.

-Joey

On Mon, Sep 5, 2011 at 5:06 PM, Bill <[EMAIL PROTECTED]> wrote:
> On 04/09/11 07:43, Mathias Herberts wrote:
>>
>> On Sep 4, 2011 1:39 AM, "Bill de hÓra"<[EMAIL PROTECTED]> wrote:
>>>
>>> On 02/09/11 19:06, Stack wrote:
>>>>
>>>> What do folks think?
>>>
>>>
>>> Not putting the log into hdfs seems like a good idea.
>>
>> I was somehow thinking the opposite as it makes irrecoverable machine
>> failures much more problematic. What makes you say it's a good idea?
>>
>
> Allows more control over the write path, specifically sequential I/O and
> crash recovery. Granted the commit needs to be replicated, but you need that
> regardless. Thinking a bit more it might not square with the regionserver
> model anyway, plus the Accumulo proposal mentions a service rather than a
> local disk. The WAL seems to be hardened up these days anyway making things
> like https://issues.apache.org/jira/browse/HBASE-4107 more of an edge case..
>
> Bill
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
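The write and recovery path Joey describes can be modelled in memory to see why three copies give the survivability he mentions. This is a toy sketch only: `LogServer`, `WalWriter` and `recover` are hypothetical names, the choice of the first three servers is arbitrary, and the copy to HDFS and the MapReduce log split are left out entirely. It is not Accumulo's code.

```python
from typing import Dict, List


class LogServer:
    """Stand-in for one logging daemon; keeps edits per writer in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.logs: Dict[str, List[bytes]] = {}

    def append(self, writer_id: str, edit: bytes) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.logs.setdefault(writer_id, []).append(edit)


class WalWriter:
    """Sends every edit to a fixed set of `copies` logging servers."""

    def __init__(self, writer_id: str, servers: List[LogServer],
                 copies: int = 3) -> None:
        if len(servers) < copies:
            raise ValueError("not enough logging servers")
        self.writer_id = writer_id
        self.targets = servers[:copies]  # arbitrary choice for the sketch

    def write(self, edit: bytes) -> None:
        for server in self.targets:
            server.append(self.writer_id, edit)


def recover(writer_id: str, servers: List[LogServer]) -> List[bytes]:
    """Return the log from any surviving server that holds a copy."""
    for server in servers:
        if server.alive and writer_id in server.logs:
            return list(server.logs[writer_id])
    raise RuntimeError("no surviving copy of the log")
```

With three copies, the log for a failed writer stays readable as long as at least one of its three logging servers is still up; it is lost only if all three fail before recovery runs.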
-
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Doug Meil 2011-09-06, 21:51

There was a discussion over the weekend on the incubator dist-list about Accumulo, describing what was borrowed from Hadoop-core and HBase.

http://mail-archives.apache.org/mod_mbox/incubator-general/201109.mbox/%3C8 [EMAIL PROTECTED]%3E

5400 lines: slightly modified versions of Hadoop BCFile and related classes (our current file format extends BCFile)
4300 lines: heavily modified versions of MapFile and SequenceFile (no longer our default file format, but still included for backward compatibility)
2000 lines: heavily modified versions of HBase BlockCache and related files (Adam didn't count the tests when he said 1500 lines)
1300 lines: heavily modified versions of Hadoop BloomFilters
419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
325 lines: our Value is an immutable version of Hadoop BytesWritable
142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

On 9/5/11 5:35 PM, "Joey Echeverria" <[EMAIL PROTECTED]> wrote:

>The Accumulo implementation of the WAL is a separate set of daemons.
>When you write to the WAL, you send your transactions to three of the
>logging servers. When you do a recovery, I believe one of the three
>servers that has the WAL for the down server copies it to HDFS and
>then a MapReduce job splits the log and re-inserts the recovered data.
>You should have the same survivability that you get with HDFS.
>
>-Joey
>
>On Mon, Sep 5, 2011 at 5:06 PM, Bill <[EMAIL PROTECTED]> wrote:
>> On 04/09/11 07:43, Mathias Herberts wrote:
>>>
>>> On Sep 4, 2011 1:39 AM, "Bill de hÓra"<[EMAIL PROTECTED]> wrote:
>>>>
>>>> On 02/09/11 19:06, Stack wrote:
>>>>>
>>>>> What do folks think?
>>>>
>>>>
>>>> Not putting the log into hdfs seems like a good idea.
>>>
>>> I was somehow thinking the opposite as it makes irrecoverable machine
>>> failures much more problematic. What makes you say it's a good idea?
>>>
>>
>> Allows more control over the write path, specifically sequential I/O and
>> crash recovery. Granted the commit needs to be replicated, but you need that
>> regardless. Thinking a bit more it might not square with the regionserver
>> model anyway, plus the Accumulo proposal mentions a service rather than a
>> local disk. The WAL seems to be hardened up these days anyway making things
>> like https://issues.apache.org/jira/browse/HBASE-4107 more of an edge case..
>>
>> Bill
>>
>
>
>
>--
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bill de hÓra 2011-09-05, 20:54
On 04/09/11 07:43, Mathias Herberts wrote:
> On Sep 4, 2011 1:39 AM, "Bill de hÓra"<[EMAIL PROTECTED]> wrote:
>>
>> On 02/09/11 19:06, Stack wrote:
>>>
>>> What do folks think?
>>
>>
>> Not putting the log into hdfs seems like a good idea.
>
> I was somehow thinking the opposite as it makes irrecoverable machine
> failures much more problematic. What makes you say it's a good idea?
>

Grants control over the write path, specifically sequential I/O and crash recovery. Granted you'd want more than one copy replicated, but you need that regardless. Thinking a bit more it might not work with the regionserver model, and the Accumulo proposal mentions a service here rather than a local disk. Anyway, I may be still thinking about older HBase code when the WAL didn't work so well; today it seems like just edge cases (https://issues.apache.org/jira/browse/HBASE-4107).

Bill
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Steven Noels 2011-09-05, 08:30
On Fri, Sep 2, 2011 at 8:06 PM, Stack <[EMAIL PROTECTED]> wrote:
> See here for the incubator proposal:
> http://wiki.apache.org/incubator/AccumuloProposal

I'm usually dumbfounded with the amount of secrecy that typically surrounds these incubator proposals - which is weird as the goal is ultimately to open something up. I've been around ASF long enough to realize organizations often have specific goals as to when and how to drop their precious jewels into the Foundation. Oh well.

I for one am more disappointed that this apparently has been an on-going work for quite some time, shrouded in secrecy, by people fully aware of the fact that an equivalent - community-driven - alternative existed - however they didn't feel obliged or inclined to go the extra mile and start communicating (or collaborating!) and they now try and look for technical reasons as to whether they really had to do their own thing. It's pretty unconvincing to open something up only once it's done and you can afford to have other cooks in the kitchen.

But rather than shifting the burden to our side, maybe the incubator should be rather strict in verifying there *is* a viable community for an Apache-style development model at Accumulo. I've been mentoring BEA code dumps myself into Apache, and once the company focus shifted, the lack of genuine community dev adoption became rapidly apparent. Which was shortly after that project graduated, of course.

Steven.
--
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Lily
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bernd Fondermann 2011-09-05, 10:50
On Mon, Sep 5, 2011 at 10:30, Steven Noels <[EMAIL PROTECTED]> wrote:
> On Fri, Sep 2, 2011 at 8:06 PM, Stack <[EMAIL PROTECTED]> wrote:
>
> See here for the incubator proposal:
>> http://wiki.apache.org/incubator/AccumuloProposal
>>
>
> I'm usually dumbfounded with the amount of secrecy that typically surrounds
> these incubator proposals - which is weird as the goal is ultimately to open
> something up. I've been around ASF long enough to
> realize organizations often have specific goals as to when and how drop
> their precious jewels into the Foundation. Oh well.
>
> I for one am more disappointed that this apparently has been an on-going
> work for quite some time, shrouded in secrecy, by people fully aware of the
> fact that an equivalent - community-driven - alternative existed - however
> they didn't feel obliged or inclined to go the extra mile and start
> communicating (or collaborating!) and they now try and look for technical
> reasons as to whether they really had to do their own thing. It's pretty
> unconvincing to open something up only once it's done and you can afford to
> have other cooks in the kitchen.

You are putting this as if there is a hidden agenda of some kind.
This is purely speculation. Otherwise, please come up with facts.

I read the proposal and surrounding discussion this way: Because of legal issues this NSA-internal project was unable to contribute back (whether or not that's actually true, I cannot say).
They were trying to open source it, to be able to interact with the relevant Apache projects (I learned on this thread these are Hadoop and HBase) and had to overcome the legal issues (pls. see the cited LEGAL-JIRA item) which took some time. (NB: These issues are not yet finally solved. The proposal asks for derivation from our standard ICLA.)
Now they are open sourcing it.
Isn't this what you ask for?

I think it would be cool if the HBase community which "is responsible for the creation and maintenance of software related to a distributed database" (board resolution) shows its openness and gets involved with the project as mentors, committers, lurkers.

> But rather than shifting the burden to our side, maybe the incubator should
> be rather strict in verifying there *is* a viable community for an
> Apache-style development model at Accumulo.

Sure, that's the single most important goal for any podling.

> I've been mentoring BEA code
> dumps myself into Apache, and once the company focus shifted, the lack of
> genuine community dev adoption became rapidly apparent. Which was shortly
> after that project graduated, of course.

A project may lose community any time in or after graduation. We have seen projects not even making it into the Incubator after being voted in.
I cannot foresee the future. Can you?
I certainly am not in the speculating-about-what-happens-after-successful-graduation business.
If you look at the 100-or-so projects at the ASF, there are all shades of grey, from healthy to brain-dead.

  Bernd
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-05, 14:10
> I think it would be cool if the HBase
> community which "is responsible for
> the creation and maintenance of
> software related to a distributed
> database" (board resolution) shows its
> openess and gets involved with the
> project as mentors, committers, lurkers.

How does that benefit HBase? I can see how it would benefit Accumulo. There is an argument to be made it is not a zero sum game, but to do what you suggest will pull resources from moving HBase forward. This seems an "opportunity" to split our precious development resources, split community, split resources, confuse would-be adopters, and slow down or stop forward momentum. I'm not seeing support for that on this thread. Perhaps if HBase did what you suggest it is actually contrary to the interests of the project, so would be mismanagement? I am just thinking out loud here.

Best regards,

   - Andy

On Mon Sep 5th, 2011 3:50 AM PDT Bernd Fondermann wrote:

>On Mon, Sep 5, 2011 at 10:30, Steven Noels <[EMAIL PROTECTED]> wrote:
>> On Fri, Sep 2, 2011 at 8:06 PM, Stack <[EMAIL PROTECTED]> wrote:
>>
>> See here for the incubator proposal:
>>> http://wiki.apache.org/incubator/AccumuloProposal
>>>
>>
>> I'm usually dumbfounded with the amount of secrecy that typically surrounds
>> these incubator proposals - which is weird as the goal is ultimately to open
>> something up. I've been around ASF long enough to
>> realize organizations often have specific goals as to when and how drop
>> their precious jewels into the Foundation. Oh well.
>>
>> I for one am more disappointed that this apparently has been an on-going
>> work for quite some time, shrouded in secrecy, by people fully aware of the
>> fact that an equivalent - community-driven - alternative existed - however
>> they didn't feel obliged or inclined to go the extra mile and start
>> communicating (or collaborating!) and they now try and look for technical
>> reasons as to whether they really had to do their own thing. It's pretty
>> unconvincing to open something up only once it's done and you can afford to
>> have other cooks in the kitchen.
>
>You are putting this as if there is a hidden agenda of some kind.
>This is purely speculation. Otherwise, please come up with facts.
>
>I read the proposal and surrounding discussion this way: Because of
>legal issues this NSA-internal project was unable to contribute back
>(whether or not that's actually true, I cannot say).
>They were trying to open source it, to be able to interact with the
>relevant Apache projects (I learned on this thread these are Hadoop
>and HBase) and had to overcome the legal issues (pls. see the cited
>LEGAL-JIRA item) which took some time. (NB: These issues are not yet
>finally solved. The proposal asks for derivation from our standard
>ICLA.)
>Now they are open sourcing it.
>Isn't this what you ask for?
>
>I think it would be cool if the HBase community which "is responsible
>for the creation and maintenance of software related to a distributed
>database" (board resolution) shows its openess and gets involved with
>the project as mentors, committers, lurkers.
>
>> But rather than shifting the burden to our side, maybe the incubator should
>> be rather strict in verifying there *is* a viable community for an
>> Apache-style development model at Accumulo.
>
>Sure, that's the single most important goal for any podling.
>
>> I've been mentoring BEA code
>> dumps myself into Apache, and once the company focus shifted, the lack of
>> genuine community dev adoption became rapidly apparent. Which was shortly
>> after that project graduated, of course.
>
>A project may lose community any time in or after graduation. We seen
>projects not even making it into the Incubator after being voted in.
>I cannot foresee the future. Can you?
>I certainly am not in the
>speculating-about-what-happens-after-successful-graduation business.
>If you look at the 100-or-so projects at the ASF, there are all shades
>of grey, from healthy to brain-dead.
>
> Bernd
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-05, 14:31
Note I am talking about people who feel they have an interest in HBase development, or have some stake, or are attempting to be a competent PMC, or similar. When confronted with an incubation of something that seems very similar ("tomcat" versus "timdog" as has been cleverly said), legitimate concerns are raised here that there will be a negative impact on the momentum of the HBase project; the only question is the magnitude.

On the other hand perhaps members from the community at large will indeed shift to some involvement with Accumulo. Who knows. The Accumulo project would benefit from that. HBase probably will not. Do we all benefit in the end nonetheless? If there were more apparent differences it would seem less like a zero sum game.

All of this is of course merely one person's opinion, not the opinion of my employer, or the HBase PMC, etc. For what it might be worth.

Best regards,

   - Andy

On Mon Sep 5th, 2011 7:10 AM PDT Andrew Purtell wrote:

>> I think it would be cool if the HBase
>> community which "is responsible for
>> the creation and maintenance of
>> software related to a distributed
>> database" (board resolution) shows its
>> openess and gets involved with the
>> project as mentors, committers, lurkers.
>
>How does that benefit HBase? I can see how it would benefit Accumulo. There is an argument to be made it is not a zero sum game, but to do what you suggest will pull resources from moving HBase forward. This seems an "opportunity" to split our precious development resources, split community, split resources, confuse would be adopters, and slow down or stop forward momentum. I'm not seeing support for that on this thread. Perhaps if HBase did what you suggest it is actually contrary to the interests of the project, so would be mismanagement? I am just thinking out loud here.
>
>Best regards,
>
>   - Andy
>
>On Mon Sep 5th, 2011 3:50 AM PDT Bernd Fondermann wrote:
>
>>On Mon, Sep 5, 2011 at 10:30, Steven Noels <[EMAIL PROTECTED]> wrote:
>>> On Fri, Sep 2, 2011 at 8:06 PM, Stack <[EMAIL PROTECTED]> wrote:
>>>
>>> See here for the incubator proposal:
>>>> http://wiki.apache.org/incubator/AccumuloProposal
>>>>
>>>
>>> I'm usually dumbfounded with the amount of secrecy that typically surrounds
>>> these incubator proposals - which is weird as the goal is ultimately to open
>>> something up. I've been around ASF long enough to
>>> realize organizations often have specific goals as to when and how drop
>>> their precious jewels into the Foundation. Oh well.
>>>
>>> I for one am more disappointed that this apparently has been an on-going
>>> work for quite some time, shrouded in secrecy, by people fully aware of the
>>> fact that an equivalent - community-driven - alternative existed - however
>>> they didn't feel obliged or inclined to go the extra mile and start
>>> communicating (or collaborating!) and they now try and look for technical
>>> reasons as to whether they really had to do their own thing. It's pretty
>>> unconvincing to open something up only once it's done and you can afford to
>>> have other cooks in the kitchen.
>>
>>You are putting this as if there is a hidden agenda of some kind.
>>This is purely speculation. Otherwise, please come up with facts.
>>
>>I read the proposal and surrounding discussion this way: Because of
>>legal issues this NSA-internal project was unable to contribute back
>>(whether or not that's actually true, I cannot say).
>>They were trying to open source it, to be able to interact with the
>>relevant Apache projects (I learned on this thread these are Hadoop
>>and HBase) and had to overcome the legal issues (pls. see the cited
>>LEGAL-JIRA item) which took some time. (NB: These issues are not yet
>>finally solved. The proposal asks for derivation from our standard
>>ICLA.)
>>Now they are open sourcing it.
>>Isn't this what you ask for?
>>
>>I think it would be cool if the HBase community which "is responsible
>>for the creation and maintenance of software related to a distributed
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Bernd Fondermann 2011-09-05, 19:32
On Mon, Sep 5, 2011 at 16:10, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> I think it would be cool if the HBase
>> community which "is responsible for
>> the creation and maintenance of
>> software related to a distributed
>> database" (board resolution) shows its
>> openess and gets involved with the
>> project as mentors, committers, lurkers.
>
> How does that benefit HBase? I can see how it would benefit Accumulo. There is an argument to be made it is not a zero sum game, but to do what you suggest will pull resources from moving HBase forward. This seems an "opportunity" to split our precious development resources, split community, split resources, confuse would be adopters, and slow down or stop forward momentum. I'm not seeing support for that on this thread.

You are painting a gloomy picture here. Yet, it is only a projection of what might happen.
"Split" - users are not left or right. There is no blue or red pill to be taken. If people want to be part of more than one community, they can, and they do. The same way they use MySQL and HBase at the same time.
If people like to choose Project A over B, well, that's ok. HBase is not judged by the number of downloads or pure size of community.

> Perhaps if HBase did what you suggest it is actually contrary to the interests of the project, so would be mismanagement? I am just thinking out loud here.

No. The HBase PMC has a defined mission, set up by the ASF board. Since this mission might have significant overlap with Accumulo, it might (just one possible outcome of Incubation) be tasked with the Accumulo code base as well.
The mission of HBase includes fostering a community around this mission. Read: Be open to new HBase users and contributors.
In my opinion this includes getting involved with Incubating projects which could bring new ideas, code and "precious resources" to HBase. Sounds like Accumulo could be such a project, but I don't know.

As an HBase user and out of curiosity for great tech, I'm certainly excited about Accumulo. (And adding column families on the fly doesn't sound bad, either.)

  Bernd
Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Andrew Purtell 2011-09-07, 01:44
I also agreed at the time to hold off refactoring the build for Maven modules and supporting RPC engine variants. I would still have the same opinion if not for recent events.
How much work remains for 0.92? If more than a few weeks' worth, then a parallel refactor of the build could happen, with a final merge step.

Best regards,

   - Andy

On Tue Sep 6th, 2011 12:02 PM PDT Gary Helmling wrote:

>> Seems like committing it will disrupt the build and src tree layout.
>> Gary was holding off till we branched but 0.92 branching is taking too
>> long.
>>
>> + Lets branch this friday, or next?
>> + And or, run a vote on whether we should commit security now before
>> we branch or after
>>
>>
>
>This is getting off topic for the current thread, so I'll open a new thread
>to take a vote on converting trunk back in to maven modules. This is what
>would be necessary to integrate the various security bits.
>
>The last discussion we had on this was on the dev list at the end of
>May/beginning of June:
>http://search-hadoop.com/m/iXZmd2aZwBE1
>
>I agreed as much as anyone that we should hold off until after branching
>0.92 in order to avoid the disruption of moving the entire source tree
>around. So I have been holding off on this on my own discretion and any
>delay sits mostly with me.
>
>Of course, that was three months ago and we still haven't branched. In
>hindsight, if we were aware how long the 0.92 process would go on, I think
>the thread might have reached a different conclusion. In any case, I think
>it warrants another discussion.
>
>--gh