Doug Meil, 2011-06-08, 13:53
Allen Wittenauer, 2011-06-08, 16:40
Suresh Srinivas, 2011-06-08, 17:41
Dhruba Borthakur, 2011-06-08, 17:47
Suresh Srinivas, 2011-06-08, 17:51
Eli Collins, 2011-06-08, 17:53
Steve Loughran, 2011-06-09, 11:42
Suresh Srinivas, 2011-06-09, 17:33
Stack, 2011-06-09, 19:23
Milind Bhandarkar, 2011-06-10, 02:21
Steve Loughran, 2011-06-10, 09:00
-
RE: LimitedPrivate and HBase (thoughts from an observer)
Doug Meil, 2011-06-08, 13:53
Hi there-
The following are some thoughts on questions raised in this thread that concern the Hadoop-core development process more than this particular issue. Disclosure: I'm active on the HBase dist-list, so Hadoop-core folks can take my comments with a pinch or two of salt if required.

Re: "What is the real criteria for changing an API from private to limited?"

I don't know, but from the perspective of an observer, my request to Hadoop-core developers is not to overthink this.

Re: "How "closely related" does a project need to be to get this privilege?" / "What is the criteria by which an API gets opened to something outside of the Hadoop umbrella"

Given the context of the original question, is this debate really necessary? Everybody knows that although HBase is a TLP now, it grew out of Hadoop (e.g., there's a chapter about HBase in the Hadoop book). It's not like somebody from Hypertable was strong-arming for feature requests.

Re: "If it was almost anyone else, it would have sat there.... and *that's* the point where I'm mainly concerned."

Hadoop-core development has been slow or stalled over the past 2 years, but recent events such as Yahoo now backing the Apache distro are great signs that velocity will pick up. Forward progress, even on items as small as this, is good.

Re: "Then we can go back working on core Hadoop."

Hadoop-core is critical to many frameworks in the Hadoop family (Hive, Pig, and yes, HBase), but software frameworks are only good when they are used and serve the needs of those who use them. My request to Hadoop-core developers is not to assume that Core is an end in itself.

Thanks, and keep up the good work!
-----Original Message-----
From: Allen Wittenauer [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 06, 2011 9:33 PM
To: [EMAIL PROTECTED]
Subject: Re: LimitedPrivate and HBase

On Jun 6, 2011, at 6:23 PM, Todd Lipcon wrote:
>
> Nah, I just think these "meta discussions" waste an awful lot of time
> that's better spent making real progress on the code, or reviewing the
> complex changes where extra eyes really make a big difference.

OK. That's make it easier to just -1 changes like this with reasoning such as "HBase is not a related project." Then we can go back working on core Hadoop.
-
Re: LimitedPrivate and HBase (thoughts from an observer)
Allen Wittenauer, 2011-06-08, 16:40

On Jun 8, 2011, at 6:53 AM, Doug Meil wrote:
>
> Re: "How "closely related" does a project need to be to get this privilege?" / " What is the criteria by which an API gets opened to something outside of the Hadoop umbrella"
>
> Given the context of the original question, is this debate really necessary? Everybody knows that although HBase is a TLP now it grew out of Hadoop (e.g, there's a chapter about HBase in the Hadoop book, etc.) It's not like somebody from Hypertable was strong-arming for feature requests.

If HBase needs an API, why wouldn't something else? Why should something be marked LimitedPrivate to HBase instead of just making it Public and being done with it?
-
Re: LimitedPrivate and HBase (thoughts from an observer)
Suresh Srinivas, 2011-06-08, 17:41

I do not see any issue with the change that Todd has made. We have made similar changes in HDFS-1586 in the past.

Making APIs public comes with a cost. That is what we are avoiding with LimitedPrivate. The intention was to make the following projects, which are closely tied to Hadoop, eligible for LimitedPrivate: {"HBase", "HDFS", "Hive", "MapReduce", "Pig"}. This list could grow in the future.

When such projects break because of an API change, we can coordinate as a community and fix the issues. That is not possible when some application we do not know of breaks!

If others outside the umbrella of these projects need an API, they can open a JIRA and we can address it.

On 6/8/11 9:40 AM, "Allen Wittenauer" <[EMAIL PROTECTED]> wrote:
> If HBase needs an API, why wouldn't something else? Why should something be
> marked LimitedPrivate to HBase instead of just making it Public and being done
> with it?
-
Re: LimitedPrivate and HBase (thoughts from an observer)
Dhruba Borthakur, 2011-06-08, 17:47

I too think that LimitedPrivate is a good idea for projects that work closely with the Hadoop ecosystem (Hive, HBase, MR, etc.). It allows us to experiment with an API that, if it proves useful in the longer run, can graduate to a public API in the future.

Some people may rightly claim that this gives an unfair advantage to projects in the Hadoop ecosystem over projects outside it, but I see no harm in that. One reason is that many developers work on several of these projects, so it is easier to coordinate changes among them.

thanks,
-dhruba

--
Connect to me at http://www.facebook.com/dhruba
-
Re: LimitedPrivate and HBase (thoughts from an observer)
Suresh Srinivas, 2011-06-08, 17:51

BTW, thank you Todd and Stack for all your effort in changing HDFS to work well with HBase. These changes are very important in making HDFS better!
-
Re: LimitedPrivate and HBase (thoughts from an observer)
Eli Collins, 2011-06-08, 17:53

Agree. HDFS is not a general-purpose file system. Its API was, and continues to be, co-designed with its directly adjacent projects (MR, HBase, Hive, Pig, etc.) in mind. IMO that is one of its biggest strengths. This of course does not mean we get to ignore existing users, compatibility concerns, etc.

On Wed, Jun 8, 2011 at 10:47 AM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
> I too think that LimitedPrivate is a good idea for projects that work
> closely with the Hadoop ecosystem (Hive, HBase, MR, etc) It allows us to
> experiment with an API, that if proved useful in the longer run, can
> graduate to be a public API in future.
-
Re: LimitedPrivate and HBase (thoughts from the build and test world)
Steve Loughran, 2011-06-09, 11:42
On 06/08/2011 06:41 PM, Suresh Srinivas wrote:
> I do not see any issue with the change that Todd has made. We have done
> similar changes in HDFS-1586 in the past.
>
> Making APIs public comes with a cost. That is what we are avoiding with
> LimitedPrivate. The intention was to include the following projects that are
> closely tied to Hadoop as projects eligible for LimitedPrivate.
> {"HBase", "HDFS", "Hive", "MapReduce", "Pig"}. This list could grow in the
> future.

I'm going to talk about my experience on the Ant team.

One of the lessons of that project is that in the open source world, you can't predict or control how your code gets used. If someone wants to take your app and use it as a library, they can. If someone wants to do something completely unexpected with that library, they can. And this is a good thing, because your code gets used. Yes, you get new bug reports, but every person using your code is someone not using somebody else's code. You win.

The other lesson is this: in open source, there is no such thing as private code.

* If you mark something as package scoped, they just inject their classes into your package (and who hasn't done that with their Hadoop extensions?).
* If you mark something as protected, they subclass it and open up its visibility.
* If you mark something as private, they edit your source and build a new JAR with the relaxed permission.

For any of these actions, you end up fielding the bug reports, as the stack trace points to you. And it increases maintenance costs for everyone.

Alternatively, they cut and paste your code into their codebase, possibly (but not always) retaining the Apache credits. That:

* complicates copyright and lawsuits: http://www.theserverside.com/news/thread.tss?thread_id=29958
* increases maintenance costs for everyone, especially if there are security issues with the original code.

> When such projects break because of API change, we can co-ordinate as
> community and fix the issues. This is not true for some application that we
> do not know of breaks!

The way Ant handled this was with Gump, the nightly clean build of all the OSS Java projects built with Ant: http://vmgump.apache.org/gump/public/

All the projects thought they were getting a free CI build run, but what it really was was a regression test of Ant and every single OSS project. If a change in Ant broke anyone's build, we noticed. If a change in Log4J broke a build, someone noticed. It became a rapid-response regression test for the entire OSS suite.

Sadly, it doesn't work so well. I'd blame Maven, but the move to Ivy dependencies doesn't help either; it complicates classpaths no end.

Even so, the idea is great: build and test your downstream applications and the things you depend on, so you find problems within 24 hours of the change being committed, regardless of which project committed the change.

The way to do it now would be with Jenkins, not just building and testing Hadoop-{core, hdfs, mapreduce}, but also:

Upstream
- build and publish every upstream dependency;
- test against the trunk versions built locally;
- build and test against the Ivy-versioned artifacts that are controlled by version.properties.

Together, this flags up when something works against the old artifacts but doesn't work against the trunk versions: those are their regressions, caught early.

Downstream
- build and test the OSS projects that work with Hadoop. That's the Apache ones (HBase, Mahout, Pig, Hive, Hama, etc.) and the other ones, such as Cascading.

That can be offered as a service to these projects ("we will build and test your code against our trunk"), a service designed to benefit everyone. They find their bugs, we find regressions.
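The upstream check Steve describes boils down to comparing two build results per project: one against the pinned (released) artifacts and one against trunk. The Java sketch below is a hypothetical illustration only, not part of Gump, Jenkins or any Hadoop build; the class, enum and project names are made up, and it assumes each build reduces to a simple pass/fail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: classify the outcome of building one project twice,
// once against a dependency's pinned (released) artifacts and once against
// that dependency's trunk.
final class DownstreamMatrix {

    enum Verdict {
        OK,                    // passes against both pinned and trunk
        DEPENDENCY_REGRESSION, // passes against pinned, fails against trunk
        FIXED_ON_TRUNK,        // fails against pinned, passes against trunk
        PROJECT_BROKEN         // fails against both: likely the project's own bug
    }

    static Verdict classify(boolean passesPinned, boolean passesTrunk) {
        if (passesPinned && passesTrunk) {
            return Verdict.OK;
        }
        if (passesPinned) {
            return Verdict.DEPENDENCY_REGRESSION; // the case to flag early
        }
        if (passesTrunk) {
            return Verdict.FIXED_ON_TRUNK;
        }
        return Verdict.PROJECT_BROKEN;
    }

    public static void main(String[] args) {
        // Illustrative inputs only: {passesPinned, passesTrunk} per project.
        Map<String, boolean[]> results = new LinkedHashMap<>();
        results.put("project-a", new boolean[] {true, false});
        results.put("project-b", new boolean[] {true, true});
        for (Map.Entry<String, boolean[]> e : results.entrySet()) {
            boolean[] r = e.getValue();
            System.out.println(e.getKey() + ": " + classify(r[0], r[1]));
        }
        // prints:
        // project-a: DEPENDENCY_REGRESSION
        // project-b: OK
    }
}
```

Only the DEPENDENCY_REGRESSION case points at the upstream change; the other failing case is most likely the project's own problem, which is why both builds are needed.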
This is a pretty complex project, especially when you think about the challenge of testing that the RPMs your RPM generation code produces actually install (I bring up clean CentOS VMs for that purpose). But without it you don't get everything working together, which is the state things appear to be in today. Ignoring the RPM install & test problems, if people are interested in working on this, we should be able to do a lot of it on Jenkins. Who is willing to get involved?

-Steve
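Steve's point that `protected` offers no real barrier follows from a Java language rule: an overriding method may widen access (protected to public) but never narrow it. A minimal sketch, with hypothetical class names standing in for a library and the downstream code that subclasses it:

```java
// Downstream code: callers of Opened reach helper() without subclassing.
final class OpenedDemo {
    public static void main(String[] args) {
        System.out.println(new Opened().helper()); // prints: internal detail
    }
}

// Hypothetical library class: the author intends helper() for subclasses only.
class LibraryClass {
    protected String helper() {
        return "internal detail";
    }
}

// Downstream subclass "opens up" the method: an override may widen access
// from protected to public, so the library author's intent is not enforced.
class Opened extends LibraryClass {
    @Override
    public String helper() {
        return super.helper();
    }
}
```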
-
Re: LimitedPrivate and HBase (thoughts from the build and test world)
Suresh Srinivas, 2011-06-09, 17:33

> The other lesson from that is the following: in open source, there is no
> such thing as private code.

The goal of InterfaceAudience and InterfaceStability is not to prevent someone from using the code. They merely indicate whom an interface is intended for and how stable it is.

An interface marked Public and Stable guarantees backward compatibility. These interfaces are intended for everyone to use, and changes to them must be made extra carefully to preserve that guarantee.

One can still use LimitedPrivate/Private or Unstable/Evolving interfaces from outside. But these interfaces can change freely, in non-backward-compatible ways; an interface might even be deleted in a future release. Anyone using them does so at their own risk of seeing their code break and having to change it as the interface evolves.

Regards,
Suresh
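The audience/stability scheme Suresh describes is advisory metadata, which can be sketched with Java annotations. To keep the example self-contained, it declares local stand-ins that only mirror the shape of Hadoop's `org.apache.hadoop.classification.InterfaceAudience` and `InterfaceStability` (real code would import those instead). The annotated classes and the `AudienceCheck` helper are hypothetical, and runtime retention is an assumption made so the helper can read the annotations via reflection. Nothing here stops a caller outside the listed projects, which is exactly Suresh's point.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

final class AudienceCheck {
    /** Advisory only: is `project` in the intended audience of `api`? */
    static boolean intendedFor(Class<?> api, String project) {
        if (api.isAnnotationPresent(InterfaceAudience.Public.class)) {
            return true;
        }
        InterfaceAudience.LimitedPrivate lp =
            api.getAnnotation(InterfaceAudience.LimitedPrivate.class);
        return lp != null && Arrays.asList(lp.value()).contains(project);
    }

    /** Only Public + Stable interfaces carry a backward-compatibility promise. */
    static boolean compatibilityGuaranteed(Class<?> api) {
        return api.isAnnotationPresent(InterfaceAudience.Public.class)
            && api.isAnnotationPresent(InterfaceStability.Stable.class);
    }

    public static void main(String[] args) {
        System.out.println(intendedFor(SomeInternalApi.class, "HBase"));     // true
        System.out.println(intendedFor(SomeInternalApi.class, "Cascading")); // false
        System.out.println(compatibilityGuaranteed(SomeInternalApi.class));  // false
        System.out.println(compatibilityGuaranteed(SomePublicApi.class));    // true
    }
}

// Local stand-ins mirroring the shape of Hadoop's classification annotations.
final class InterfaceAudience {
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Public {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface LimitedPrivate { String[] value(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Private {}

    private InterfaceAudience() {}
}

final class InterfaceStability {
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Stable {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Evolving {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Unstable {}

    private InterfaceStability() {}
}

// Hypothetical APIs, using the project list from this thread.
@InterfaceAudience.LimitedPrivate({"HBase", "HDFS", "Hive", "MapReduce", "Pig"})
@InterfaceStability.Evolving
class SomeInternalApi {}

@InterfaceAudience.Public
@InterfaceStability.Stable
class SomePublicApi {}
```

Graduating an API, as Dhruba describes, would then amount to changing its annotations from LimitedPrivate/Evolving to Public/Stable once it has proved itself.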
-
Re: LimitedPrivate and HBase (thoughts from the build and test world)
Stack, 2011-06-09, 19:23

Nice reality check, and thanks for describing how it was addressed elsewhere, Steve.

As you say, it sounds like a large undertaking, but it would be a sweet service for the downstreamers.

St.Ack
-
Re: LimitedPrivate and HBase (thoughts from the build and test world)
Milind Bhandarkar, 2011-06-10, 02:21

[Just wondering if one of the criteria for graduating to a top-level project should be "no dependency on the LimitedPrivate APIs of the parent project".]

Steve, I agree with your suggestion for a downstream-project build-and-test instance. All I can say is, "stay tuned".

- milind

--
Milind Bhandarkar
[EMAIL PROTECTED]
+1-650-776-3167
-
Re: LimitedPrivate and HBase (thoughts from the build and test world)
Steve Loughran, 2011-06-10, 09:00

On 06/09/2011 08:23 PM, Stack wrote:
> Nice reality check and thanks for the how it was addressed elsewhere Steve.
>
> As you say, it sounds like a large undertaking but it would be a sweet
> service for the downstreamers.

As an aside, the thing that usually prevents you from using Java apps as libraries is random calls to System.exit(). Hadoop does that; when I brought up nodes in-VM, I'd catch those calls in a security manager, convert them to RuntimeExceptions and throw them up the stack. IMO it'd be better for the whole Hadoop stack to do this rather than have random threads take down the VM, which is a debugging nightmare.
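Trapping System.exit() in a security manager relies on `SecurityManager.checkExit`, and the Security Manager has been deprecated for removal since Java 17 (JEP 411). A sketch of the alternative Steve advocates may therefore be more durable: library code never calls System.exit() directly but goes through a helper that, when exits are disabled (for example in in-VM tests), throws an exception carrying the exit status up the stack. `Exits`, `terminate` and `disableSystemExit` are hypothetical names; later Hadoop releases include an `org.apache.hadoop.util.ExitUtil` class built on a similar idea, whose exact API should be checked in its own Javadoc.

```java
// Demo: with exits disabled, a fatal "exit" becomes a catchable exception.
final class ExitsDemo {
    public static void main(String[] args) {
        Exits.disableSystemExit();
        try {
            Exits.terminate(1, "could not bind port"); // hypothetical failure
        } catch (Exits.ExitException e) {
            // The JVM is still alive; the caller decides what to do next.
            System.out.println("caught exit " + e.status + ": " + e.getMessage());
        }
    }
}

// Hypothetical helper: the only place that is allowed to call System.exit().
final class Exits {

    /** Carries the status that would have been passed to System.exit(). */
    static final class ExitException extends RuntimeException {
        final int status;

        ExitException(int status, String message) {
            super(message);
            this.status = status;
        }
    }

    private static volatile boolean systemExitDisabled = false;

    /** Called by test harnesses that host services inside one JVM. */
    static void disableSystemExit() {
        systemExitDisabled = true;
    }

    /** Exits the JVM normally, or throws ExitException if exits are disabled. */
    static void terminate(int status, String message) {
        if (systemExitDisabled) {
            throw new ExitException(status, message);
        }
        System.exit(status);
    }

    private Exits() {}
}
```

Running `ExitsDemo` prints `caught exit 1: could not bind port`. With exits left enabled, the same call behaves like a normal System.exit(1).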