[VOTE] Shall we adopt the "Defining Hadoop" page
Owen O'Malley 2011-06-14, 22:56

All,

Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.

Clearly, I'm +1.

-- Owen

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Allen Wittenauer 2011-06-14, 23:35

On Jun 14, 2011, at 3:56 PM, Owen O'Malley wrote:
> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
>
> Clearly, I'm +1.

This is awesome. Good job everyone!

A minor nit: I'd like to see some cleanup between the first paragraph and the fourth paragraph in compatibility. Or was the re-iteration of our "not a standards committee" intentional? It is sort of awkward as it is currently written.

Also, where can I download Camshaft?

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Steve Loughran 2011-06-15, 09:58

On 15/06/11 00:35, Allen Wittenauer wrote:
> A minor nit: I'd like to see some cleanup between the first paragraph and the fourth paragraph in compatibility. Or was the re-iteration of our "not a standards committee" intentional? It is sort of awkward as it is currently written.

well it is a wiki...

> Also, where can I download Camshaft?

It's a fork of Hadoop 0.15 optimised for Windows ME and FAT32 that requires a human to fetch blocks from remote machines using a floppy - a process that limits blocksize to 1.44MB and kills your latency. You don't really want it. What you saw on the page was marketing's spin on the harsh truth.

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Ian Holsman 2011-06-14, 23:38

+1.
great job Steve!

On Jun 15, 2011, at 8:56 AM, Owen O'Malley wrote:
> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
>
> Clearly, I'm +1.
>
> -- Owen

--
Ian Holsman
[EMAIL PROTECTED]
PH: +1-703 879-3128 AOLIM: ianholsman Skype:iholsman

Never explain - your friends do not need it and your enemies will not believe you anyway. Elbert Hubbard

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Konstantin Boudnik 2011-06-15, 00:16

+1 - makes sense!

--
Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any company the author might be affiliated with at the moment of writing.

On Tue, Jun 14, 2011 at 15:56, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
>
> Clearly, I'm +1.
>
> -- Owen

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-15, 00:48

On Tue, Jun 14, 2011 at 3:56 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
>
> Clearly, I'm +1.
>
> -- Owen

Thanks for putting this together Steve, good stuff!

Wrt derivative works, it's not clear from the document, but I think we should explicitly adopt the policy of HTTPD and Subversion that backported patches from trunk and security fixes are permitted. Specifically, that cherry-picking changes from trunk or release branches and, in general, any code that's been subject to lazy consensus approval by the PMC does not make you a derivative work. For example, RedHat backports [1] to Apache HTTP and of course still calls it Apache HTTP.

In short, an Apache Hadoop release with a backport of PMC approved code or critical security fix is not powered by Hadoop, it is Hadoop, while a new product that contains or runs atop Hadoop is powered by Hadoop.

Reasonable?

Thanks,
Eli

1. https://access.redhat.com/security/updates/backporting/?sc_cid=3093

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Allen Wittenauer 2011-06-15, 01:15

On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:
> In short, an Apache Hadoop release with a backport of PMC approved
> code or critical security fix is not powered by Hadoop, it is Hadoop,
> while a new product that contains or runs atop Hadoop is powered by
> Hadoop.
>
> Reasonable?

I'd say: Security, yes. Features, no.

The reason I say this is because there have been many, many, many posts in the -user mailing lists where people are confused as to what versions have what features because their local branch has a back ported fix. [I think I run out of fingers if I count how many times just the mapred.map.child.java.opts was said to be "in 20" prior to the 0.20.203 release...]

This also adds pressure to do timely releases. :)

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-15, 01:45

On Tue, Jun 14, 2011 at 6:15 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
>
> On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:
>> In short, an Apache Hadoop release with a backport of PMC approved
>> code or critical security fix is not powered by Hadoop, it is Hadoop,
>> while a new product that contains or runs atop Hadoop is powered by
>> Hadoop.
>>
>> Reasonable?
>
> I'd say: Security, yes. Features, no.
>
> The reason I say this is because there have been many, many, many posts in the -user mailing lists where people are confused as to what versions have what features because their local branch has a back ported fix. [I think I run out of fingers if I count how many times just the mapred.map.child.java.opts was said to be "in 20" prior to the 0.20.203 release...]
>
> This also adds pressure to do timely releases. :)

I agree this is a problem, I don't think this is an effective means of solving it.

Are we really going to go after all the web companies that patch in an enhancement to their current Hadoop build and tell them to stop saying that they are using Hadoop? You've patched Hadoop many times, should your employer not be able to say they use Hadoop? I'm -1 on a proposal that does this.

Thanks,
Eli

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Allen Wittenauer 2011-06-15, 02:46

On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
> Are we really going to go after all the web companies that patch in an
> enhancement to their current Hadoop build and tell them to stop saying
> that they are using Hadoop? You've patched Hadoop many times, should
> your employer not be able to say they use Hadoop? I'm -1 on a
> proposal that does this.

I think there is a big difference between some company that uses Hadoop with some patches internally and a company that puts out a distribution for others to use, usually for-profit.

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Konstantin Boudnik 2011-06-15, 02:51

On Tue, Jun 14, 2011 at 19:46, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
>
> On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
>> Are we really going to go after all the web companies that patch in an
>> enhancement to their current Hadoop build and tell them to stop saying
>> that they are using Hadoop? You've patched Hadoop many times, should
>> your employer not be able to say they use Hadoop? I'm -1 on a
>> proposal that does this.
>
> I think there is a big difference between some company that uses Hadoop with some patches internally and a company that puts out a distribution for others to use, usually for-profit.

Just as the reminder: this whole conversation has started as a result of EMC announcement of 100% compatible version of Apache Hadoop. So, Allen's point is right on target here: the above example is simply incorrect.

Cos

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Steve Loughran 2011-06-15, 09:52

On 15/06/11 03:51, Konstantin Boudnik wrote:
> On Tue, Jun 14, 2011 at 19:46, Allen Wittenauer<[EMAIL PROTECTED]> wrote:
>>
>> On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
>>> Are we really going to go after all the web companies that patch in an
>>> enhancement to their current Hadoop build and tell them to stop saying
>>> that they are using Hadoop? You've patched Hadoop many times, should
>>> your employer not be able to say they use Hadoop? I'm -1 on a
>>> proposal that does this.
>>
>> I think there is a big difference between some company that uses Hadoop with some patches internally and a company that puts out a distribution for others to use, usually for-profit.
>
> Just as the reminder: this whole conversation has started as a result
> of EMC announcement of 100% compatible version of Apache Hadoop. So,
> Allen's point is right on target here: the above example is simply
> incorrect.

I seem to recall this discussion starting a bit earlier, with the whole notion of compatibility, before EMC got involved.

Regarding the vote, I think the discussion here is interesting and should be finalised before the vote. It's worth resolving the issues.

also: banners, stickers and clothing? Can I have T-shirts saying "I broke the hadoop build" with the logo on, or should it be "I broke the Apache Hadoop build"?

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Konstantin Boudnik 2011-06-15, 15:58

On Wed, Jun 15, 2011 at 02:52, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 15/06/11 03:51, Konstantin Boudnik wrote:
>>
>> On Tue, Jun 14, 2011 at 19:46, Allen Wittenauer<[EMAIL PROTECTED]> wrote:
>>>
>>> On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
>>>>
>>>> Are we really going to go after all the web companies that patch in an
>>>> enhancement to their current Hadoop build and tell them to stop saying
>>>> that they are using Hadoop? You've patched Hadoop many times, should
>>>> your employer not be able to say they use Hadoop? I'm -1 on a
>>>> proposal that does this.
>>>
>>> I think there is a big difference between some company that uses
>>> Hadoop with some patches internally and a company that puts out a
>>> distribution for others to use, usually for-profit.
>>
>> Just as the reminder: this whole conversation has started as a result
>> of EMC announcement of 100% compatible version of Apache Hadoop. So,
>> Allen's point is right on target here: the above example is simply
>> incorrect.
>
> I seem to recall this discussion starting a bit earlier, with the whole
> notion of compatibility, before EMC got involved.
>
> Regarding the vote, I think the discussion here is interesting and should be
> finalised before the vote. It's worth resolving the issues.
>
> also: banners, stickers and clothing? Can I have T-shirts saying "I broke
> the hadoop build" with the logo on, or should it be "I broke the Apache
> Hadoop build"?

I think such a T-shirt should be forcefully worn on any person who did just that.

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Steve Loughran 2011-06-17, 11:01

On 15/06/11 16:58, Konstantin Boudnik wrote:
> On Wed, Jun 15, 2011 at 02:52, Steve Loughran<[EMAIL PROTECTED]> wrote:
>>
>> Regarding the vote, I think the discussion here is interesting and should be
>> finalised before the vote. It's worth resolving the issues.
>>
>> also: banners, stickers and clothing? Can I have T-shirts saying "I broke
>> the hadoop build" with the logo on, or should it be "I broke the Apache
>> Hadoop build"?
>
> I think such a T-shirt should be forcefully worn on any person who did
> just that.

Here you go with the poster:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/doc/breaking_the_hadoop_build.odp?revision=8630

I can add it to hadoop-common SVN for people to work on...

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Konstantin Boudnik 2011-06-17, 18:17

On Fri, Jun 17, 2011 at 12:01PM, Steve Loughran wrote:
> On 15/06/11 16:58, Konstantin Boudnik wrote:
>> On Wed, Jun 15, 2011 at 02:52, Steve Loughran<[EMAIL PROTECTED]> wrote:
>
>>>
>>> Regarding the vote, I think the discussion here is interesting and should be
>>> finalised before the vote. It's worth resolving the issues.
>>>
>>> also: banners, stickers and clothing? Can I have T-shirts saying "I broke
>>> the hadoop build" with the logo on, or should it be "I broke the Apache
>>> Hadoop build"?
>>
>> I think such a T-shirt should be forcefully worn on any person who did
>> just that.
>
> Here you go with the poster:
> http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/doc/breaking_the_hadoop_build.odp?revision=8630
>
> I can add it to hadoop-common SVN for people to work on...

Please do, by all means :) !

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Steve Loughran 2011-06-20, 12:43

On 17/06/2011 19:17, Konstantin Boudnik wrote:
> On Fri, Jun 17, 2011 at 12:01PM, Steve Loughran wrote:
>> On 15/06/11 16:58, Konstantin Boudnik wrote:
>>> On Wed, Jun 15, 2011 at 02:52, Steve Loughran<[EMAIL PROTECTED]> wrote:
>>>> also: banners, stickers and clothing? Can I have T-shirts saying "I broke
>>>> the hadoop build" with the logo on, or should it be "I broke the Apache
>>>> Hadoop build"?
>>>
>>> I think such a T-shirt should be forcefully worn on any person who did
>>> just that.
>>
>> Here you go with the poster:
>> http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/doc/breaking_the_hadoop_build.odp?revision=8630
>>
>> I can add it to hadoop-common SVN for people to work on...
>
> Please do, by all means :) !

https://issues.apache.org/jira/browse/HADOOP-7406

now, what happens if it gets checked in in a way that breaks the build? That would be too much.

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Konstantin Boudnik 2011-06-17, 18:12

On Fri, Jun 17, 2011 at 12:01PM, Steve Loughran wrote:
> On 15/06/11 16:58, Konstantin Boudnik wrote:
>> On Wed, Jun 15, 2011 at 02:52, Steve Loughran<[EMAIL PROTECTED]> wrote:
>
>>>
>>> Regarding the vote, I think the discussion here is interesting and should be
>>> finalised before the vote. It's worth resolving the issues.
>>>
>>> also: banners, stickers and clothing? Can I have T-shirts saying "I broke
>>> the hadoop build" with the logo on, or should it be "I broke the Apache
>>> Hadoop build"?
>>
>> I think such a T-shirt should be forcefully worn on any person who did
>> just that.
>
> Here you go with the poster:
> http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/doc/breaking_the_hadoop_build.odp?revision=8630
>
> I can add it to hadoop-common SVN for people to work on...

Please do :)!

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-15, 16:23

On Tue, Jun 14, 2011 at 7:46 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
>
> On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
>> Are we really going to go after all the web companies that patch in an
>> enhancement to their current Hadoop build and tell them to stop saying
>> that they are using Hadoop? You've patched Hadoop many times, should
>> your employer not be able to say they use Hadoop? I'm -1 on a
>> proposal that does this.
>
> I think there is a big difference between some company that uses Hadoop with some patches internally and a company that puts out a distribution for others to use, usually for-profit.

The wiki makes no such distinction. The PMC will apply the rules equally to all parties.

According to Owen's email if you are using a release of Apache Hadoop and have applied more than 2 security patches or any backports you are not using Hadoop.

Thanks,
Eli

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Steve Loughran 2011-06-15, 16:44

On 15/06/11 17:23, Eli Collins wrote:
> On Tue, Jun 14, 2011 at 7:46 PM, Allen Wittenauer<[EMAIL PROTECTED]> wrote:
>>
>> On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
>>> Are we really going to go after all the web companies that patch in an
>>> enhancement to their current Hadoop build and tell them to stop saying
>>> that they are using Hadoop? You've patched Hadoop many times, should
>>> your employer not be able to say they use Hadoop? I'm -1 on a
>>> proposal that does this.
>>
>> I think there is a big difference between some company that uses Hadoop with some patches internally and a company that puts out a distribution for others to use, usually for-profit.
>
> The wiki makes no such distinction. The PMC will apply the rules
> equally to all parties.
>
> According to Owen's email if you are using a release of Apache Hadoop
> and have applied more than 2 security patches or any backports you are
> not using Hadoop.
>
> Thanks,
> Eli

What you do in house is of no concern to the trademarks and PMC people, but naming of public redistributables is -and that's where the confusion of what "a distribution of Apache Hadoop" is, because it's gone from weakly defined to very vague recently, and that needs to be corrected before people are left in a world of confusion.

It's been complicated enough with people posting issues related to the Cloudera Distribution including Apache Hadoop, what happens when people start posting EMC-enterprise-hadoopish issues, file bugreps against Brisk's "Hadoop built on other things" product on the apache JIRA?

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-15, 16:57

On Wed, Jun 15, 2011 at 9:44 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 15/06/11 17:23, Eli Collins wrote:
>>
>> On Tue, Jun 14, 2011 at 7:46 PM, Allen Wittenauer<[EMAIL PROTECTED]> wrote:
>>>
>>> On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
>>>>
>>>> Are we really going to go after all the web companies that patch in an
>>>> enhancement to their current Hadoop build and tell them to stop saying
>>>> that they are using Hadoop? You've patched Hadoop many times, should
>>>> your employer not be able to say they use Hadoop? I'm -1 on a
>>>> proposal that does this.
>>>
>>> I think there is a big difference between some company that uses
>>> Hadoop with some patches internally and a company that puts out a
>>> distribution for others to use, usually for-profit.
>>
>> The wiki makes no such distinction. The PMC will apply the rules
>> equally to all parties.
>>
>> According to Owen's email if you are using a release of Apache Hadoop
>> and have applied more than 2 security patches or any backports you are
>> not using Hadoop.
>>
>> Thanks,
>> Eli
>
> What you do in house is of no concern to the trademarks and PMC people, but
> naming of public redistributables is -and that's where the confusion of what
> "a distribution of Apache Hadoop" is, because it's gone from weakly defined
> to very vague recently, and that needs to be corrected before people are
> left in a world of confusion.

Steve, I'm on the PMC and it is a concern. What happens in house often gets released on github, documented, blogged about, etc. All this stuff creates confusion about the product and is therefore a concern of the PMC.

> It's been complicated enough with people posting issues related to the
> Cloudera Distribution including Apache Hadoop, what happens when people
> start posting EMC-enterprise-hadoopish issues, file bugreps against Brisk's
> "Hadoop built on other things" product on the apache JIRA?

The same thing we do today. We point them to another more appropriate forum.

Isn't your proposal w/ the HTTPD/Subversion policy wrt backporting effective? Note that it's pretty strict, you have to get your code committed to a branch that will be released subject to approval by the PMC. It's not saying that anyone can do whatever they want to the Hadoop source and call it Hadoop.

Thanks,
Eli

RE: [VOTE] Shall we adopt the "Defining Hadoop" page
Rottinghuis, Joep 2011-06-16, 04:24

It does make sense to me to distinguish between the case when a company seeks to benefit from using the Hadoop name for their product and the case when a company uses Hadoop internally with some minor patches.

For example: large company creates a game-show playing appliance and explains that they have used Hadoop for some of the learning tasks. Not allowed if they applied more than 3 patches?

Or: company claims they have a large Hadoop deployment and are looking for developers to help them with their Hadoop development work is not allowed? What's the alternative? Wanted: Powered by Apache™ Hadoop™ developers?

Also, if thousands of changes are packaged together into one giant patch, is that allowed?

Perhaps a similarity index (such as used by Git to determine if two files are similar enough to be considered a rename) would make sense? If 98% of the code is the same, would it be Hadoop if used internally and not sold/marketed as a product?

Cheers,
Joep

________________________________________
From: Eli Collins [[EMAIL PROTECTED]]
Sent: Wednesday, June 15, 2011 9:23 AM
To: [EMAIL PROTECTED]
Cc: Apache Brand Management
Subject: Re: [VOTE] Shall we adopt the "Defining Hadoop" page

On Tue, Jun 14, 2011 at 7:46 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
>
> On Jun 14, 2011, at 6:45 PM, Eli Collins wrote:
>> Are we really going to go after all the web companies that patch in an
>> enhancement to their current Hadoop build and tell them to stop saying
>> that they are using Hadoop? You've patched Hadoop many times, should
>> your employer not be able to say they use Hadoop? I'm -1 on a
>> proposal that does this.
>
> I think there is a big difference between some company that uses Hadoop with some patches internally and a company that puts out a distribution for others to use, usually for-profit.

The wiki makes no such distinction. The PMC will apply the rules equally to all parties.

According to Owen's email if you are using a release of Apache Hadoop and have applied more than 2 security patches or any backports you are not using Hadoop.

Thanks,
Eli

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Owen O'Malley 2011-06-16, 14:48

On Wed, Jun 15, 2011 at 9:24 PM, Rottinghuis, Joep <[EMAIL PROTECTED]> wrote:
> It does make sense to me to distinguish between the case when a company
> seeks to benefit from using the Hadoop name for their product and the case
> when a company uses Hadoop internally with some minor patches.

If they aren't distributing the version that they use, no one will know or care if they have patches applied. Eli is just trying to cloud the real issue, which is about distributors and what they call their derivative works.

> For example: large company creates a game-show playing appliance and
> explains that they have used Hadoop for some of the learning tasks. Not
> allowed if they applied more than 3 patches?

Of course it is allowed. It is only a question of whether you can distribute it to others and call it Hadoop.

> Also, if thousands of changes are packaged together into one giant patch,
> is that allowed?

No, the exception is strictly for critical security fixes and I would sincerely hope that those would be released by Apache in very short order.

-- Owen

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-16, 15:31

On Thu, Jun 16, 2011 at 7:48 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 15, 2011 at 9:24 PM, Rottinghuis, Joep <[EMAIL PROTECTED]> wrote:
>
>> It does make sense to me to distinguish between the case when a company
>> seeks to benefit from using the Hadoop name for their product and the case
>> when a company uses Hadoop internally with some minor patches.
>
> If they aren't distributing the version that they use, no one will know or
> care if they have patches applied. Eli is just trying to cloud the real
> issue, which is about distributors and what they call
> their derivative works.

I truly don't see distribution as the relevant issue, in particular I don't see why the definition of what Hadoop is should change on whether or not you distribute it.

>> For example: large company creates a game-show playing appliance and
>> explains that they have used Hadoop for some of the learning tasks. Not
>> allowed if they applied more than 3 patches?
>
> Of course it is allowed. It is only a question of whether you can distribute
> it to others and call it Hadoop.

So you want IBM to call what they run Hadoop, unless they put it up on a website in which case they can no longer call it Hadoop. What is the rationale?

Thanks,
Eli

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Steve Loughran 2011-06-15, 09:49

On 15/06/11 02:15, Allen Wittenauer wrote:
> I run out of fingers if I count how many times just the mapred.map.child.java.opts was said to be "in 20" prior to the 0.20.203 release...]

yeah, that incident involving Camshaft 3.02 beta and your left hand really reduced your counting ability.

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Owen O'Malley 2011-06-15, 02:45

On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:
> Wrt derivative works, it's not clear from the document, but I think we
> should explicitly adopt the policy of HTTPD and Subversion that
> backported patches from trunk and security fixes are permitted.

Actually, the document is extremely clear that only Apache releases may be called Hadoop.

There was a very long thread about why the rapidly expanding Hadoop-ecosystem is leading to a lot of customer confusion about the different "versions" of Hadoop. We as the Hadoop project don't have the resources or the necessary compatibility test suite to test compatibility between the different sets of cherry picked patches. We also don't have time to ensure that all of the 1,000's of patches applied to 0.20.2 in each of the many (10? 15?) different versions have been committed to trunk. Furthermore, under the Apache license, a company Foo could claim that it is a cherry pick version of Hadoop without releasing their source code that would enable verification.

In summary,
1. Hadoop is very successful.
2. There are many different commercial products that are trying to use the Hadoop name.
3. We can't check or enforce that the cherry pick versions are following the rules.
4. We don't have a TCK like Java does to validate new versions are compatible.
5. By far the most fair way to ensure compatibility and fairness between companies is that only Apache Hadoop releases may be called Hadoop.

That said, a package that includes a small number (< 3) of security patches that haven't been released yet doesn't seem unreasonable.

-- Owen

Re: [VOTE] Shall we adopt the "Defining Hadoop" pageEli Collins 2011-06-15, 16:40
On Tue, Jun 14, 2011 at 7:45 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> > On Jun 14, 2011, at 5:48 PM, Eli Collins wrote: > >> Wrt derivative works, it's not clear from the document, but I think we >> should explicitly adopt the policy of HTTPD and Subversion that >> backported patches from trunk and security fixes are permitted. > > Actually, the document is extremely clear that only Apache releases may be called Hadoop. > > There was a very long thread about why the rapidly expanding Hadoop-ecosystem is leading to at lot of customer confusion about the different "versions" of Hadoop. We as the Hadoop project don't have the resources or the necessary compatibility test suite to test compatibility between the different sets of cherry picked patches. We also don't have time to ensure that all of the 1,000's of patches applied to 0.20.2 in each of the many (10? 15?) different versions have been committed to trunk. Futhermore, under the Apache license, a company Foo could claim that it is a cherry pick version of Hadoop without releasing their source code that would enable verification. > > In summary, > 1. Hadoop is very successful. > 2. There are many different commercial products that are trying to use the Hadoop name. > 3. We can't check or enforce that the cherry pick versions are following the rules. > 4. We don't have a TCK like Java does to validate new versions are compatible. > 5. By far the most fair way to ensure compatibility and fairness between companies is that only Apache Hadoop releases may be called Hadoop. > > That said, a package that includes a small number (< 3) of security patches that haven't been released yet doesn't seem unreasonable. > I've spoken with ops teams at many companies, I am not aware of anyone who runs an official release (with just 2 security patches). By this definition many of the most valuable contributors to Hadoop, including Yahoo!, Cloudera, Facebook, etc are not using Hadoop. Is that really the message we want to send? We expect the PMC to enforce this equally across all parties? 
It's a fact of life that companies and ops teams that support Hadoop need to patch the software before the PMC has time and/or will to vote on new releases. This is why HTTP and Subversion allow this. Putting a build of Hadoop that has 4 security patches applied into the same category as a product that has entirely re-worked the code and not gotten it checked into trunk does a major disservice to the people who contribute to and invest in the project. Thanks, Eli +
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Matthew Foley 2011-06-15, 17:44
Eli, you said:
> Putting a build of Hadoop that has 4 security patches applied into the same
> category as a product that has entirely re-worked the code and not
> gotten it checked into trunk does a major disservice to the people who
> contribute to and invest in the project.

How would you phrase the distinction, so that it is clear and reasonably unambiguous
for people who are not Hadoop developers? Do the HTTP and Subversion policies
draw this distinction, and if so could you please point us at the specific text, or
copy that text to this thread?

Thanks,
--Matt
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Matthew Foley 2011-06-15, 18:00
Oh, and while I can't officially vote, I think this page is extremely well done and
I strongly support it.

As an editorial note, however, I would remove the last paragraph in the "Compatibility"
section, referencing the email thread (that I contributed to at length :-) ). That thread
went all over the place, and would be misinforming to the typical reader. The distillation
on this twiki page IS normative and not confusing, and we should leave it at that.

Best,
--Matt
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Eli Collins 2011-06-16, 01:02
On Wed, Jun 15, 2011 at 10:44 AM, Matthew Foley <[EMAIL PROTECTED]> wrote:
> Eli, you said:
>> Putting a build of Hadoop that has 4 security patches applied into the same
>> category as a product that has entirely re-worked the code and not
>> gotten it checked into trunk does a major disservice to the people who
>> contribute to and invest in the project.
>
> How would you phrase the distinction, so that it is clear and reasonably unambiguous
> for people who are not Hadoop developers? Do the HTTP and Subversion policies
> draw this distinction, and if so could you please point us at the specific text, or
> copy that text to this thread?
>

I'll try to find it, this was told to me verbally a while back. Maybe Roy can chime in.

Since there seems to be some confusion around distribution we should make this explicit. Some people are currently interpreting the guidelines to say that if you patch an Apache Hadoop release yourself then you're still running Apache Hadoop. But if a vendor patches Apache Hadoop for you then you're not running Apache Hadoop. How about if a subcontractor patches Apache Hadoop for you, then is it Apache Hadoop? This isn't sustainable.

Thanks,
Eli
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Matthew Foley 2011-06-16, 01:17
I tend to agree with what I think you are saying, that
* applying a small-number-of-patches that are
* for high-severity-bug-fixes, and
* have been Apache-Hadoop-committed
to an Apache Hadoop release should not demote the result to a "derived work".
However, if so many patches are applied that the result cannot be meaningfully
correlated with a specific Apache Hadoop release, then it probably has
become a derived work.

But how do we draw a meaningful line across that big gray area? That's why I'd like to
see specific text from one of the other projects you cited as an example.

Thanks,
--Matt
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Craig L Russell 2011-06-16, 02:19
Hi Matthew,

I'm sorry I have to disagree.

If you change a bit in a work, it becomes a derived work. There's no "demotion" involved. Just a definition of derived work.

There's no ambiguity. Either you ship the bits that the Apache PMC has voted on as a release, or you change it (one bit) and it is no longer what the PMC has voted on. It's a derived work.

The rules for voting in Apache require that if you change a bit in an artifact, you can no longer count votes for the previous artifact. Because the new work is different. A new vote is required.

Not gray. Black and white.

Simple as that.

Craig

P.S. for the anthropologists, look at the history of Apache Derby and Sun JavaDB. Meaningful, specific example.

Craig L Russell
Secretary, Apache Software Foundation
Chair, OpenJPA PMC
[EMAIL PROTECTED] http://db.apache.org/jdo
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Ian Holsman 2011-06-16, 02:52
So to second a point here.

We are not saying you can't patch your distribution, add your own features, share it with your friends, or do whatever you want to the code.

All we're saying is that you can't call that 'Apache Hadoop'.

Ian Holsman
[EMAIL PROTECTED]
PH: +1-703 879-3128 AOLIM: ianholsman Skype:iholsman

If you can believe in your power to do great things, you will. -- Michael Berg
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Todd Lipcon 2011-06-16, 04:30
On Wed, Jun 15, 2011 at 7:19 PM, Craig L Russell <[EMAIL PROTECTED]> wrote:

> There's no ambiguity. Either you ship the bits that the Apache PMC has
> voted on as a release, or you change it (one bit) and it is no longer what
> the PMC has voted on. It's a derived work.
>
> The rules for voting in Apache require that if you change a bit in an
> artifact, you can no longer count votes for the previous artifact. Because
> the new work is different. A new vote is required.
>

Sorry, but this is just silly. Are you telling me that the httpd package in Ubuntu isn't Apache httpd? It has 43 patches applied. Tomcat6 has 17. I'm sure every other commonly used piece of software bundled with Ubuntu has been patched, too. I don't see them calling their packages "Ubuntu HTTP server powered by Apache HTTPD". It's just httpd.

The httpd in RHEL 5 is the same way. In fact they even provide some nice metadata in their patches, for example:

httpd-2.0.48-release.patch:Upstream-Status: vendor-specific change
httpd-2.1.10-apctl.patch:Upstream-Status: Vendor-specific changes for better initscript integration

To me, this is a good thing: allowing vendors to redistribute the software with some modifications makes it much more accessible to users and businesses alike, and that's part of why Hadoop has had so much success. So long as we require the vendors to upstream those modifications back to the ASF, we get the benefits of these contributions back in the community and everyone should be happy.

-Todd
--
Todd Lipcon
Software Engineer, Cloudera
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Ian Holsman 2011-06-16, 04:47
On Jun 16, 2011, at 2:30 PM, Todd Lipcon wrote:

> On Wed, Jun 15, 2011 at 7:19 PM, Craig L Russell
> <[EMAIL PROTECTED]>wrote:
>
>> There's no ambiguity. Either you ship the bits that the Apache PMC has
>> voted on as a release, or you change it (one bit) and it is no longer what
>> the PMC has voted on. It's a derived work.
>>
>> The rules for voting in Apache require that if you change a bit in an
>> artifact, you can no longer count votes for the previous artifact. Because
>> the new work is different. A new vote is required.
>>
>
> Sorry, but this is just silly. Are you telling me that the httpd package in
> Ubuntu isn't Apache httpd? It has 43 patches applied. Tomcat6 has 17. I'm
> sure every other commonly used piece of software bundled with ubuntu has
> been patched, too. I don't see them calling their packages "Ubuntu HTTP
> server powered by Apache HTTPD". It's just httpd.
>

well.. for RHEL in the early days of httpd, a configuration that ran on RHEL would not work on the 'vanilla' httpd.
(they implemented a feature called include which could take a wildcard, which wasn't in the released version of httpd at the time)

even today.. I can't build redis on my mac as I am using GNU's libtool instead of the one packaged on the mac.
http://code.google.com/p/redis/issues/detail?id=443

so yes .. even a simple patch makes it derived, because it is different.

> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Eric Sammer 2011-06-16, 06:35
On Wed, Jun 15, 2011 at 9:47 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:

>
> so yes .. even a simple patch makes it derived, because it is different.
>

...and a "derived work" is fine. Nothing inherently wrong with the term derived. I think the question is can one call it "Hadoop?" Note I'm *not* saying "Apache Hadoop," just "Hadoop" when the derived work is actually derived (to any degree, as Craig R pointed out). Apache Hadoop always and forever means the bits voted on by the PMC - no vendor can claim that - but there does appear to be plenty of prior examples of "reasonable" use of ASF (and other OSS organization) project names in clearly derived works. I do agree there should be a policy and it needs to be universally applied to be fair to all involved.

Not to kick up the compatibility dust storm again, but people will always claim crazy stuff that may or may not be true. We should just ignore it. Any day of the week someone is claiming XYZ compatible either explicitly or implicitly (as in client libraries for Foo Project). For cases where a vendor makes a claim that isn't true, users will ask, we'll clarify that Apache makes no guarantees of derived work compatibility and doesn't certify anything (and specifically does the opposite - *NO* guarantees or warranties).

Example uses I think should be fine / acceptable:

YDH (even though it no longer exists, it's a good example) and Y!'s use of Hadoop
Facebook Hadoop
Hadoop at eBay
Hadoop at LinkedIn
IBM's use of Hadoop
and yes, CDH*

Even if some / all of the above modify at least a single bit (and may *technically* be derived works) everyone understands what they mean. As for the confusion, the OSS community has always just said "oh, they patch some stuff, you should probably ask them" when confronted with vendor modified versions of upstream projects; I've been involved in many of those upstream projects, including a Linux distro (downstream). We should always be polite to downstream users in redirecting them, but I think redirecting them is fine. It's not confusing to users in my experience (we can make it a FAQ or something and just point people there) as RedHat, Novell, Oracle, IBM, and many other vendors have been happily[1] coexisting with their upstream counterparts for a long time.

I believe we (the collective Apache Hadoop community including those that redistribute Hadoop bits in various forms) should focus on producing regular, quality releases in a cooperative and constructive environment, and continue to require vendors to provide the proper attribution and license information. This is in everyone's interest, vendors and direct users alike.

*Disclosure: I work for Cloudera and I think this should apply to anyone and everyone, including my employer (with whom I obviously do not clear emails. :))

[1] OK, maybe not always "happily" but mostly so. You know what I mean.

Thanks to Steve L and others for their hard work on this one. (Sorry for the long email.)

--
Eric Sammer
twitter: esammer
data: www.cloudera.com
-
Re: [VOTE] Shall we adopt the "Defining Hadoop" page Steve Loughran 2011-06-16, 11:46
On 16/06/11 07:35, Eric Sammer wrote:
> On Wed, Jun 15, 2011 at 9:47 PM, Ian Holsman<[EMAIL PROTECTED]> wrote:
>
>>
>> so yes .. even a simple patch makes it derived, because it is different.
>>
>
> ...and a "derived work" is fine. Nothing inherently wrong with the term
> derived. I think the question is can one call it "Hadoop?" Note I'm *not*
> saying "Apache Hadoop," just "Hadoop" when the derived work is actually
> derived (to any degree, as Craig R pointed out). Apache Hadoop always and
> forever means the bits voted on by the PMC - no vendor can claim that - but
> there does appear to be plenty of prior examples of "reasonable" use of ASF
> (and other OSS organization) project names in clearly derived works. I do
> agree there should be a policy and it needs to be universally applied to be
> fair to all involved.
>
> Not to kick up the compatibility dust storm again, but people will always
> claim crazy stuff that may or may not be true. We should just ignore it.

The issue is branding and trademarks; eventually things get downgraded to become meaningless. If I code an MR engine in Erlang (I have one somewhere), can I call it "Hadoop for Erlang"?

> Any
> day of the week someone is claiming XYZ compatible either explicitly or
> implicitly (as in client libraries for Foo Project). For cases where a
> vendor makes a claim that isn't true, users will ask, we'll clarify that
> Apache makes no guarantees of derived work compatibility and doesn't certify
> anything (and specifically does the opposite - *NO* guarantees or
> warranties).

-BigTop could provide that defensible compatibility statement. "Automotive Joe's Crankshaft platform passed the Apache BigTop DFS, MR, Mahout and HBase test suites"

>
> Example uses I think should be fine / acceptable:
>
> YDH (even though it no longer exists, it's a good example) and Y!'s use of
> Hadoop

-creates confusion and encourages the notion that anything is a distribution of hadoop, which is the situation that the trademarks people are trying to crack down on

> Facebook Hadoop

-depends on internal vs external

> Hadoop at eBay
> Hadoop at LinkedIn

details of internal use, as valid as "Hadoop in Steve's house", which, given my known network state, is always something to cherish. And while I have built my branch up and published it, it's no longer something I distribute (though it is in an open SVN repository somewhere). I'm working directly with Apache Hadoop 0.20.203 these days.

> IBM's use of Hadoop

not sure about IBM distribution of Apache Hadoop, as I presume it has the uncommitted patch to work on IBM JVMs (though were someone to commit it..)
http://www.alphaworks.ibm.com/tech/idah

The biginsights product is more explicit and, to me, a good example of terminology. Their own brand, description of the benefits, and details on what's in there:

"IBM InfoSphere BigInsights Enterprise Edition
For turning complex, internet-scale information into insight, cost effectively

IBM® InfoSphere® BigInsights Enterprise Edition enables new solutions that turn large, complex volumes of data into insight, cost effectively. InfoSphere BigInsights delivers an enterprise-ready big data solution by combining Apache Hadoop, including the MapReduce framework and the Hadoop Distributed File Systems (HDFS), with unique technologies and capabilities from across IBM."

That gives them the flexibility to swap things around in future (switch to GPFS, MapR, Brisk) without having to change their branding.

> and yes, CDH*

If you look at the CDH site it's now "Cloudera's Distribution including Apache Hadoop". After all it's Cloudera's data analysis stack including Apache Hadoop,

> Even if some / all of the above modify at least a single bit (and may
> *technically* be derived works) everyone understands what they mean. As for
> the confusion, the OSS community has always just said "oh, they patch some
> stuff, you should probably ask them" when confronted with vendor modified

co-existence yes; happiness, not always:
http://www.jonobacon.org/2010/07/30/red-hat-canonical-and-gnome-contributions/
http://lwn.net/Articles/374737/
http://gburt.blogspot.com/2011/02/banshee-supporting-gnome-on-ubuntu.html
http://bazaar.launchpad.net/~mozillateam/firefox/firefox-4.0.head/view/head:/debian/patches/ubuntu-codes-amazon.patch

Where Ubuntu are good is that Launchpad is a good entry point for filing and tracking any Ubuntu-related problem, and helping to push that upstream, so the local issue can be linked to the source issue, letting me deal with problems like getting sound to work:
https://bugs.launchpad.net/ubuntu/+source/amarok/+bug/523269
https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/584844

JIRA doesn't do that cross-instance tracking, which is painful for me at work, where I do deal with multiple JIRA instances. You can put remote URLs in, but they don't get synchronised. I can't say SFOS-780 depends on apache.org/MAPREDUCE-279, for example.

+1

I understand -it may ultimately affect my employer too. Which is why a consistent approach matters, then nobody will feel they are being discriminated against.
Steve Loughran 2011-06-16, 11:46
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Owen O'Malley 2011-06-16, 15:02
On Wed, Jun 15, 2011 at 11:35 PM, Eric Sammer <[EMAIL PROTECTED]> wrote:

> I think the question is can one call it "Hadoop?" Note I'm *not* saying "Apache Hadoop," just "Hadoop" when the derived work is actually derived (to any degree, as Craig R pointed out). Apache Hadoop always and forever means the bits voted on by the PMC - no vendor can claim that - but there does appear to be plenty of prior examples of "reasonable" use of ASF (and other OSS organization) project names in clearly derived works.

Thank you, Eric, for demonstrating why we are fixing it. Apache owns the Hadoop trademark. Hadoop is PRECISELY the same as Apache Hadoop. They are two names for the same thing. If the Hadoop PMC were to fail to enforce that, the Apache board would remove us en masse from the PMC.

-- Owen
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-16, 15:41
On Thu, Jun 16, 2011 at 8:02 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> On Wed, Jun 15, 2011 at 11:35 PM, Eric Sammer <[EMAIL PROTECTED]> wrote:
>
>> I think the question is can one call it "Hadoop?" Note I'm *not* saying "Apache Hadoop," just "Hadoop" when the derived work is actually derived (to any degree, as Craig R pointed out). Apache Hadoop always and forever means the bits voted on by the PMC - no vendor can claim that - but there does appear to be plenty of prior examples of "reasonable" use of ASF (and other OSS organization) project names in clearly derived works.
>
> Thank you, Eric, for demonstrating why we are fixing it. Apache owns the Hadoop trademark. Hadoop is PRECISELY the same as Apache Hadoop. They are two names for the same thing. If the Hadoop PMC were to fail to enforce that, the Apache board would remove us en masse from the PMC.

By this logic the Apache board should remove en masse the PMCs of the HTTP Server, Subversion and Tomcat because they've failed to force Red Hat, Novell, Ubuntu and others to stop calling them Apache X. Clearly that hasn't happened. Let's let the Apache board speak for itself.

Thanks,
Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Matthew Foley 2011-06-16, 17:17
Hi Eric,

sorry, but drawing a distinction between "Hadoop" and "Apache Hadoop" cannot be done, under either general trademark usage or the Apache Trademark Policy. Trademark usage is a specialized language just like a programming language, and that usage violates the intended semantics of the trademark.

--Matt

On Jun 15, 2011, at 11:35 PM, Eric Sammer wrote:

On Wed, Jun 15, 2011 at 9:47 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:

so yes .. even a simple patch makes it derived, because it is different.

...and a "derived work" is fine. Nothing inherently wrong with the term derived. I think the question is can one call it "Hadoop?" Note I'm *not* saying "Apache Hadoop," just "Hadoop" when the derived work is actually derived (to any degree, as Craig R pointed out). Apache Hadoop always and forever means the bits voted on by the PMC - no vendor can claim that - but there does appear to be plenty of prior examples of "reasonable" use of ASF (and other OSS organization) project names in clearly derived works. I do agree there should be a policy and it needs to be universally applied to be fair to all involved.

Not to kick up the compatibility dust storm again, but people will always claim crazy stuff that may or may not be true. We should just ignore it. Any day of the week someone is claiming XYZ compatible either explicitly or implicitly (as in client libraries for Foo Project). For cases where a vendor makes a claim that isn't true, users will ask, we'll clarify that Apache makes no guarantees of derived work compatibility and doesn't certify anything (and specifically does the opposite - *NO* guarantees or warranties).

Example uses I think should be fine / acceptable:

YDH (even though it no longer exists, it's a good example) and Y!'s use of Hadoop
Facebook Hadoop
Hadoop at eBay
Hadoop at LinkedIn
IBM's use of Hadoop
and yes, CDH*

Even if some / all of the above modify at least a single bit (and may *technically* be derived works) everyone understands what they mean. As for the confusion, the OSS community has always just said "oh, they patch some stuff, you should probably ask them" when confronted with vendor modified versions of upstream projects; I've been involved in many of those upstream projects, including a Linux distro (downstream). We should always be polite to downstream users in redirecting them, but I think redirecting them is fine. It's not confusing to users in my experience (we can make it a FAQ or something and just point people there) as RedHat, Novell, Oracle, IBM, and many other vendors have been happily[1] coexisting with their upstream counterparts for a long time.

I believe we (the collective Apache Hadoop community including those that redistribute Hadoop bits in various forms) should focus on producing regular, quality releases in a cooperative and constructive environment, and continue to require vendors to provide the proper attribution and license information. This is in everyone's interest, vendors and direct users alike.

*Disclosure: I work for Cloudera and I think this should apply to anyone and everyone, including my employer (with whom I obviously do not clear emails. :))

[1] OK, maybe not always "happily" but mostly so. You know what I mean.

Thanks to Steve L and others for their hard work on this one. (Sorry for the long email.)

--
Eric Sammer
twitter: esammer
data: www.cloudera.com
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-16, 16:05
On Wed, Jun 15, 2011 at 6:17 PM, Matthew Foley <[EMAIL PROTECTED]> wrote:

> I tend to agree with what I think you are saying, that
> * applying a small-number-of-patches that are
> * for high-severity-bug-fixes, and
> * have been Apache-Hadoop-committed
> to an Apache Hadoop release should not demote the result to a "derived work".
> However, if so many patches are applied that the result cannot be meaningfully correlated with a specific Apache Hadoop release, then it probably has become a derived work.

This is one reason why I think the definition of derived work in the draft of the wiki is way too broad. Something that's nothing like Hadoop at all but includes a Hadoop jar is given the same label as something with a single security patch. I think we can come up with a more useful definition of derived work. If we do, that would help us draw the distinction between:

1. An Apache Hadoop release voted on by the PMC, bit-for-bit identical
2. An Apache Hadoop release + backports (eg say per the above definition of backport)
3. Something that is powered by Hadoop (eg HBase)
4. Something that is not Hadoop nor powered by Hadoop (eg the way tc Server is not powered by Apache Tomcat)

Note that the current document does not make an exception for security patches. Owen and I made this suggestion on this thread but the writeup we are voting on makes no such exception.

> But how do we draw a meaningful line across that big gray area? That's why I'd like to see specific text from one of the other projects you cited as an example.

Googling didn't turn up anything in their public archives. This was in an email exchange I had with Shane several years ago. Hopefully their PMC can chime in.

Thanks,
Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Matthew Foley 2011-06-16, 17:38
After writing my note to Eric, I realize that Eli and I are guilty of the same attempt to use legal terminology in an engineering context. Craig Russell is absolutely right. If you change one bit, it is a "derived work".

However, we can still allow the trademark to be applied to that work, if it meets licensing criteria. So what we are arguing about is, "Where is the boundary line between something we are willing to call 'Apache Hadoop' and something that must be called 'Product XYZ Powered by Apache Hadoop'?"

I'm in favor of a very strict definition. It needs to be really, really close to a PMC-approved release. But I'm open to the argument that a small number of security patches could be necessary for a viable commercial product, and that shouldn't necessarily prevent it from using the trademark.

But I suggest we stop focusing on the term "derived work". Note that the "Defining Apache Hadoop" draft document we are voting on doesn't use that term.

--Matt

On Jun 16, 2011, at 9:05 AM, Eli Collins wrote:

On Wed, Jun 15, 2011 at 6:17 PM, Matthew Foley <[EMAIL PROTECTED]> wrote:

> I tend to agree with what I think you are saying, that
> * applying a small-number-of-patches that are
> * for high-severity-bug-fixes, and
> * have been Apache-Hadoop-committed
> to an Apache Hadoop release should not demote the result to a "derived work".
> However, if so many patches are applied that the result cannot be meaningfully correlated with a specific Apache Hadoop release, then it probably has become a derived work.

This is one reason why I think the definition of derived work in the draft of the wiki is way too broad. Something that's nothing like Hadoop at all but includes a Hadoop jar is given the same label as something with a single security patch. I think we can come up with a more useful definition of derived work. If we do, that would help us draw the distinction between:

1. An Apache Hadoop release voted on by the PMC, bit-for-bit identical
2. An Apache Hadoop release + backports (eg say per the above definition of backport)
3. Something that is powered by Hadoop (eg HBase)
4. Something that is not Hadoop nor powered by Hadoop (eg the way tc Server is not powered by Apache Tomcat)

Note that the current document does not make an exception for security patches. Owen and I made this suggestion on this thread but the writeup we are voting on makes no such exception.

> But how do we draw a meaningful line across that big gray area? That's why I'd like to see specific text from one of the other projects you cited as an example.

Googling didn't turn up anything in their public archives. This was in an email exchange I had with Shane several years ago. Hopefully their PMC can chime in.

Thanks,
Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-16, 18:11
On Thu, Jun 16, 2011 at 10:38 AM, Matthew Foley <[EMAIL PROTECTED]> wrote:

> After writing my note to Eric, I realize that Eli and I are guilty of the same attempt to use legal terminology in an engineering context. Craig Russell is absolutely right. If you change one bit, it is a "derived work".
>
> However, we can still allow the trademark to be applied to that work, if it meets licensing criteria. So what we are arguing about is, "Where is the boundary line between something we are willing to call 'Apache Hadoop' and something that must be called 'Product XYZ Powered by Apache Hadoop'?"
>
> I'm in favor of a very strict definition. It needs to be really, really close to a PMC-approved release. But I'm open to the argument that a small number of security patches could be necessary for a viable commercial product, and that shouldn't necessarily prevent it from using the trademark.
>
> But I suggest we stop focusing on the term "derived work". Note that the "Defining Apache Hadoop" draft document we are voting on doesn't use that term.

See the section titled "Derivative Works". The term "derivative work" is used throughout the document. I think you're right that the key point here is not what is and is not a derivative work, but what can be called Hadoop.

Seems like the board should have an ASF-wide stance on what can be called Apache X instead of doing this per-project.

Thanks,
Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eric Baldeschwieler 2011-06-17, 00:35
If the board does have a stance, I'd love to hear it. That could usefully end this discussion.

Absent that, it seems reasonable for the PMC to make a decision in this area. Each project has different use cases and ecosystems, so it may not be reasonable to expect a one-size-fits-all solution. I see no reason not to make a local proposal; the board can always clarify.

On Jun 16, 2011, at 11:11 AM, Eli Collins wrote:

> On Thu, Jun 16, 2011 at 10:38 AM, Matthew Foley <[EMAIL PROTECTED]> wrote:
>> After writing my note to Eric, I realize that Eli and I are guilty of the same attempt to use legal terminology in an engineering context. Craig Russell is absolutely right. If you change one bit, it is a "derived work".
>>
>> However, we can still allow the trademark to be applied to that work, if it meets licensing criteria. So what we are arguing about is, "Where is the boundary line between something we are willing to call 'Apache Hadoop' and something that must be called 'Product XYZ Powered by Apache Hadoop'?"
>>
>> I'm in favor of a very strict definition. It needs to be really, really close to a PMC-approved release. But I'm open to the argument that a small number of security patches could be necessary for a viable commercial product, and that shouldn't necessarily prevent it from using the trademark.
>>
>> But I suggest we stop focusing on the term "derived work". Note that the "Defining Apache Hadoop" draft document we are voting on doesn't use that term.
>
> See the section titled "Derivative Works". The term "derivative work" is used throughout the document. I think you're right that the key point here is not what is and is not a derivative work, but what can be called Hadoop.
>
> Seems like the board should have an ASF-wide stance on what can be called Apache X instead of doing this per-project.
>
> Thanks,
> Eli
RE: [VOTE] Shall we adopt the "Defining Hadoop" page
Lawrence Rosen 2011-06-16, 17:27
I'm very confused by this thread. What does trademark law have to do with derivative work analysis under copyright law? Is there something specific in our FAQ or trademark policy that confuses these concepts and that we should clean up?

/Larry

Please cc: trademarks@ because I'm not on the other lists.

> -----Original Message-----
> From: Eli Collins [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 16, 2011 9:05 AM
> To: Matthew Foley
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [VOTE] Shall we adopt the "Defining Hadoop" page
>
> On Wed, Jun 15, 2011 at 6:17 PM, Matthew Foley <[EMAIL PROTECTED]> wrote:
> > I tend to agree with what I think you are saying, that
> > * applying a small-number-of-patches that are
> > * for high-severity-bug-fixes, and
> > * have been Apache-Hadoop-committed
> > to an Apache Hadoop release should not demote the result to a "derived work".
> > However, if so many patches are applied that the result cannot be meaningfully correlated with a specific Apache Hadoop release, then it probably has become a derived work.
>
> This is one reason why I think the definition of derived work in the draft of the wiki is way too broad. Something that's nothing like Hadoop at all but includes a Hadoop jar is given the same label as something with a single security patch. I think we can come up with a more useful definition of derived work. If we do, that would help us draw the distinction between:
> 1. An Apache Hadoop release voted on by the PMC, bit-for-bit identical
> 2. An Apache Hadoop release + backports (eg say per the above definition of backport)
> 3. Something that is powered by Hadoop (eg HBase)
> 4. Something that is not Hadoop nor powered by Hadoop (eg the way tc Server is not powered by Apache Tomcat)
>
> Note that the current document does not make an exception for security patches. Owen and I made this suggestion on this thread but the writeup we are voting on makes no such exception.
>
> > But how do we draw a meaningful line across that big gray area? That's why I'd like to see specific text from one of the other projects you cited as an example.
>
> Googling didn't turn up anything in their public archives. This was in an email exchange I had with Shane several years ago. Hopefully their PMC can chime in.
>
> Thanks,
> Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Ted Dunning 2011-06-15, 18:13
+1 to what Eli says. If nobody is running official Hadoop according to this definition, but everybody thinks that they are running Hadoop, then this definition is a bit out of whack. The source of the dissonance is related to the fact that releases just don't happen often enough in Hadoop.

In addition, I think that the limitations on usage are too strict. For instance, if "QuickBooks for Windows" [1] doesn't cause Microsoft to sue Intuit, then "Joe's Foo for Apache Hadoop" really shouldn't cause any more grief.

So I would give a (non-binding) -1 to the policy as stated.

[1] http://quickbooks.intuit.com/product/accounting_software/windows_financial_management_software.jsp

On Wed, Jun 15, 2011 at 6:40 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

> On Tue, Jun 14, 2011 at 7:45 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> >
> > On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:
> >
> >> Wrt derivative works, it's not clear from the document, but I think we should explicitly adopt the policy of HTTPD and Subversion that backported patches from trunk and security fixes are permitted.
> >
> > Actually, the document is extremely clear that only Apache releases may be called Hadoop.
> >
> > There was a very long thread about why the rapidly expanding Hadoop-ecosystem is leading to a lot of customer confusion about the different "versions" of Hadoop. We as the Hadoop project don't have the resources or the necessary compatibility test suite to test compatibility between the different sets of cherry picked patches. We also don't have time to ensure that all of the 1,000's of patches applied to 0.20.2 in each of the many (10? 15?) different versions have been committed to trunk. Furthermore, under the Apache license, a company Foo could claim that it is a cherry pick version of Hadoop without releasing their source code that would enable verification.
> >
> > In summary,
> > 1. Hadoop is very successful.
> > 2. There are many different commercial products that are trying to use the Hadoop name.
> > 3. We can't check or enforce that the cherry pick versions are following the rules.
> > 4. We don't have a TCK like Java does to validate new versions are compatible.
> > 5. By far the most fair way to ensure compatibility and fairness between companies is that only Apache Hadoop releases may be called Hadoop.
> >
> > That said, a package that includes a small number (< 3) of security patches that haven't been released yet doesn't seem unreasonable.
>
> I've spoken with ops teams at many companies, I am not aware of anyone who runs an official release (with just 2 security patches). By this definition many of the most valuable contributors to Hadoop, including Yahoo!, Cloudera, Facebook, etc are not using Hadoop. Is that really the message we want to send? We expect the PMC to enforce this equally across all parties?
>
> It's a fact of life that companies and ops teams that support Hadoop need to patch the software before the PMC has time and/or will to vote on new releases. This is why HTTP and Subversion allow this. Putting a build of Hadoop that has 4 security patches applied into the same category as a product that has entirely re-worked the code and not gotten it checked into trunk does a major disservice to the people who contribute to and invest in the project.
>
> Thanks,
> Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Arun C Murthy 2011-06-15, 18:37
On Jun 15, 2011, at 10:10 PM, Eli Collins wrote:

> I've spoken with ops teams at many companies, I am not aware of anyone who runs an official release (with just 2 security patches). By this definition many of the most valuable contributors to Hadoop, including Yahoo!, Cloudera, Facebook, etc are not using Hadoop. Is that really the message we want to send? We expect the PMC to enforce this equally across all parties?

This is only a recent (less than 2 yrs) phenomenon with hadoop-0.20 onwards.

I've been on the project for over 5 years now and I've run official Apache Hadoop releases at Y! for the majority of that time. From hadoop-0.1 to hadoop-0.18.

IAC, getting everyone to run an official release isn't an anti-goal. And, as Steve points out, this really doesn't concern internal deployments - public redistributables are something I worry about as a PMC member with my Apache hat on.

+1 for the current version (Defining Hadoop (last edited 2011-06-09 02:56:39 by OwenOMalley))

thanks,
Arun
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-15, 22:25
On Wed, Jun 15, 2011 at 11:37 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> On Jun 15, 2011, at 10:10 PM, Eli Collins wrote:
>
>> I've spoken with ops teams at many companies, I am not aware of anyone who runs an official release (with just 2 security patches). By this definition many of the most valuable contributors to Hadoop, including Yahoo!, Cloudera, Facebook, etc are not using Hadoop. Is that really the message we want to send? We expect the PMC to enforce this equally across all parties?
>
> This is only a recent (less than 2 yrs) phenomenon with hadoop-0.20 onwards.
>
> I've been on the project for over 5 years now and I've run official Apache Hadoop releases at Y! for the majority of that time. From hadoop-0.1 to hadoop-0.18.

But Yahoo! hasn't. According to this wiki YDH (0.20.100) would *not* be considered Apache Hadoop. For example see HADOOP-6962 which refers to 0.20.9, an internal Yahoo! release, not an official Apache release. Are you really comfortable saying Yahoo! doesn't run Hadoop?

Thanks,
Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Chris Douglas 2011-06-15, 22:42
On Wed, Jun 15, 2011 at 3:25 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

> But Yahoo! hasn't. According to this wiki YDH (0.20.100) would *not* be considered Apache Hadoop. For example see HADOOP-6962 which refers to 0.20.9, an internal Yahoo! release, not an official Apache release. Are you really comfortable saying Yahoo! doesn't run Hadoop?

This conclusion does not follow. The guidelines prohibit companies from distributing that software as "Hadoop". It takes no position on whether a company that modifies a release of Apache Hadoop for its deployment credits the project.

On Wed, Jun 15, 2011 at 11:13 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> In addition, I think that the limitations on usage are too strict. For instance, if "QuickBooks for Windows" [1] doesn't cause Microsoft to sue Intuit, then "Joe's Foo for Apache Hadoop" really shouldn't cause any more grief.

This analogy is also inexact. QuickBooks is an application running on Windows, not a replacement for it.

-C
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-15, 23:11
On Wed, Jun 15, 2011 at 3:42 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:

> On Wed, Jun 15, 2011 at 3:25 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
>> But Yahoo! hasn't. According to this wiki YDH (0.20.100) would *not* be considered Apache Hadoop. For example see HADOOP-6962 which refers to 0.20.9, an internal Yahoo! release, not an official Apache release. Are you really comfortable saying Yahoo! doesn't run Hadoop?
>
> This conclusion does not follow. The guidelines prohibit companies from distributing that software as "Hadoop". It takes no position on whether a company that modifies a release of Apache Hadoop for its deployment credits the project.

This is independent of distribution. The guideline clearly defines such an artifact as a "derivative work" and states that "Products that are derivative works of Apache Hadoop are not Apache Hadoop". Therefore it's false for the company to claim they are using Apache Hadoop.

Thanks,
Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eli Collins 2011-06-15, 01:15
On Tue, Jun 14, 2011 at 3:56 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
>
> Clearly, I'm +1.
>
> -- Owen

I'd like to make another suggestion. Currently we call two types of things powered by Apache Hadoop:

1. Something that runs on Hadoop (eg HBase or Karmasphere)
2. Something that includes Hadoop artifacts/source code

Shouldn't we distinguish between these two, such that the 2nd is not powered by Hadoop? Eg tc server is not powered by Apache Tomcat right?

Apologies for having discussion on a vote thread but this is the first time I've seen the current revision and it seems reasonable to have an opportunity to discuss a specific revision before voting on it.

Thanks,
Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Konstantin Boudnik 2011-06-15, 02:32
On Tue, Jun 14, 2011 at 18:15, Eli Collins <[EMAIL PROTECTED]> wrote:

> On Tue, Jun 14, 2011 at 3:56 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>> All,
>> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
>>
>> Clearly, I'm +1.
>>
>> -- Owen
>
> I'd like to make another suggestion. Currently we call two types of things powered by Apache Hadoop:
>
> 1. Something that runs on Hadoop (eg HBase or Karmasphere)

To be completely precise, Karmasphere doesn't 'run on Hadoop'. Their products "integrate with a variety of Hadoop distributions and related technologies..." as you can see here http://karmasphere.com/Miscellaneous/overview.html. Although in the case of HBase you're right ;)

Cos

> 2. Something that includes Hadoop artifacts/source code
>
> Shouldn't we distinguish between these two, such that the 2nd is not powered by Hadoop? Eg tc server is not powered by Apache Tomcat right?
>
> Apologies for having discussion on a vote thread but this is the first time I've seen the current revision and it seems reasonable to have an opportunity to discuss a specific revision before voting on it.
>
> Thanks,
> Eli
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Chris Douglas 2011-06-15, 02:16
+1 on revision 12. Thanks for all your work on this, Steve. -C

On Tue, Jun 14, 2011 at 3:56 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
> Clearly, I'm +1.
> -- Owen
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Doug Cutting 2011-06-16, 08:44
-1 if patches that have been committed to trunk are not permitted to be applied to distributions that are still called "Apache Hadoop". That's the rule we agreed on some time ago at Roy's suggestion. Let's first document the status quo, then, separately, discuss and vote on changes to it.

Also, such a branding rule should probably be uniform across Apache projects, not Hadoop-specific.

Doug

On Wed, Jun 15, 2011 at 12:56 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
> Clearly, I'm +1.
> -- Owen
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Shane Curcuru 2011-06-18, 14:45
One clarification: I've only had time to review the wiki document for some terminology updates, and not for the overall content yet. So from the trademarks@ point of view, more review is needed before we work on making this official.

From the significant amount of discussion in this vote thread, I think it might be good to have the Hadoop PMC and trademarks@ work on getting a more organized consensus first, before voting on an updated proposed Hadoop policy.

- Shane

Owen O'Malley wrote:

> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.
>
> Clearly, I'm +1.
>
> -- Owen
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Owen O'Malley 2011-06-24, 08:26
On Jun 14, 2011, at 3:56 PM, Owen O'Malley wrote:

> All,
> Steve Loughran has done some great work on defining what can be called Hadoop at http://wiki.apache.org/hadoop/Defining%20Hadoop. After some cleanup from Noirin and Shane, I think we've got a really good base. I'd like a vote to approve the content (at the current revision 12) and put the content on our web site.

Binding +1: Arun, Chris, Ian, Owen
Binding -1: Doug, Eli, Todd
Non-binding +1: Allen, Cos, Matt, Steve
Non-binding -1: Ted, Jeff

Well, technically this passed, but we've been encouraged to discuss it more. Personally, I'd love to get more feedback from Larry about how we should accomplish the goal of getting packagers to either use Apache Hadoop releases or only use "powered by Hadoop." Clearly there is a significant difference of opinion about the value of that goal that is unlikely to be resolved by debate.

Having a clearly stated trademark statement on the website will help significantly with contacting organizations that are mis-using the trademark, so I don't want to postpone this too long. Let's discuss it for a week and then call a new vote if we think that is merited.

-- Owen
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Doug Cutting 2011-06-24, 13:43
On 06/24/2011 10:26 AM, Owen O'Malley wrote:
> Having a clearly stated trademark statement on the website will help
> significantly with contacting organizations that are mis-using the
> trademark, so I don't want to postpone this too long. Let's discuss
> it for a week and then call a new vote if we think that is merited.

Might it be better to improve the existing Apache trademark policy page?

http://www.apache.org/foundation/marks/

This way all projects can benefit, e.g., Pig, Hive, Zookeeper, etc. We might, e.g., propose more examples of acceptable and not-acceptable uses of Apache marks there, etc. We can work with trademarks@ to build a library of boilerplate letters to be sent to folks who use Apache marks in objectionable ways.

Doug
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Owen O'Malley 2011-06-24, 17:07
On Jun 24, 2011, at 6:43 AM, Doug Cutting wrote:
> Might it be better to improve the existing Apache trademark policy page?

When the project is having trouble agreeing, reaching agreement at the foundation level seems unrealistic. Let's reach a workable solution for Hadoop, see how it functions in practice, iterate and improve, and then we can consider pushing it to the entire foundation.

-- Owen
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Doug Cutting 2011-06-25, 05:08
On 06/24/2011 07:07 PM, Owen O'Malley wrote:
> On Jun 24, 2011, at 6:43 AM, Doug Cutting wrote:
>
>> Might it be better to improve the existing Apache trademark policy
>> page?
>
> When the project is having trouble agreeing, reaching agreement at
> the foundation level seems unrealistic.

ASF trademark policy is set by Shane, VP Trademark, not by a committee.

Doug
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Roy T. Fielding 2011-06-26, 19:11
On Jun 24, 2011, at 10:08 PM, Doug Cutting wrote:
> On 06/24/2011 07:07 PM, Owen O'Malley wrote:
>> On Jun 24, 2011, at 6:43 AM, Doug Cutting wrote:
>>
>>> Might it be better to improve the existing Apache trademark policy
>>> page?
>>
>> When the project is having trouble agreeing, reaching agreement at
>> the foundation level seems unrealistic.
>
> ASF trademark policy is set by Shane, VP Trademark, not by a committee.

If we apply trademark policy to this discussion, then the only answer possible is that only releases made by the Apache Hadoop PMC can be called Hadoop. That is, after all, the essence of board delegation to PMCs and the meaning of trademarks.

Traditionally, we have also allowed distributions that apply released security patches, for example as found in http://www.apache.org/dist/httpd/patches/ and turned a blind eye toward changes that are purely to port to a new platform. I did not write those exceptions down because I don't know what (if any) impact they might have on enforcement.

I said before that we typically don't argue about distributions that include revisions that are on a release branch, but that assumed the project is actually working toward a release of that branch. I have a hard time believing that Hadoop's trunk is a release branch.

In any case, this very specific exception should be entirely decided by the project -- the VP of Trademarks has no role in deciding what is the purview of each PMC, namely the decision on what is or is not released in the name of that project.

....Roy
Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Eric Baldeschwieler 2011-06-22, 15:41
I agree with this.
We need to find a middle ground that achieves three aims:

1) Makes it clear that an ASF release of Hadoop is THE APACHE HADOOP. Jeff's manpower argument actually reinforces this. We need a very testable definition of what is an Apache Hadoop Release or enforcement will be impossible because each test of the policy might require a visit to the Supreme Court. "Its MD5 matches the MD5 of an Apache release" is a clear definition.

2) We need a proposal for derived products that vendors feel are branding friendly. These should be clear enough that users understand the difference between a product that packages Apache Hadoop (MD5 test), one that is completely open source under the Apache license (easy to test) and one that simply uses some subset of the code under a more restrictive license or closed source.

3) Compatibility: I think it would be great to harness all this energy around compatibility to start a compatibility suite inside the Apache Hadoop project. Then we could define "compatible with Apache Hadoop" in a clear way controlled by the Apache Hadoop PMC. With luck vendors on both sides of the debate will be incentivized to contribute to the project this way. Such a suite would also prove useful to the developers of Apache Hadoop.

E14

On Jun 20, 2011, at 10:09 AM, Ted Dunning wrote:
> Great summary Andrew.
>
> I would add one more precipitating factor here. That is the arrival of a
> number of products which are very close to the Apache version of Hadoop but
> for which there is no good and widely accepted terminology that gives proper
> credit to their lineage while making clear the distinction from bit-for-bit
> copies of official Apache releases.
>
> Some products are analogous to hive, pig or hbase in that they are
> independent systems that run ON hadoop (or close equivalents). These have
> no terminology problem because these products aren't hadoop, but rather use
> hadoop.
>
> Other products contain Hadoop internally as a critical component but do not
> necessarily expose Hadoop capabilities to the end user (I can't name these
> products, but they exist). These products have little nomenclatural
> difficulty because the powered-by-Hadoop description fits very well.
>
> The products with the terminology problem are the ones that add either
> curation and packaging (Cloudera) or substantial additional performance
> enhancing components (MapR). These products are upwardly compatible with
> Apache Hadoop in that programs that run on Hadoop will very probably run on
> these Hadoop-like systems. The problem is that there is no good term for
> these products. They may even contain components that are bit-for-bit
> identical to the same components for Apache releases. It is fair to say
> that these are not Apache released software, but it is also fair to say that
> there ought to be a better name for the class of these products.
>
> On Mon, Jun 20, 2011 at 4:39 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
>> Hadoop I think needs to be more careful. What triggered this discussion is
>> the arrival of new players releasing products they call Hadoop but
>> containing severe changes the community, by way of the ASF umbrella we all
>> work under, had nothing to do with designing or developing. And some of
>> these are being open sourced as a Hadoop. There is no Linus here. Which of
>> these is _the_ Hadoop? As a would-be contributor, which should I select?
>>