-
[DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2010-08-24, 00:27
Even with the work on hadoop-0.22 (trunk) starting in earnest, it is fairly obvious, given our past history, that it will take a while for us to get it stable and deployable - e.g. it took us nearly 6 months to deploy hadoop-0.20.

In the interim I'd like to propose we push a hadoop-0.20-security release off the Yahoo! patchset (http://github.com/yahoo/hadoop-common). This will ensure the community benefits from all the work done at Yahoo! for over 12 months *now*, and ensures that we do not have to wait until hadoop-0.22, which has all of these patches.

Some salient aspects:
a) Full-fledged security implementation deployed at scale (4000 nodes) in production.
b) Lots of work on stabilizing and optimizing the NameNode and JobTracker for over 12 months. This has been critical in deploying Hadoop at scale, i.e. clusters of 4000 nodes. E.g. we have a 50% improvement in CPU utilization on the JobTracker vis-a-vis the hadoop-0.20.2 release.
c) Several new features in the scheduler (CapacityScheduler), Map-Reduce framework, better support for multi-tenancy etc.
d) Several performance and stability improvements to the system, e.g. iterative ls, robustness against rogue clients/jobs/users etc.

Also, given the huge number of features and enhancements, I'd like to propose we create a new 0.20-security branch and commit the Yahoo patchset there for the release.

This has been proposed earlier by Doug and did not get far due to concerns about the effect this would have on development on trunk. However, I believe we have a case for demonstrable progress on trunk now, and it would be useful to have an interim, fully-tested Apache Hadoop release available to the community.

Conceivably, one could imagine a Hadoop Security + Append release soon after. At this point a Hadoop Security release alone would add tremendous value for the reasons above. Presently we would like to get this release out quickly to focus the majority of our efforts on trunk.

Thoughts?

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Tom White 2010-08-25, 17:16
Hi Arun,

I think it would be good to have a shared 0.20 Apache security branch. Since security isn't in 0.21, and the 0.22 release is some way off as you mention, this would be useful for folks who want the security features sooner (and want to use an Apache release).

Thanks,
Tom
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Hemanth Yamijala 2010-08-25, 17:46
Arun,

How much time do you think it would take to have a version of 0.20 with the security features in it ready? In a different thread, Owen has started discussing plans around 0.22. Do you think this effort would affect the 0.22 release?

I do agree that this would be very useful for folks who want security sooner. And the fact that Yahoo! have been running it at scale for a good while now is also reassuring.

Thanks
hemanth
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2010-08-25, 17:59
On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:
> Arun,
>
> How much time do you think it would take to have a version of 0.20
> with the security features in it ready ? In a different thread, Owen
> has started discussing plans around 0.22. Do you think this effort
> would affect 0.22 release ?
>

I think it should be fairly trivial to get this release out - most of the effort is just the mechanics of committing the patches to an Apache branch from the yahoo git repository, creating a release candidate and calling it a success! *smile*

I think doing this quickly is critical in ensuring that we do not lose focus on 0.22, but I believe this will definitely help the community.

> I do agree that this would be very useful for folks who want security
> sooner. And the fact that Yahoo! have been running it at scale for a
> good while now is also assuring.

Just to clarify - this has security and a bunch of other enhancements (which are either in 0.21 or 0.22 or both).

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Allen Wittenauer 2010-08-25, 20:25
On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:
> I do agree that this would be very useful for folks who want security
> sooner. And the fact that Yahoo! have been running it at scale for a
> good while now is also assuring.

As has been mentioned a few times, part of the security features are dependent upon Yahoo!-type operations. Those would need to get replaced or a decision would need to be made that we are removing/regressing certain features (the cluster-wide start scripts).
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Devaraj Das 2010-08-25, 21:11
> As has been mentioned a few times, part of the security features are dependent upon Yahoo!-type operations.

Allen, could you please list them here again (for the benefit of the community)? Or are you referring only to the cluster-wide start scripts?
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Steve Loughran 2010-08-26, 14:11
On 25/08/10 18:59, Arun C Murthy wrote:
> On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:
>
>> Arun,
>>
>> How much time do you think it would take to have a version of 0.20
>> with the security features in it ready ? In a different thread, Owen
>> has started discussing plans around 0.22. Do you think this effort
>> would affect 0.22 release ?
>>
>
> I think it should be fairly trivial to get this release out - most of
> the effort is just the mechanics of committing the patches to an Apache
> branch from the yahoo git repository, creating a release candidate and
> calling it a success! *smile*

Oh, and testing it.

Which scalability patches, like HDFS-599, are in?
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2010-08-26, 16:09
On Aug 26, 2010, at 7:11 AM, Steve Loughran wrote:
> On 25/08/10 18:59, Arun C Murthy wrote:
>> On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:
>>
>>> Arun,
>>>
>>> How much time do you think it would take to have a version of 0.20
>>> with the security features in it ready ? In a different thread, Owen
>>> has started discussing plans around 0.22. Do you think this effort
>>> would affect 0.22 release ?
>>>
>>
>> I think it should be fairly trivial to get this release out - most of
>> the effort is just the mechanics of committing the patches to an
>> Apache branch from the yahoo git repository, creating a release
>> candidate and calling it a success! *smile*
>
> oh, and testing it..
>

Already is! *smile*
It's running on 4k-node clusters in production at this point...
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Steve Loughran 2010-08-26, 16:40
On 26/08/10 17:09, Arun C Murthy wrote:
>
> On Aug 26, 2010, at 7:11 AM, Steve Loughran wrote:
>
>> On 25/08/10 18:59, Arun C Murthy wrote:
>>> On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:
>>>
>>>> Arun,
>>>>
>>>> How much time do you think it would take to have a version of 0.20
>>>> with the security features in it ready ? In a different thread, Owen
>>>> has started discussing plans around 0.22. Do you think this effort
>>>> would affect 0.22 release ?
>>>>
>>>
>>> I think it should be fairly trivial to get this release out - most of
>>> the effort is just the mechanics of committing the patches to an Apache
>>> branch from the yahoo git repository, creating a release candidate and
>>> calling it a success! *smile*
>>
>> oh, and testing it..
>>
>
> Already is! *smile*
> It's running on 4k clusters in production at this point...
>

+1 then, ship it.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Stack 2010-08-26, 19:08
On Mon, Aug 23, 2010 at 5:27 PM, Arun C Murthy wrote:
> In the interim I'd like to propose we push a hadoop-0.20-security release
> off the Yahoo! patchset (http://github.com/yahoo/hadoop-common). This will
> ensure the community benefits from all the work done at Yahoo! for over 12
> months *now*, and ensures that we do not have to wait until hadoop-0.22
> which has all of these patches.
>

Sounds good to me. What will this release be called? hadoop-0.20.3-security?

> Conceivably, one could imagine a Hadoop Security + Append release soon
> after.

Well, it'd probably be better if we just did an append release first? A good few of us have been banging on the 0.20-append branch for a while now and it's for sure doing append better than 0.20 did (smile).

St.Ack
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2010-08-26, 23:22
On Aug 26, 2010, at 12:08 PM, Stack wrote:
> On Mon, Aug 23, 2010 at 5:27 PM, Arun C Murthy wrote:
>> In the interim I'd like to propose we push a hadoop-0.20-security
>> release off the Yahoo! patchset (http://github.com/yahoo/hadoop-common).
>> This will ensure the community benefits from all the work done at
>> Yahoo! for over 12 months *now*, and ensures that we do not have to
>> wait until hadoop-0.22 which has all of these patches.
>>
>
> Sounds good to me. What will this release be called?
> hadoop-0.20.3-security?

hadoop-0.20-security. I want to ensure hadoop-0.20 be a separate line, so as to not confuse people.

>> Conceivably, one could imagine a Hadoop Security + Append release
>> soon after.
>
> Well, it'd probably be better if we just did an append release first?
> A good few of us have been banging on the 0.20-append branch w/ a
> while now and its for sure doing append better than 0.20 did (smile).

I think these are orthogonal and both can run their own course.

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Owen O'Malley 2010-08-26, 23:28
On Thu, Aug 26, 2010 at 12:08 PM, Stack wrote:
> Sounds good to me. What will this release be called? hadoop-0.20.3-security?

It is a new branch, so the question is what the branch name is. I'd propose calling it 0.20-security and the releases would be 0.20-security.0, etc.

> Well, it'd probably be better if we just did an append release first?

I don't think the ordering matters. 0.20-security is a different branch that isn't comparable to 0.20-append.

0.20 < 0.20-security < 0.22
0.20 < 0.20-append < 0.21 < 0.22

It does make a bit of a mess.

-- Owen
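The two chains in this message describe a partial order rather than a single line of releases: 0.20-security and 0.20-append both sit between 0.20 and 0.22, but neither precedes the other. A minimal sketch of that relation in Python, assuming only the orderings listed in the message (the `PRECEDES` table and the `precedes`/`comparable` helpers are hypothetical names for illustration, not part of any Hadoop tooling):

```python
# Direct "comes before" edges, taken from the two chains in the message:
#   0.20 < 0.20-security < 0.22
#   0.20 < 0.20-append < 0.21 < 0.22
PRECEDES = {
    "0.20": {"0.20-security", "0.20-append"},
    "0.20-security": {"0.22"},
    "0.20-append": {"0.21"},
    "0.21": {"0.22"},
    "0.22": set(),
}


def precedes(a, b):
    """Return True if branch a comes strictly before branch b (transitively)."""
    stack = list(PRECEDES[a])
    seen = set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(PRECEDES[node])
    return False


def comparable(a, b):
    """Two branches are comparable if they are equal or one precedes the other."""
    return a == b or precedes(a, b) or precedes(b, a)


if __name__ == "__main__":
    print(precedes("0.20", "0.22"))                    # True
    print(comparable("0.20-security", "0.20-append"))  # False
```

Under these assumptions, 0.20-security is also not comparable to 0.21, which is one way to read the "bit of a mess" remark: no single version number can place the security branch relative to every other line.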
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ted Yu 2010-08-26, 23:30
This would imply that a hadoop-0.20-security-append or hadoop-0.20-append-security release be created which contains both security and append features.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2010-08-26, 23:41
On Aug 26, 2010, at 4:30 PM, Ted Yu wrote:
> This would imply hadoop-0.20-security-append or
> hadoop-0.20-append-security release be created which contains
> security and append features.

As I mentioned in my initial proposal - it's conceivable, not imminent. The community might decide that it is a valuable direction and folks may work on integrating the two.

At this point, I am signing up to shepherd hadoop-0.20-security. I'd like to do it quickly and move on to working on Hadoop trunk; others are welcome to take this and run further.

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2010-08-30, 21:14
On Aug 23, 2010, at 5:27 PM, Arun C Murthy wrote:
> In the interim I'd like to propose we push a hadoop-0.20-security
> release off the Yahoo! patchset (http://github.com/yahoo/hadoop-common).
> This will ensure the community benefits from all the work
> done at Yahoo! for over 12 months *now*, and ensures that we do not
> have to wait until hadoop-0.22 which has all of these patches.

Since most people seemed to think of it as a reasonable idea, I'm going to create the hadoop-0.20-security branch and start the necessary work.

thanks,
Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Doug Cutting 2010-10-15, 21:27
On 08/30/2010 02:14 PM, Arun C Murthy wrote:
> Since most people seemed to think of it as a reasonable idea, I'm going
> to create the hadoop-0.20-security branch and start the necessary work.

I don't yet see this branch. Are you still intending to do this?

Doug
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-11, 07:11
On 10/15/2010 02:28 PM, Doug Cutting wrote:
> On 08/30/2010 02:14 PM, Arun C Murthy wrote:
>> Since most people seemed to think of it as a reasonable idea, I'm
>> going to create the hadoop-0.20-security branch and start the
>> necessary work.
>
> I don't yet see this branch. Are you still intending to do this?
>
> Doug

Things stalled, my apologies. Turns out having a kid is a lot of work, who knew! *smile*

I'm back now and plan to start work on this. Hopefully I can get this over with quickly, in a couple of weeks, to focus on the next release(s).

thanks,
Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Stack 2011-01-11, 19:13
On Mon, Jan 10, 2011 at 11:11 PM, Arun C Murthy wrote:
> Things stalled, my apologies. Turns out having a kid is a lot of work, who
> knew! *smile*
>

Really (smile -- congrats Arun).

> I'm back now and plan to start work on this. Hopefully I can get this over
> with quickly, in a couple of weeks, to focus on the next release(s).
>

What you thinking? What'll you call it?

Good on you,
St.Ack
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-12, 05:09
On Jan 11, 2011, at 11:14 AM, "Stack" wrote:
>> I'm back now and plan to start work on this. Hopefully I can get this over
>> with quickly, in a couple of weeks, to focus on the next release(s).
>>
>
> What you thinking? What'll you call it?
>
> Good on you,
> St.Ack

Thanks Stack.

I'm open to suggestions - how about something like 20.100 to show that it's a big jump? Anything else?

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Mahadev Konar 2011-01-12, 17:08
+1. I like the idea of 20.100.

Thanks
mahadev
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Patrick Angeles 2011-01-12, 17:27
You're gonna call your kid 20.100? :)

Congratz
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Owen O'Malley 2011-01-12, 21:10
On Jan 11, 2011, at 9:09 PM, Arun C Murthy wrote:
> I'm open to suggestions - how about something like 20.100 to show
> that it's a big jump? Anything else?

Although I'm not wild about any of the potential release names, this patch set is neither a subset nor a superset of the 0.21 or 0.22 branches. Given that, I think that a new major release number makes the most sense. It is also relatively likely that additional minor releases will be made off of this branch while 0.22 is stabilizing. We've talked about declaring 0.20 a 1.0 for a long time and this feels like backing into the decision, but technically, I believe it to be the right name for such a release.

Thoughts?

-- Owen
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ian Holsman 2011-01-12, 21:26
So if 0.20 becomes 1.0, what does 0.22 become?

I'm still not sure if we shouldn't just add security to 0.22, and leave 0.20 in maintenance mode from here on.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-12, 21:34
I'm willing to discuss any and all options, for a very short period.

Technically you have a reasonable point, and Doug has suggested this in the past too. If everyone agrees, fine; if not, I do not want to get hung up on a release number. I just *do not* want a controversy.

As I mentioned, I'm looking to finish this up in a couple of weeks; so, I could do without a long discussion on the critical path.

I'm happy to go with a reasonable compromise; if not, hadoop-0.20.100 is what I'm priming for.

Heck, if Stack wants to call the append release (not sure how far ahead he is) hadoop-0.20.100, I'm willing to call this hadoop-0.20.200.

All I care about is having a distinct release number from 0.20.2 (our last stable release). Again, I just want to get a release into the hands of our users. Please, let's resolve this quickly. Please.

Arun

On Jan 12, 2011, at 1:10 PM, Owen O'Malley wrote:
>
> On Jan 11, 2011, at 9:09 PM, Arun C Murthy wrote:
>
>> I'm open to suggestions - how about something like 20.100 to show
>> that it's a big jump? Anything else?
>
>
> Although I'm not wild about any of the potential release names, this
> patch set is neither a subset or superset of the 0.21 or 0.22
> branches. Given that, I think that a new major release number makes
> the most sense. It is also relatively likely that additional minor
> releases will be made off of this branch while 0.22 is stabilizing.
> We've talked about declaring 0.20 a 1.0 for a long time and this feels
> like backing into the decision, but technically, I believe it to be
> the right name for such a release.
>
> Thoughts?
>
> -- Owen
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler 2011-01-12, 21:53
Let me second Arun here.

This is incremental work on 0.20. We're happy to support any branch naming strategy the community likes, but sticking with 20.<minor> seems like the right default approach. Let's discuss 1.0 issues on another thread.

Our priority is to get our work into other folks' hands.

Thanks!
E14
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Chris Douglas 2011-01-12, 21:57
I had exactly the same reaction when this came up in the past:
http://s.apache.org/l9
http://s.apache.org/5Gv

but our experience with myriad 0.20 variants has demonstrated that Hadoop can support both a stable branch and a development branch. Trying to direct effort away from 0.20 by preventing it from happening in Apache didn't work, and I was wrong to advocate for it.

The interest in a more slow-moving, stable version of Hadoop will exist whether we give it an outlet in Apache or not, most of us work on both anyway, so we might as well collaborate in both fora. -C

On Wed, Jan 12, 2011 at 1:26 PM, Ian Holsman wrote:
> so if 0.20 becomes 1.0, what does 0.22 become ?
>
> I'm still not sure if we shouldn't just add security to 0.22, and leave the 0.20 in maintenance mode from here on.
>
> On Jan 12, 2011, at 4:10 PM, Owen O'Malley wrote:
>
>>
>> On Jan 11, 2011, at 9:09 PM, Arun C Murthy wrote:
>>
>>> I'm open to suggestions - how about something like 20.100 to show that it's a big jump? Anything else?
>>
>>
>> Although I'm not wild about any of the potential release names, this patch set is neither a subset or superset of the 0.21 or 0.22 branches. Given that, I think that a new major release number makes the most sense. It is also relatively likely that additional minor releases will be made off of this branch while 0.22 is stabilizing. We've talked about declaring 0.20 a 1.0 for a long time and this feels like backing into the decision, but technically, I believe it to be the right name for such a release.
>>
>> Thoughts?
>>
>> -- Owen
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Nigel Daley 2011-01-12, 22:56
+1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve more discussion.

Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious.

Cheers,
Nige
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eli Collins 2011-01-12, 23:02

+1 on 0.20.x (where x is a J > 3)

Nigel - could we make all the patches in this branch that have not been committed up stream (that need to be) blockers for 22? This way 22 is not a regression against 0.20.x.

Thanks,
Eli

On Wed, Jan 12, 2011 at 2:56 PM, Nigel Daley <[EMAIL PROTECTED]> wrote:
> +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve more discussion.
>
> Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious.
>
> Cheers,
> Nige
>
>
> On Jan 12, 2011, at 1:34 PM, Arun C Murthy wrote:
>
>> I'm willing to discuss any and all options, for a very short period.
>>
>> Technically you have a reasonable point, Doug has suggested this in the past too. If everyone agrees, fine; if not, I'm do not want hung up on a release number. I just *do not* want a controversy.
>>
>> As I mentioned, I'm looking to finish this up in a couple of weeks; so, I could do without a long discussion on the on the critical path.
>>
>> I'm happy to go with a reasonable compromise, if not, hadoop-0.20.100 is what I'm priming for.
>>
>> Heck, if Stack wants to call the append release (not sure how far ahead he is) as hadoop-0.20.100, I'm willing to call this hadoop-0.20.200.
>>
>> All I care about is having a distinct release number from 0.20.2 (our last stable release). Again, I just want to get a release into the hands of our users. Please, let's resolve this quickly. Please.
>>
>> Arun
>>
>> On Jan 12, 2011, at 1:10 PM, Owen O'Malley wrote:
>>
>>>
>>> On Jan 11, 2011, at 9:09 PM, Arun C Murthy wrote:
>>>
>>>> I'm open to suggestions - how about something like 20.100 to show
>>>> that it's a big jump? Anything else?
>>>
>>>
>>> Although I'm not wild about any of the potential release names, this
>>> patch set is neither a subset or superset of the 0.21 or 0.22
>>> branches. Given that, I think that a new major release number makes
>>> the most sense. It is also relatively likely that additional minor
>>> releases will be made off of this branch while 0.22 is stabilizing.
>>> We've talked about declaring 0.20 a 1.0 for a long time and this feels
>>> like backing into the decision, but technically, I believe it to be
>>> the right name for such a release.
>>>
>>> Thoughts?
>>>
>>> -- Owen
>>
>
>
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Nigel Daley 2011-01-12, 23:14

> Nigel - could we make all the patches in this branch that have not
> been committed up stream (that need to be) blockers for 22? This way
> 22 is not a regression against 0.20.x.

I sure hope so. May be difficult to untangle if it's a jumbo patch -- answer is in the details of how it's contributed.

Cheers,
Nige

On Jan 12, 2011, at 3:02 PM, Eli Collins wrote:

> +1 on 0.20.x (where x is a J > 3)
>
> Nigel - could we make all the patches in this branch that have not
> been committed up stream (that need to be) blockers for 22? This way
> 22 is not a regression against 0.20.x.
>
> Thanks,
> Eli
>
> On Wed, Jan 12, 2011 at 2:56 PM, Nigel Daley <[EMAIL PROTECTED]> wrote:
>> +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve more discussion.
>>
>> Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious.
>>
>> Cheers,
>> Nige
>>
>>
>> On Jan 12, 2011, at 1:34 PM, Arun C Murthy wrote:
>>
>>> I'm willing to discuss any and all options, for a very short period.
>>>
>>> Technically you have a reasonable point, Doug has suggested this in the past too. If everyone agrees, fine; if not, I'm do not want hung up on a release number. I just *do not* want a controversy.
>>>
>>> As I mentioned, I'm looking to finish this up in a couple of weeks; so, I could do without a long discussion on the on the critical path.
>>>
>>> I'm happy to go with a reasonable compromise, if not, hadoop-0.20.100 is what I'm priming for.
>>>
>>> Heck, if Stack wants to call the append release (not sure how far ahead he is) as hadoop-0.20.100, I'm willing to call this hadoop-0.20.200.
>>>
>>> All I care about is having a distinct release number from 0.20.2 (our last stable release). Again, I just want to get a release into the hands of our users. Please, let's resolve this quickly. Please.
>>>
>>> Arun
>>>
>>> On Jan 12, 2011, at 1:10 PM, Owen O'Malley wrote:
>>>
>>>>
>>>> On Jan 11, 2011, at 9:09 PM, Arun C Murthy wrote:
>>>>
>>>>> I'm open to suggestions - how about something like 20.100 to show
>>>>> that it's a big jump? Anything else?
>>>>
>>>>
>>>> Although I'm not wild about any of the potential release names, this
>>>> patch set is neither a subset or superset of the 0.21 or 0.22
>>>> branches. Given that, I think that a new major release number makes
>>>> the most sense. It is also relatively likely that additional minor
>>>> releases will be made off of this branch while 0.22 is stabilizing.
>>>> We've talked about declaring 0.20 a 1.0 for a long time and this feels
>>>> like backing into the decision, but technically, I believe it to be
>>>> the right name for such a release.
>>>>
>>>> Thoughts?
>>>>
>>>> -- Owen
>>>
>>
>>
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ian Holsman 2011-01-12, 23:26

So what is the plan with 20.3 that Owen volunteered to RM?
Should we do that, or just integrate the security code with that and call it 20.x?

---
Ian Holsman - 703 879-3128

I saw the angel in the marble and carved until I set him free -- Michelangelo

On 12/01/2011, at 6:02 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

> +1 on 0.20.x (where x is a J > 3)
>
> Nigel - could we make all the patches in this branch that have not
> been committed up stream (that need to be) blockers for 22? This way
> 22 is not a regression against 0.20.x.
>
> Thanks,
> Eli
>
> On Wed, Jan 12, 2011 at 2:56 PM, Nigel Daley <[EMAIL PROTECTED]> wrote:
>> +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve more discussion.
>>
>> Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious.
>>
>> Cheers,
>> Nige
>>
>>
>> On Jan 12, 2011, at 1:34 PM, Arun C Murthy wrote:
>>
>>> I'm willing to discuss any and all options, for a very short period.
>>>
>>> Technically you have a reasonable point, Doug has suggested this in the past too. If everyone agrees, fine; if not, I'm do not want hung up on a release number. I just *do not* want a controversy.
>>>
>>> As I mentioned, I'm looking to finish this up in a couple of weeks; so, I could do without a long discussion on the on the critical path.
>>>
>>> I'm happy to go with a reasonable compromise, if not, hadoop-0.20.100 is what I'm priming for.
>>>
>>> Heck, if Stack wants to call the append release (not sure how far ahead he is) as hadoop-0.20.100, I'm willing to call this hadoop-0.20.200.
>>>
>>> All I care about is having a distinct release number from 0.20.2 (our last stable release). Again, I just want to get a release into the hands of our users. Please, let's resolve this quickly. Please.
>>>
>>> Arun
>>>
>>> On Jan 12, 2011, at 1:10 PM, Owen O'Malley wrote:
>>>
>>>>
>>>> On Jan 11, 2011, at 9:09 PM, Arun C Murthy wrote:
>>>>
>>>>> I'm open to suggestions - how about something like 20.100 to show
>>>>> that it's a big jump? Anything else?
>>>>
>>>>
>>>> Although I'm not wild about any of the potential release names, this
>>>> patch set is neither a subset or superset of the 0.21 or 0.22
>>>> branches. Given that, I think that a new major release number makes
>>>> the most sense. It is also relatively likely that additional minor
>>>> releases will be made off of this branch while 0.22 is stabilizing.
>>>> We've talked about declaring 0.20 a 1.0 for a long time and this feels
>>>> like backing into the decision, but technically, I believe it to be
>>>> the right name for such a release.
>>>>
>>>> Thoughts?
>>>>
>>>> -- Owen
>>>
>>
>>
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-13, 07:07

On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote:

> +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would
> involve more discussion.

Ok, seems like we are converging; we can continue talking. I've created the branch to get the ball rolling.

> Will this be a jumbo patch attached to a Jira and then committed to
> the branch? Just curious.

I'm afraid that the svn log of the branch from the github Y! branch is fairly useless since a single JIRA might have multiple commits in the Y! branch (bugfix on top of a bugfix). We have done that in several cases (but the patches committed to trunk have a single patch which is the result of forward porting a complete feature/bugfix). IAC this branch and 0.22 have diverged so much that almost no non-trivial patch would apply without a significant amount of work.

Thus, I think a jumbo patch should suffice. It will also ensure this can be done quickly so that the community can then concentrate on 0.22 and beyond.

However, I will (manually) ensure all relevant jiras are referenced in the CHANGES.txt and Release Notes for folks to see the contents of the release. This is the hardest part of the exercise. Also, this ensures that we can track these jiras for 0.22 as Eli suggested.

Does that seem like a reasonable way forward? I'm happy to brainstorm.

thanks,
Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Nigel Daley 2011-01-13, 07:09

On Jan 12, 2011, at 11:07 PM, Arun C Murthy wrote:

>
> On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote:
>
>> +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve more discussion.
>
> Ok, seems like we are converging; we can continue talking. I've created the branch to get the ball rolling.
>
>> Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious.
>
> I'm afraid that the svn log of the branch from github Y! branch is fairly useless since a single JIRA might have multiple commits in the Y! branch (bugfix on top of a bugfix). We have done that in several cases (but the patches committed to trunk have a single patch which is the result of forward porting a complete feature/bugfix). IAC the this branch and 0.22 have diverged so much that almost no non-trivial patch would apply without a significant amount of work.
>
> Thus, I think a jumbo patch should suffice. It will also ensure this can done quickly so that the community can then concentrate on 0.22 and beyond.
>
> However, I will (manually) ensure all relevant jiras are referenced in the CHANGES.txt and Release Notes for folks to see the contents of the release. This is the hardest part of the exercise. Also, this ensures that we can track these jiras for 0.22 as Eli suggested.
>
> Does that seem like a reasonable way forward? I'm happy to brainstorm.

+1. If it turns out to be insufficient to figure out how to apply similar changes to trunk/0.22 then we can address that as needed.

Thanks Arun!
Nige
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Todd Lipcon 2011-01-13, 22:04

Hi Arun, all,

When we merged YDH and CDH for CDH3b3, we went through the effort of "linearizing" all of the YDH patches and squashing multiple commits into single ones corresponding to a single JIRA where possible. So, we have a 100% linear set of patches that applies on top of the 0.20.2 source tree and includes Yahoo 0.20.100.3 as well as almost all the patches from 0.20-append and a number of other backports.

Since this could be applied as a linear set of patches instead of a big lump, would there be interest in using this as the 0.20.>100 Apache release? I can take the time to remove any patches that are cloudera specific or not yet applied upstream.

Thanks
-Todd

On Wed, Jan 12, 2011 at 11:07 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

>
> On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote:
>
> +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve
>> more discussion.
>>
>
> Ok, seems like we are converging; we can continue talking. I've created the
> branch to get the ball rolling.
>
>
> Will this be a jumbo patch attached to a Jira and then committed to the
>> branch? Just curious.
>>
>
> I'm afraid that the svn log of the branch from github Y! branch is fairly
> useless since a single JIRA might have multiple commits in the Y! branch
> (bugfix on top of a bugfix). We have done that in several cases (but the
> patches committed to trunk have a single patch which is the result of
> forward porting a complete feature/bugfix). IAC the this branch and 0.22
> have diverged so much that almost no non-trivial patch would apply without a
> significant amount of work.
>
> Thus, I think a jumbo patch should suffice. It will also ensure this can
> done quickly so that the community can then concentrate on 0.22 and beyond.
>
> However, I will (manually) ensure all relevant jiras are referenced in the
> CHANGES.txt and Release Notes for folks to see the contents of the release.
> This is the hardest part of the exercise. Also, this ensures that we can
> track these jiras for 0.22 as Eli suggested.
>
> Does that seem like a reasonable way forward? I'm happy to brainstorm.
>
> thanks,
> Arun
>
>

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-13, 23:05

Todd,

On Jan 13, 2011, at 2:04 PM, Todd Lipcon wrote:

> Hi Arun, all,
>
> When we merged YDH and CDH for CDH3b3, we went through the effort of
> "linearizing" all of the YDH patches and squashing multiple commits
> into
> single ones corresponding to a single JIRA where possible. So, we
> have a
> 100% linear set of patches that applies on top of the 0.20.2 source
> tree and
> includes Yahoo 0.20.100.3 as well as almost all the patches from
> 0.20-append
> and a number of other backports.
>
> Since this could be applied as a linear set of patches instead of a
> big
> lump, would there be interest in using this as the 0.20.>100 Apache
> release?
> I can take the time to remove any patches that are cloudera specific
> or not
> yet applied upstream.
>

Interesting discussion, thanks.

I'm sure it took you a fair amount of work to squash patches (which I tried too, btw).

That, plus the fact that we would need to do a similar amount of work for the 10 or so releases we have done after 0.20.100.3 scares me.

As Nigel and I discussed here, the jumbo patch and an up-to-date CHANGES.txt provide almost all of the benefits we seek and allow all of us to get this done very quickly to focus on hadoop-0.22 and beyond.

What do you think?

OTOH, I could get this release done and start squashing patches for the sake of completeness as a background activity. Thoughts?

thanks,
Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Todd Lipcon 2011-01-13, 23:34

On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> Since this could be applied as a linear set of patches instead of a big
>> lump, would there be interest in using this as the 0.20.>100 Apache
>> release?
>> I can take the time to remove any patches that are cloudera specific or
>> not
>> yet applied upstream.
>>
>>
> Interesting discussion, thanks.
>
> I'm sure it took you a fair amount of work to squash patches (which I tried
> too, btw).

Yep, I had a great summer ;-)

> That, plus the fact that we would need to do a similar amount of work for
> the 10 or so releases we have done after 0.20.100.3 scares me.
>

Sorry, I actually meant 0.20.104.3. Have there been many releases since then? That's the last version available on the Yahoo github, and that's the version we incorporated/linearized. If there is a large sequence of patches after this that you're planning on including, it would be good to see them in your git repo.

> As we Nigel and I discussed here, the jumbo patch and an up-to-date
> CHANGES.txt provides almost all of the benefits we seek and allows all of us
> to get this done very quickly to focus on hadoop-0.22 and beyond.
>
>

In my opinion here are the downsides to this plan:

- a mondo "merge" patch is a big pain when trying to do debugging. It may be sufficient for a user to look at CHANGES.txt, but I find myself using blame/log/etc on individual files to understand code lineage on a daily basis. If all of the merge shows up as a big patch it will be very difficult (at least the way I work with code) to help users debug issues or understand which JIRA a certain regression may have come from.
- CHANGES.txt traditionally doesn't reference which patch file from a JIRA was checked in. So we may know that a given JIRA has been included, but often there are several revisions of patches on the JIRA and it's difficult to be sure that we have the most up-to-date version. By looking at change history it's usually easy to pick this out, but if it's one giant patch apply, this isn't possible.
- the proposal to use the YDH distro certainly solves the Security issue, but doesn't help out HBase at all. Given HBase has been asking for a long time to get a real release of the append branch, I think it would be better to have one 20-based release which has both of these features, rather than further fragmenting the community into 0.20.2, 0.20.2+security, 0.20.2+append.

I think the first two points could be addressed if you push your git tree either to github or an apache-hosted git, and then include in SVN as a mondo patch. It's not ideal, but at least when trying to debug issues and understand the history of this branch there will be a publicly available change history to reference.

To clarify my position a bit here - I definitely appreciate your volunteering to do the work, and wouldn't *block* the proposal as you've put it forth. I just think it will have limited utility for the community by being opaque (if contributed as a giant patch) and by not including the sync feature which is critical for a large segment of users. Given those downsides I'd rather see the effort diverted towards making a killer 0.22 release that we can all jump on.

Thanks
-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-14, 00:58

On Jan 13, 2011, at 3:34 PM, Todd Lipcon wrote:

> On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy <[EMAIL PROTECTED]>
> wrote:
>
>> Since this could be applied as a linear set of patches instead of a
>> big
>>> lump, would there be interest in using this as the 0.20.>100 Apache
>>> release?
>>> I can take the time to remove any patches that are cloudera
>>> specific or
>>> not
>>> yet applied upstream.
>>>
>>>
>> Interesting discussion, thanks.
>>
>> I'm sure it took you a fair amount of work to squash patches (which
>> I tried
>> too, btw).
>
>
> Yep, I had a great summer ;-)
>
>
>> That, plus the fact that we would need to do a similar amount of
>> work for
>> the 10 or so releases we have done after 0.20.100.3 scares me.
>>
>
> Sorry, I actually meant 0.20.104.3. Have there been many releases
> since
> then? That's the last version available on the Yahoo github, and
> that's the
> version we incorporated/linearized.

Yep. I had a great summer! ;-)

>>
>> As we Nigel and I discussed here, the jumbo patch and an up-to-date
>> CHANGES.txt provides almost all of the benefits we seek and allows
>> all of us
>> to get this done very quickly to focus on hadoop-0.22 and beyond.
>>
>>
> In my opinion here are the downsides to this plan:
>

I agree there are downsides, I think I did point them out at the outset! :)

> - a mondo "merge" patch is a big pain when trying to do debugging.
> It may be
> sufficient for a user to look at CHANGES.txt, but I find myself using
> blame/log/etc on individual files to understand code lineage on a
> daily
> basis. If all of the merge shows up as a big patch it will be very
> difficult
> (at least the way I work with code) to help users debug issues or
> understand
> which JIRA a certain regression may have come from.
>

Right, no question. Which is why I offered to do this as a background activity right after... this ensures that the source of truth is *always* a branch in Apache subversion.

I feel that we could get a usable release out the door quickly for our users. Also, please remember that almost every patch we have committed is available on relevant jiras. I understand the devs have a problem and I feel we can bear with it for a little while. Again, I agree this isn't an ideal solution, I'm just trying to expedite the release for the users.

>
> To clarify my position a bit here - I definitely appreciate your
> volunteering to do the work, and wouldn't *block* the proposal as
> you've put
> it forth. I just think it will have limited utility for the
> community by
> being opaque (if contributed as a giant patch) and by not including
> the sync
> feature which is critical for a large segment of users. Given those
> downsides I'd rather see the effort diverted towards making a killer
> 0.22
> release that we can all jump on.
>

Thanks for understanding.

Again, I completely agree this isn't an ideal situation, but I do hope it has a bit more than *limited utility* for our end-users. Who knows, I may be hopelessly deluded! *smile*

Also, I'm trying to do exactly what you suggested - spend very little time on this so that everyone, including me, can focus on the future.

thanks,
Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eli Collins 2011-01-14, 01:35

On Thursday, January 13, 2011, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
> On Jan 13, 2011, at 3:34 PM, Todd Lipcon wrote:
>
>
> On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
>
> Since this could be applied as a linear set of patches instead of a big
>
> lump, would there be interest in using this as the 0.20.>100 Apache
> release?
> I can take the time to remove any patches that are cloudera specific or
> not
> yet applied upstream.
>
>
>
> Interesting discussion, thanks.
>
> I'm sure it took you a fair amount of work to squash patches (which I tried
> too, btw).
>
>
>
> Yep, I had a great summer ;-)
>
>
>
> That, plus the fact that we would need to do a similar amount of work for
> the 10 or so releases we have done after 0.20.100.3 scares me.
>
>
>
> Sorry, I actually meant 0.20.104.3. Have there been many releases since
> then? That's the last version available on the Yahoo github, and that's the
> version we incorporated/linearized.
>
>
> Yep. I had a great summer! ;-)
>
>
>
> As we Nigel and I discussed here, the jumbo patch and an up-to-date
> CHANGES.txt provides almost all of the benefits we seek and allows all of us
> to get this done very quickly to focus on hadoop-0.22 and beyond.
>
>
>
> In my opinion here are the downsides to this plan:
>
>
>
> I agree there are downsides, I think I did point them out at the outset! :)
>
>
> - a mondo "merge" patch is a big pain when trying to do debugging. It may be
> sufficient for a user to look at CHANGES.txt, but I find myself using
> blame/log/etc on individual files to understand code lineage on a daily
> basis. If all of the merge shows up as a big patch it will be very difficult
> (at least the way I work with code) to help users debug issues or understand
> which JIRA a certain regression may have come from.
>
>
>
> Right, no question. Which is why I offered to do this as a background activity right after... this ensures that the source of truth is *always* a branch in Apache subversion.
>
> I feel that we could get a usable release out of door quickly for our users. Also, please remember that almost every patch we have committed is available on relevant jiras. I understand the devs have a problem and I feel we can bear with it for a little while. Again, I agree this isn't an ideal solution, I'm just trying to expedite the release for the users.
>
>
>
> To clarify my position a bit here - I definitely appreciate your
> volunteering to do the work, and wouldn't *block* the proposal as you've put
> it forth. I just think it will have limited utility for the community by
> being opaque (if contributed as a giant patch) and by not including the sync
> feature which is critical for a large segment of users. Given those
> downsides I'd rather see the effort diverted towards making a killer 0.22
> release that we can all jump on.
>
>
>
> Thanks for understanding.
>
> Again, I completely agree this isn't an ideal situation, but I do hope it has a bit more than *limited utility* for our end-users. Who knows, I maybe hopelessly deluded! *smile*
>
> Also, I'm trying to do exactly what you suggested - spend very little time on this so that everyone, including me, can focus on the future.
>
> thanks,
> Arun
>

Given that Todd has already done the work to rebase the 0.20.104.3 patch set on 0.20.2, and in a way that doesn't require one big change, and his patch set includes branch20-append which the HBase guys want an Apache release of wouldn't it make sense to go this route? What do others think? Seems better to have one 0.20.100 release than multiple ones for security and append.

Thanks,
Eli
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-14, 02:12

On Jan 13, 2011, at 5:35 PM, Eli Collins wrote:

> Given that Todd has already done the work to rebase the 0.20.104.3
> patch set on 0.20.2, and in a way that doesn't require one big change,
> and his patch set includes branch20-append which the HBase guys want
> an Apache release of wouldn't it make sense to go this route? What do
> others think? Seems better to have one 0.20.100 release than multiple
> ones for security and append.

My concern around 0.20.104.3 is that it has serious security holes including a root exploit that we have since fixed. I'm sure you guys are aware of them, Todd has helped to fix some.

The version I'm offering to push to the community has fixed all of them, *plus* the added benefit of several stability and performance fixes we have done since 20.104.3, almost 10 internal releases. This is a battle tested and hardened version which we have deployed on 40,000+ nodes. It is a significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure *some* users will find that valuable. ;)

Also, I've offered to push individual patches as a background activity on a branch - that should suffice, no? Or, do you consider this a blocker?

Again, my goal in this exercise is to get a stable, improved version of Hadoop into the hands of our users asap, and focus on 0.22 and beyond.

thanks,
Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eli Collins 2011-01-14, 02:50

On Thu, Jan 13, 2011 at 6:12 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
> On Jan 13, 2011, at 5:35 PM, Eli Collins wrote:
>>
>> Given that Todd has already done the work to rebase the 0.20.104.3
>> patch set on 0.20.2, and in a way that doesn't require one big change,
>> and his patch set includes branch20-append which the HBase guys want
>> an Apache release of wouldn't it make sense to go this route? What do
>> others think? Seems better to have one 0.20.100 release than multiple
>> ones for security and append.
>
>
> My concern around 0.20.104.3 is that it has serious security holes including
> a root exploit that we have since fixed. I'm sure you guys are aware of
> them, Todd has helped to fix some.
>

The cdh3 patch set Todd is talking about is not vanilla 104.3, it's 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the performance and stability fixes I think you're referring to, at least the ones that have been posted to Apache jira).

Can you post a pointer to the version you're referring to, eg on github? If there isn't a big delta between it and the cdh3 patch set (which should have the 20-based patches from jira) perhaps you and Todd could easily merge in the delta to create 0.20.x?

> The version I'm offering to push to the community has fixed all of them,
> *plus* the added benefit of several stability and performance fixes we have
> done since 20.104.3, almost 10 internal releases. This is a battle tested
> and hardened version which we have deployed on 40,000+ nodes. It is a
> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
> *some* users will find that valuable. ;)

Definitely, but better to hit two birds with one stone right? Instead of a security + enhancements release and an append release we could have a single security + append + enhancements release and users don't have to choose.

> Also, I've offered to push individual patches as a background activity on a
> branch - that should suffice, no? Or, do you consider this a blocker?

Definitely not a blocker.

> Again, my goal in this exercise is to get a stable, improved version of
> Hadoop into the hands of our users asap, and focus on 0.22 and beyond.

Agree, that's everyone's goal. My point is that a release that's already been re-based on 20.2, doesn't require a separate HBase release, and doesn't require you spend time on a background task to break up the big change into smaller ones seems like a faster way forward.

Thanks,
Eli
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-14, 04:23

On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:

> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
> performance and stability fixes I think you're referring to, at least
> the ones that have been posted to Apache jira).
>
> Can you post a pointer to the version you're referring to, eg on
> github? If there isn't a big delta between it and the cdh3 patch set
> (which should have the 20-based patches from jira) perhaps you and
> Todd could easily merge in the delta to create 0.20.x?
>

I can guarantee it will need work to merge the enhancements since 20.104.3, it's over 6 months of development. The enhancements include work on stability such as iterative ls, limits on JT to prevent single jobs/users from taking it down etc. and lots of bug-fixes to security. So, unfortunately the delta is pretty large.

I'm working on a CHANGES.txt which should reflect all the changes i.e. bug-fixes and enhancements.

>> The version I'm offering to push to the community has fixed all of
>> them,
>> *plus* the added benefit of several stability and performance fixes
>> we have
>> done since 20.104.3, almost 10 internal releases. This is a battle
>> tested
>> and hardened version which we have deployed on 40,000+ nodes. It is a
>> significant upgrade on 0.20.104.3 which we never deployed. I'm
>> pretty sure
>> *some* users will find that valuable. ;)
>
> Definitely, but better to hit two birds with one stone right? Instead
> of a security + enhancements release and an append release we could
> have a single security + append + enhancements release and users don't
> have to choose.
>

We are discussing two options:
20 + security + enhancements
20 + security + append

I think the value we provide via 20+security+enhancements release is that it's stable, tested and deployed at scale. Doing any more work merging 6 months of work at Yahoo (again, I guarantee it's a lot of work) will need a lot of cycles to validate, test and stabilize.

I feel the alternative is a distraction for me, I'd rather work on 0.22.

I can get 20+security+enhancements done very, very, quickly precisely because I don't have to spend cycles testing it.

Does that make sense? Thanks for being patient and bearing with me...

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Tsz Wo \ 2011-01-14, 04:58

The text below is copied from http://httpd.apache.org/dev/release.html. Not sure if it helps.

What power does the RM yield?

Regarding what makes it into a release, the RM is the unquestioned authority. No one can contest what makes it into the release. The community will judge the release's quality after it has been issued, but the community can not force the RM to include a feature that they feel uncomfortable adding. Remember that this document is only a guideline to the community and future RMs - each RM may run a release in a different way. If you don't like what an RM is doing, start preparing for your own competing release.

Nicholas
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler 2011-01-14, 05:11
Hi Eli,
Thanks for the suggestion. +1 to nigel and arun's proposal. I completely support the idea of creating a version of 20 with append for HBASE. However, the append issue is very complicated and there does not exist any version of append that is certified against a workload as diverse as what this branch has been tested against. I think you are trying to cross too many streams here. If you have resources to help integrate any version of Hadoop 0.20 with append, package and test it, I fully support you doing so. But that effort is not aligned with the goal of this branch, which is to share a substantial amount of fully integrated and tested work. Members of the community have expressed interest in seeing this tested work get checked into Apache and I would like to share it. Mashing it up with other patches would invalidate months of testing, defeating the purpose of the exercise. If you are interested in integrating Append with this branch, why not create a 20.200 branch and do so? Unless you are vetoing the sharing of work as is on a branch (the purpose of the branch), I suggest we move on. Thanks, E14 On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote: > > On Jan 13, 2011, at 6:50 PM, Eli Collins wrote: > >> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's >> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the >> performance and stability fixes I think you're referring to, at least >> the ones that have been posted to Apache jira). >> >> Can you post a pointer to the version you're referring to, eg on >> github? If there isn't a big delta between it and the cdh3 patch set >> (which should have the 20-based patches from jira) perhaps you and >> Todd could easily merge in the delta to create 0.20.x? >> > > I can guarantee it will need work to merge the enhancements since > 20.104.3, it's over 6 months of development. 
The enhancements includes > work on stability such as iterative ls, limits on JT to prevent single > jobs/users from taking it down etc. and lots of bug-fixes to security. > So, unfortunately the delta is pretty large. > > I'm working on a CHANGES.txt which should reflect all the changes i.e. > bug-fixes and enhancements. > >>> The version I'm offering to push to the community has fixed all of >>> them, >>> *plus* the added benefit of several stability and performance fixes >>> we have >>> done since 20.104.3, almost 10 internal releases. This is a battle >>> tested >>> and hardened version which we have deployed on 40,000+ nodes. It is a >>> significant upgrade on 0.20.104.3 which we never deployed. I'm >>> pretty sure >>> *some* users will find that valuable. ;) >> >> Definitely, but better to hit two birds with one stone right? Instead >> of a security + enhancements release and an append release we could >> have a single security + append + enhancements release and users don't >> have to choose. >> > > > We are discussing two options: > 20 + security + enhancements > 20 + security + append > > I think the value we provide via 20+security+enhancements release is > that it's stable, tested and deployed at scale. Doing any more work > merging 6 months of work at Yahoo (again, I guarantee it's a lot of > work) will need a lots of cycles to validate, test and stabilize. > > I feel the alternative is a distraction for me, I'd rather work on 0.22. > > I can get 20+security+enhancements done very, very, quickly precisely > because I don't have to spend cycles testing it. > > Does that make sense? Thanks for being patient and bearing with me... > > Arun >
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Nigel Daley 2011-01-14, 06:07
I say just do it. Eli said it wasn't a blocker. Sure it ain't perfect, but it's good enough.
Let's move on to 0.22 and beyond. Nige On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote: > > On Jan 13, 2011, at 6:50 PM, Eli Collins wrote: > >> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's >> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the >> performance and stability fixes I think you're referring to, at least >> the ones that have been posted to Apache jira). >> >> Can you post a pointer to the version you're referring to, eg on >> github? If there isn't a big delta between it and the cdh3 patch set >> (which should have the 20-based patches from jira) perhaps you and >> Todd could easily merge in the delta to create 0.20.x? >> > > I can guarantee it will need work to merge the enhancements since 20.104.3, it's over 6 months of development. The enhancements includes work on stability such as iterative ls, limits on JT to prevent single jobs/users from taking it down etc. and lots of bug-fixes to security. So, unfortunately the delta is pretty large. > > I'm working on a CHANGES.txt which should reflect all the changes i.e. bug-fixes and enhancements. > >>> The version I'm offering to push to the community has fixed all of them, >>> *plus* the added benefit of several stability and performance fixes we have >>> done since 20.104.3, almost 10 internal releases. This is a battle tested >>> and hardened version which we have deployed on 40,000+ nodes. It is a >>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure >>> *some* users will find that valuable. ;) >> >> Definitely, but better to hit two birds with one stone right? Instead >> of a security + enhancements release and an append release we could >> have a single security + append + enhancements release and users don't >> have to choose. 
>> > > > We are discussing two options: > 20 + security + enhancements > 20 + security + append > > I think the value we provide via 20+security+enhancements release is that it's stable, tested and deployed at scale. Doing any more work merging 6 months of work at Yahoo (again, I guarantee it's a lot of work) will need a lots of cycles to validate, test and stabilize. > > I feel the alternative is a distraction for me, I'd rather work on 0.22. > > I can get 20+security+enhancements done very, very, quickly precisely because I don't have to spend cycles testing it. > > Does that make sense? Thanks for being patient and bearing with me... > > Arun >
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-14, 06:21
*nod* Ok.
Arun On Jan 13, 2011, at 10:08 PM, "Nigel Daley" <[EMAIL PROTECTED]> wrote: > I say just do it. Eli said it wasn't a blocker. Sure it ain't perfect, but it's good enough. > > Let's move on to 0.22 and beyond. > > Nige > > On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote: > >> >> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote: >> >>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's >>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the >>> performance and stability fixes I think you're referring to, at least >>> the ones that have been posted to Apache jira). >>> >>> Can you post a pointer to the version you're referring to, eg on >>> github? If there isn't a big delta between it and the cdh3 patch set >>> (which should have the 20-based patches from jira) perhaps you and >>> Todd could easily merge in the delta to create 0.20.x? >>> >> >> I can guarantee it will need work to merge the enhancements since 20.104.3, it's over 6 months of development. The enhancements includes work on stability such as iterative ls, limits on JT to prevent single jobs/users from taking it down etc. and lots of bug-fixes to security. So, unfortunately the delta is pretty large. >> >> I'm working on a CHANGES.txt which should reflect all the changes i.e. bug-fixes and enhancements. >> >>>> The version I'm offering to push to the community has fixed all of them, >>>> *plus* the added benefit of several stability and performance fixes we have >>>> done since 20.104.3, almost 10 internal releases. This is a battle tested >>>> and hardened version which we have deployed on 40,000+ nodes. It is a >>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure >>>> *some* users will find that valuable. ;) >>> >>> Definitely, but better to hit two birds with one stone right? 
Instead >>> of a security + enhancements release and an append release we could >>> have a single security + append + enhancements release and users don't >>> have to choose. >>> >> >> >> We are discussing two options: >> 20 + security + enhancements >> 20 + security + append >> >> I think the value we provide via 20+security+enhancements release is that it's stable, tested and deployed at scale. Doing any more work merging 6 months of work at Yahoo (again, I guarantee it's a lot of work) will need a lots of cycles to validate, test and stabilize. >> >> I feel the alternative is a distraction for me, I'd rather work on 0.22. >> >> I can get 20+security+enhancements done very, very, quickly precisely because I don't have to spend cycles testing it. >> >> Does that make sense? Thanks for being patient and bearing with me... >> >> Arun >> >
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eli Collins 2011-01-14, 06:29
Sorry for rattling you guys, definitely wasn't discussing a veto. I'm
absolutely not opposed, just thought the alternative Todd raised was worth a couple emails since users have requested both security and append, and such a branch that includes both of those plus enhancements and substantial testing exists. Arun - I appreciate all the info, looking forward to the release. Thanks, Eli On Thu, Jan 13, 2011 at 10:21 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > *nod* Ok. > > Arun > > On Jan 13, 2011, at 10:08 PM, "Nigel Daley" <[EMAIL PROTECTED]> wrote: > >> I say just do it. Eli said it wasn't a blocker. Sure it ain't perfect, but it's good enough. >> >> Let's move on to 0.22 and beyond. >> >> Nige >> >> On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote: >> >>> >>> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote: >>> >>>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's >>>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the >>>> performance and stability fixes I think you're referring to, at least >>>> the ones that have been posted to Apache jira). >>>> >>>> Can you post a pointer to the version you're referring to, eg on >>>> github? If there isn't a big delta between it and the cdh3 patch set >>>> (which should have the 20-based patches from jira) perhaps you and >>>> Todd could easily merge in the delta to create 0.20.x? >>>> >>> >>> I can guarantee it will need work to merge the enhancements since 20.104.3, it's over 6 months of development. The enhancements includes work on stability such as iterative ls, limits on JT to prevent single jobs/users from taking it down etc. and lots of bug-fixes to security. So, unfortunately the delta is pretty large. >>> >>> I'm working on a CHANGES.txt which should reflect all the changes i.e. bug-fixes and enhancements. >>> >>>>> The version I'm offering to push to the community has fixed all of them, >>>>> *plus* the added benefit of several stability and performance fixes we have >>>>> done since 20.104.3, almost 10 internal releases. 
This is a battle tested >>>>> and hardened version which we have deployed on 40,000+ nodes. It is a >>>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure >>>>> *some* users will find that valuable. ;) >>>> >>>> Definitely, but better to hit two birds with one stone right? Instead >>>> of a security + enhancements release and an append release we could >>>> have a single security + append + enhancements release and users don't >>>> have to choose. >>>> >>> >>> >>> We are discussing two options: >>> 20 + security + enhancements >>> 20 + security + append >>> >>> I think the value we provide via 20+security+enhancements release is that it's stable, tested and deployed at scale. Doing any more work merging 6 months of work at Yahoo (again, I guarantee it's a lot of work) will need a lots of cycles to validate, test and stabilize. >>> >>> I feel the alternative is a distraction for me, I'd rather work on 0.22. >>> >>> I can get 20+security+enhancements done very, very, quickly precisely because I don't have to spend cycles testing it. >>> >>> Does that make sense? Thanks for being patient and bearing with me... >>> >>> Arun >>> >> >
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Todd Lipcon 2011-01-14, 06:36
On Thu, Jan 13, 2011 at 10:29 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
> Sorry for rattling you guys, definitely wasn't discussing a veto. I'm > absolutely not opposed, just thought the alternative Todd raised was > worth a couple emails since users have requested both security and > append, and such a branch that includes both of those plus > enhancements and substantial testing exists. > > Arun - I appreciate all the info, looking forward to the release. > > Same here. Back to the patch queue for me! 0.22 here we come. -Todd > On Thu, Jan 13, 2011 at 10:21 PM, Arun C Murthy <[EMAIL PROTECTED]> > wrote: > > *nod* Ok. > > > > Arun > > > > On Jan 13, 2011, at 10:08 PM, "Nigel Daley" <[EMAIL PROTECTED]> wrote: > > > >> I say just do it. Eli said it wasn't a blocker. Sure it ain't perfect, > but it's good enough. > >> > >> Let's move on to 0.22 and beyond. > >> > >> Nige > >> > >> On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote: > >> > >>> > >>> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote: > >>> > >>>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's > >>>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the > >>>> performance and stability fixes I think you're referring to, at least > >>>> the ones that have been posted to Apache jira). > >>>> > >>>> Can you post a pointer to the version you're referring to, eg on > >>>> github? If there isn't a big delta between it and the cdh3 patch set > >>>> (which should have the 20-based patches from jira) perhaps you and > >>>> Todd could easily merge in the delta to create 0.20.x? > >>>> > >>> > >>> I can guarantee it will need work to merge the enhancements since > 20.104.3, it's over 6 months of development. The enhancements includes work > on stability such as iterative ls, limits on JT to prevent single jobs/users > from taking it down etc. and lots of bug-fixes to security. So, > unfortunately the delta is pretty large. > >>> > >>> I'm working on a CHANGES.txt which should reflect all the changes i.e. > bug-fixes and enhancements. 
> >>> > >>>>> The version I'm offering to push to the community has fixed all of > them, > >>>>> *plus* the added benefit of several stability and performance fixes > we have > >>>>> done since 20.104.3, almost 10 internal releases. This is a battle > tested > >>>>> and hardened version which we have deployed on 40,000+ nodes. It is a > >>>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty > sure > >>>>> *some* users will find that valuable. ;) > >>>> > >>>> Definitely, but better to hit two birds with one stone right? Instead > >>>> of a security + enhancements release and an append release we could > >>>> have a single security + append + enhancements release and users don't > >>>> have to choose. > >>>> > >>> > >>> > >>> We are discussing two options: > >>> 20 + security + enhancements > >>> 20 + security + append > >>> > >>> I think the value we provide via 20+security+enhancements release is > that it's stable, tested and deployed at scale. Doing any more work merging > 6 months of work at Yahoo (again, I guarantee it's a lot of work) will need > a lots of cycles to validate, test and stabilize. > >>> > >>> I feel the alternative is a distraction for me, I'd rather work on > 0.22. > >>> > >>> I can get 20+security+enhancements done very, very, quickly precisely > because I don't have to spend cycles testing it. > >>> > >>> Does that make sense? Thanks for being patient and bearing with me... > >>> > >>> Arun > >>> > >> > > > -- Todd Lipcon Software Engineer, Cloudera
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-14, 06:37
No worries. Thanks to both Eli & Todd for the discussion.
I look forward to getting this done and moving ahead to 0.22 and beyond. thanks, Arun On Jan 13, 2011, at 10:29 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote: > Sorry for rattling you guys, definitely wasn't discussing a veto. I'm > absolutely not opposed, just thought the alternative Todd raised was > worth a couple emails since users have requested both security and > append, and such a branch that includes both of those plus > enhancements and substantial testing exists. > > Arun - I appreciate all the info, looking forward to the release. > > Thanks, > Eli > > On Thu, Jan 13, 2011 at 10:21 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: >> *nod* Ok. >> >> Arun >> >> On Jan 13, 2011, at 10:08 PM, "Nigel Daley" <[EMAIL PROTECTED]> wrote: >> >>> I say just do it. Eli said it wasn't a blocker. Sure it ain't perfect, but it's good enough. >>> >>> Let's move on to 0.22 and beyond. >>> >>> Nige >>> >>> On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote: >>> >>>> >>>> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote: >>>> >>>>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's >>>>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the >>>>> performance and stability fixes I think you're referring to, at least >>>>> the ones that have been posted to Apache jira). >>>>> >>>>> Can you post a pointer to the version you're referring to, eg on >>>>> github? If there isn't a big delta between it and the cdh3 patch set >>>>> (which should have the 20-based patches from jira) perhaps you and >>>>> Todd could easily merge in the delta to create 0.20.x? >>>>> >>>> >>>> I can guarantee it will need work to merge the enhancements since 20.104.3, it's over 6 months of development. The enhancements includes work on stability such as iterative ls, limits on JT to prevent single jobs/users from taking it down etc. and lots of bug-fixes to security. So, unfortunately the delta is pretty large. 
>>>> >>>> I'm working on a CHANGES.txt which should reflect all the changes i.e. bug-fixes and enhancements. >>>> >>>>>> The version I'm offering to push to the community has fixed all of them, >>>>>> *plus* the added benefit of several stability and performance fixes we have >>>>>> done since 20.104.3, almost 10 internal releases. This is a battle tested >>>>>> and hardened version which we have deployed on 40,000+ nodes. It is a >>>>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure >>>>>> *some* users will find that valuable. ;) >>>>> >>>>> Definitely, but better to hit two birds with one stone right? Instead >>>>> of a security + enhancements release and an append release we could >>>>> have a single security + append + enhancements release and users don't >>>>> have to choose. >>>>> >>>> >>>> >>>> We are discussing two options: >>>> 20 + security + enhancements >>>> 20 + security + append >>>> >>>> I think the value we provide via 20+security+enhancements release is that it's stable, tested and deployed at scale. Doing any more work merging 6 months of work at Yahoo (again, I guarantee it's a lot of work) will need a lots of cycles to validate, test and stabilize. >>>> >>>> I feel the alternative is a distraction for me, I'd rather work on 0.22. >>>> >>>> I can get 20+security+enhancements done very, very, quickly precisely because I don't have to spend cycles testing it. >>>> >>>> Does that make sense? Thanks for being patient and bearing with me... >>>> >>>> Arun >>>> >>> >>
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Stack 2011-01-14, 06:59
(Man, it was looking good there for a second when 0.20.100 was about security+append!)

Good luck w/ the release Arun.

We might be following your 0.20.100 with a 0.20.200 append.

St.Ack
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler 2011-01-14, 07:16
I'd love to see that!

On Jan 13, 2011, at 10:59 PM, Stack wrote:

> (Man, it was looking good there for a second when 0.20.100 was about
> security+append!)
>
> Good luck w/ the release Arun.
>
> We might be following your 0.20.100 with a 0.20.200 append.
>
> St.Ack
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-14, 07:21
On Jan 13, 2011, at 10:59 PM, Stack wrote:

> (Man, it was looking good there for a second when 0.20.100 was about
> security+append!)
>
> Good luck w/ the release Arun.

Thanks!

> We might be following your 0.20.100 with a 0.20.200 append.

Super!

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ian Holsman 2011-01-14, 14:14
(with my Apache hat on)
I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches), for the following reasons:

1. It encourages bad behavior. We want discussion (and development) to happen on the lists, not in some office. Allowing these large code-dumps condones this behavior, and we will likely see it again and again. Like it or not, this is not the apache model of open source governance.

2. There is a risk that some code that is not in a JIRA or separate patch creeps in unwittingly. This isn't a major deal per se, but we don't really have the proper paper trail, or the documentation on what bug it fixed etc etc.

3. Other groups (Facebook for example) are running with their own set of patches. They currently have the luxury of examining each individual patch to decide if they want to integrate it (and test it) in their environment. We are forcing them to do the work of finding the bits they want in this huge patch.

4. By not including the append patch, we are making this release unusable for a large portion of our community who run hbase.

5. It makes it very hard to test. While it makes me comfortable that it has gone through Yahoo!'s QA and is running in their environments, it doesn't mean that it will work in other organizations who have different workload mixes and software running on them. With one huge patch it is all or nothing: either they take the code-drop and perform a large QA-integration effort, or they forgo the whole patch altogether.

**BUT** we have both the Yahoo! & Cloudera guys happy to do it, and to spend their time doing it, so I think having the code-drop will put us in a better place than where we are.

BTW, I'd like to point out a discrepancy here:

On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20; now we have a major feature and we are all gung ho for it.

--Ian

On Jan 14, 2011, at 2:21 AM, Arun C Murthy wrote:

>
> On Jan 13, 2011, at 10:59 PM, Stack wrote:
>
>> (Man, it was looking good there for a second when 0.20.100 was about
>> security+append!)
>>
>> Good luck w/ the release Arun.
>>
>
> Thanks!
>
>> We might be following your 0.20.100 with a 0.20.200 append.
>>
>
> Super!
>
> Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Konstantin Boudnik 2011-01-14, 17:20
I tend to second most of Ian's points here.

On Fri, Jan 14, 2011 at 06:14, Ian Holsman <[EMAIL PROTECTED]> wrote:
> (with my Apache hat on)
> I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches).

#1: We are creating a precedent of a "brain-dump" here. Although, it isn't the first one in the history of OSS. The infamous Apple "patch" to OpenBSD is another one ;)

#2: How do you spell 'back door', anyone?

#5: The "almost 10 internal releases" Arun mentioned above might, perhaps, be considered a great quality control effort. Not to mention the virtual impossibility of creating a test plan to validate a giant feature patch.

> BTW, I'd like to point out a discrepancy here:
>
> On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20, now we have a major feature and we are all gung ho for it..

And this ^^^

But, hey, I guess it's totally worth it!
Cos

> --Ian
>
> On Jan 14, 2011, at 2:21 AM, Arun C Murthy wrote:
>
>>
>> On Jan 13, 2011, at 10:59 PM, Stack wrote:
>>
>>> (Man, it was looking good there for a second when 0.20.100 was about
>>> security+append!)
>>>
>>> Good luck w/ the release Arun.
>>>
>>
>> Thanks!
>>
>>> We might be following your 0.20.100 with a 0.20.200 append.
>>>
>>
>> Super!
>>
>> Arun
>
>
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Nigel Daley 2011-01-14, 17:32
Yup, I'll say it again. The process ain't perfect but it's good enough IMO. Thank you Yahoo! for your contribution.

Clearly these patches will need review before commit when going into trunk.

Let's move on to 0.22.

Nige

On Jan 14, 2011, at 9:20 AM, Konstantin Boudnik wrote:

> I tend to second most of Ian's points here.
>
> On Fri, Jan 14, 2011 at 06:14, Ian Holsman <[EMAIL PROTECTED]> wrote:
>> (with my Apache hat on)
>> I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches).
>
> #1: we are creating a precedent of a "brain-dump" here. Although, it
> isn't the first one in the history of OSS. Infamous Apple "patch" to
> OpenBSD is another one ;)
>
> #2: How to spell 'back door' any one?
>
> #5: "almost 10 internal releases" Arun has mentioned above might be,
> perhaps, considered as a great quality control effort. Also, not to
> mention virtual impossibility to create a test plan to validate a
> giant features patch.
>
>> BTW, I'd like to point out a discrepancy here:
>>
>> On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20, now we have a major feature and we are all gung ho for it..
>
> And this ^^^
>
> But, hey I guess it's totally worth it!
> Cos
>
>> --Ian
>>
>> On Jan 14, 2011, at 2:21 AM, Arun C Murthy wrote:
>>
>>>
>>> On Jan 13, 2011, at 10:59 PM, Stack wrote:
>>>
>>>> (Man, it was looking good there for a second when 0.20.100 was about
>>>> security+append!)
>>>>
>>>> Good luck w/ the release Arun.
>>>>
>>>
>>> Thanks!
>>>
>>>> We might be following your 0.20.100 with a 0.20.200 append.
>>>>
>>>
>>> Super!
>>>
>>> Arun
>>
>>
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ian Holsman 2011-01-14, 18:00
On Jan 14, 2011, at 12:32 PM, Nigel Daley wrote:

> Yup, I'll say it again. The process ain't perfect but it's good enough IMO. Thank you Yahoo! for your contribution.

Agree 100%.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Jakob Homan 2011-01-14, 18:03
> On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20, now we have a major feature and we are all gung ho for it..

Not all are. I'm against it for all the same reasons I was against 20 append. This is also being used as a wedge to get the append work in as .200. My position is that every iota of effort spent releasing another 20 branch is an iota not spent on getting us a kick-ass 22. 20 was great, and we had a lot of wonderful times together, but it's time to move on and see other releases.

But this is a volunteer effort, and if others want to put the effort in, they're free to do so.
-jg

On Fri, Jan 14, 2011 at 9:32 AM, Nigel Daley <[EMAIL PROTECTED]> wrote:
> Yup, I'll say it again. The process ain't perfect but it's good enough IMO. Thank you Yahoo! for your contribution.
>
> Clearly these patch will need review before commit when going into trunk.
>
> Let's move on to 0.22.
>
> Nige
>
> On Jan 14, 2011, at 9:20 AM, Konstantin Boudnik wrote:
>
>> I tend to second most of Ian's points here.
>>
>> On Fri, Jan 14, 2011 at 06:14, Ian Holsman <[EMAIL PROTECTED]> wrote:
>>> (with my Apache hat on)
>>> I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches).
>>
>> #1: we are creating a precedent of a "brain-dump" here. Although, it
>> isn't the first one in the history of OSS. Infamous Apple "patch" to
>> OpenBSD is another one ;)
>>
>> #2: How to spell 'back door' any one?
>>
>> #5: "almost 10 internal releases" Arun has mentioned above might be,
>> perhaps, considered as a great quality control effort. Also, not to
>> mention virtual impossibility to create a test plan to validate a
>> giant features patch.
>>
>>> BTW, I'd like to point out a discrepancy here:
>>>
>>> On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20, now we have a major feature and we are all gung ho for it..
>>
>> And this ^^^
>>
>> But, hey I guess it's totally worth it!
>> Cos
>>
>>> --Ian
>>>
>>> On Jan 14, 2011, at 2:21 AM, Arun C Murthy wrote:
>>>
>>>>
>>>> On Jan 13, 2011, at 10:59 PM, Stack wrote:
>>>>
>>>>> (Man, it was looking good there for a second when 0.20.100 was about
>>>>> security+append!)
>>>>>
>>>>> Good luck w/ the release Arun.
>>>>>
>>>>
>>>> Thanks!
>>>>
>>>>> We might be following your 0.20.100 with a 0.20.200 append.
>>>>>
>>>>
>>>> Super!
>>>>
>>>> Arun
>>>
>>>
>
>
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler 2011-01-14, 18:25
Hi Ian,

Thanks for holding off on that last .5. I've been working on a big email giving more context on this. Let me preview some issues.

Our goal with this branch is twofold:

1) get the code out in a branch quickly so we can collaborate on it with the community.
2) not change the character of the code. See testing below.

We're happy to compromise on any other dimension, as long as we can do 1&2 above.

1) I agree this is not a good precedent. We don't support mega-patches in general. We are doing this as part of discontinuing the "yahoo distribution of Hadoop". We don't plan to continue doing 30 person-year projects outside apache and then merging them in!!

2) append is hard. It is so hard we rewrote the entire write pipeline (5 person-years of work) in trunk after giving up on the codeline you are suggesting we merge in. That work is what distinguishes all post-20 releases from 20 releases in my mind. I don't trust the 20 append code line. We've been hurt badly by it. We did the rewrite only after losing a bunch of production data a bunch of times with the previous code line. I think the various 20 append patch lines may be fine for specialized hbase clusters, but they don't have the rigor behind them to bet your business on them.

3) I think having a very stable recent codeline available for teams coming into Hadoop who want to run big business apps and contribute code back is very helpful. I've been talking to folks in other orgs and they've expressed a huge amount of interest in this work, but begged us to put it into apache, so their oversight bodies will let them use it.

4) we're happy to incorporate ideas on how to best merge the work into trunk. Let's find the most cost effective way to preserve the most devel data possible.

5) testing. Ian, I think you do us a disservice when you talk about us just testing in our environments. If you look at the history of the project, we've been the force behind every stable release of apache Hadoop. And all the non-apache Hadoop releases have been tracking this patch set. We fully support the community developing independent testing capabilities. We plan to contribute to that effort. But we are the organization with far and away the best record for testing Hadoop.

We are proud of this release, we want to share it. Help us sort out how.

Thanks!

---
E14 - via iPhone

On Jan 14, 2011, at 6:15 AM, "Ian Holsman" <[EMAIL PROTECTED]> wrote:

> (with my Apache hat on)
> I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches).
>
> for the following reasons:
>
> 1. It encourages bad behavior. We want discussion (and development) to happen on the lists, not in some office. By allowing these large code-dumps it condones this behavior, and we will likely see it again and again. Like it or not, this is not the apache model of open source governance.
>
> 2. There is a risk that some code that is not in a JIRA or separate patch creeps in unwittingly. This isn't a major deal per se, but we don't really have the proper paper trail, or the documentation on what bug it fixed etc etc.
>
> 3. Other groups (Facebook for example) are running with their own set of patches. They currently have the luxury of examining each individual patch to decide if they want to integrate it (and test it) in their environment. We are forcing them to do the work of finding the bits they want in this huge patch.
>
> 4. By not including the append patch, we are making this release unusable for a large portion of our community who run hbase.
>
> 5. It makes it very hard to test. While It makes me comfortable that it has gone through Yahoo!'s QA and is running in their environments, it doesn't mean that it will work in other organizations who have different workload mixes and software running on them. With one huge patch it makes it all or nothing.. either they take the code-drop and perform a large QA-integration effort, or they forgo the whole patch together.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler
2011-01-14, 18:30
Yup. Letting people who want to contribute do so is a good meme!
A stable next release would be great. But orgs do sustaining on stable code releases for a lot of very good reasons.

A next Hadoop 21+ of this code quality is almost a year away in my opinion.

---
E14 - via iPhone

On Jan 14, 2011, at 10:05 AM, "Jakob Homan" <[EMAIL PROTECTED]> wrote:

>> On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20, now we have a major feature and we are all gung ho for it..
>
> Not all are. I'm against it for all the same reasons I was
> against 20 append. This is also being used as a wedge to get the
> append work in as .200. My position is that every iota of effort
> releasing another 20 branch is an iota not spent on getting us a
> kick-ass 22. 20 was great, and we had a lot of wonderful times
> together, but it's time to move on and see other releases.
>
> But, this is a volunteer effort, and if others want to put the effort
> in, they're free to do so.
> -jg
>
> On Fri, Jan 14, 2011 at 9:32 AM, Nigel Daley <[EMAIL PROTECTED]> wrote:
>> Yup, I'll say it again. The process ain't perfect but it's good enough IMO. Thank you Yahoo! for your contribution.
>>
>> Clearly these patches will need review before commit when going into trunk.
>>
>> Let's move on to 0.22.
>>
>> Nige
>>
>> On Jan 14, 2011, at 9:20 AM, Konstantin Boudnik wrote:
>>
>>> I tend to second most of Ian's points here.
>>>
>>> On Fri, Jan 14, 2011 at 06:14, Ian Holsman <[EMAIL PROTECTED]> wrote:
>>>> (with my Apache hat on)
>>>> I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches).
>>>
>>> #1: we are creating a precedent of a "brain-dump" here. Although, it
>>> isn't the first one in the history of OSS. Infamous Apple "patch" to
>>> OpenBSD is another one ;)
>>>
>>> #2: How to spell 'back door' anyone?
>>>
>>> #5: "almost 10 internal releases" Arun has mentioned above might be,
>>> perhaps, considered as a great quality control effort. Also, not to
>>> mention virtual impossibility to create a test plan to validate a
>>> giant features patch.
>>>
>>>> BTW, I'd like to point out a discrepancy here:
>>>>
>>>> On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20, now we have a major feature and we are all gung ho for it..
>>>
>>> And this ^^^
>>>
>>> But, hey I guess it's totally worth it!
>>> Cos
>>>
>>>> --Ian
>>>>
>>>> On Jan 14, 2011, at 2:21 AM, Arun C Murthy wrote:
>>>>
>>>>> On Jan 13, 2011, at 10:59 PM, Stack wrote:
>>>>>
>>>>>> (Man, it was looking good there for a second when 0.20.100 was about
>>>>>> security+append!)
>>>>>>
>>>>>> Good luck w/ the release Arun.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> We might be following your 0.20.100 with a 0.20.200 append.
>>>>>
>>>>> Super!
>>>>>
>>>>> Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Dhruba Borthakur
2011-01-14, 19:24
> 1) I agree this is not a good precedent. We don't support mega-patches in
> general. We are doing this as part of discontinuing the "yahoo distribution
> of Hadoop". We don't plan to continue doing 30 person year projects outside
> apache and then merging them in!!

I think this is a very dangerous precedent and completely unwarranted. Mega-patches are bad and are totally not the Apache way to go. I think if you want to contribute it back to Apache, you should avoid the mega-patch completely.

> I think the various 20 append patch lines may be fine for specialized
> hbase clusters, but they don't have the rigor behind them to bet your
> business on them.

I think you are completely off-track here and jumping to conclusions. Big businesses are already betting on it. HBase is becoming a big user of Hadoop (dunno whether Y! uses HBase) and I completely agree with Ian that all businesses have to test their releases themselves before using them anyway, otherwise you could end up with data loss of the type you mentioned.

thanks,
dhruba
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Milind Bhandarkar
2011-01-14, 19:59
Dhruba,
While I do not think that the releasability of a branch should be determined by the market-cap (either on nasdaq or second-market) of the contributing company, I think a well-tested release is beneficial to the community.

So, I support two releases: 20.100 now, that has security. And 20.200 later that incorporates appends (depending on the 0.22+appends timeline). That way, a large percentage of the community is covered in 2011.

The reasons are these:

1. The proposed 20.100 is perhaps the most tested at scale, out of all 0.20 branches. In fact, among *all* hadoop releases in the last 5 years. I know first hand that it causes the least disruption for users; the migration from 0.20 to 0.20.10x was the smoothest, while adding a valuable feature.

2. HBase (running on hadoop 0.20 with append) has also been scale tested at Y!, but on far fewer than 4000 nodes, and certainly not for varied workloads (where the bugs tend to surface). (To my knowledge, the largest HBase instance is at Y! in production.)

3. Operations folks need to get some experience with raw hadoop first for any release, before other products on top of hadoop, and then hand over the installation to users. So, there is still time for HBase+0.20.100, and that can be addressed in a separate release.

4. It is not as if the community hasn't had a preview of this mega-patch already. A large portion of the sub-patches are already in cdh3bx, and many of them have already been committed one-by-one to 0.22.

- Milind

---
Milind Bhandarkar
[EMAIL PROTECTED]
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Stack
2011-01-16, 22:57
On Fri, Jan 14, 2011 at 10:25 AM, Eric Baldeschwieler <[EMAIL PROTECTED]> wrote:
> 2) append is hard. It is so hard we rewrote the entire write pipeline (5 person-years work) in trunk after giving up on the codeline you are suggesting we merge in. That work is what distinguishes all post 20 releases from 20 releases in my mind. I don't trust the 20 append code line. We've been hurt badly by it. We did the rewrite only after losing a bunch of production data a bunch of times with the previous code line. I think the various 20 append patch lines may be fine for specialized hbase clusters, but they don't have the rigor behind them to bet your business on them.

Eric:

A few comments on the above:

+ Append has had a bunch of work done on it since the Y! dataloss of a few years ago on an ancestor of the branch-0.20-append codebase (IIRC the issue you refer to in particular -- the 'dataloss' because partially written blocks were done up in tmp dirs, and on cluster restart, tmp data was cleared -- has been fixed in branch-0.20-append).

+ You may not trust 0.20-append (or its close cousin over in CDH) but a bunch of HBasers do. On the one hand, we have little choice. Until the *new* append becomes available in a stable Hadoop the HBase project has had to sustain itself (What do you think? 3-6 months before we see 0.22? The HBase project can't hold its breath that long). On the other hand, the branch-0.20-append work has been carried out by lads (and lasses!) who know their HDFS. It's true that it will not have been tested with Y! rigor but near-derivatives -- CDH or the FB branches -- already do HDFS-200-based append in production.

St.Ack

P.S. Don't get me wrong. HBase is looking forward to *new* append. We just need something to suck on in the meantime.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Nigel Daley
2011-01-17, 19:58
Eric, Arun, I'd like to explicitly clarify one aspect of this branch and what you mean by 'release' -- it can have many meanings.
Are you asking to actually create an Apache release from this branch (binary & source)? Or, as I was assuming, simply commit all this code to this branch and leave it there without a formal release so others can roll their own binary if they wish?

Thanks,
Nige
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Doug Cutting
2011-01-17, 20:11
On 01/12/2011 11:07 PM, Arun C Murthy wrote:
> Thus, I think a jumbo patch should suffice. It will also ensure this can
> be done quickly so that the community can then concentrate on 0.22 and beyond.
>
> However, I will (manually) ensure all relevant jiras are referenced in
> the CHANGES.txt and Release Notes for folks to see the contents of the
> release. This is the hardest part of the exercise. Also, this ensures
> that we can track these jiras for 0.22 as Eli suggested.
>
> Does that seem like a reasonable way forward? I'm happy to brainstorm.

We would not release this until each change in it has been reviewed by the community, right? Otherwise we may end up with changes in a 0.20 release that don't get approved when they're contributed to trunk and cause trunk to regress. So I don't yet see the point of committing the mega patch since the community needs to review each individual change anyway, so we might wait until each is reviewed to commit it.

That said, posting the mega patch is useful, so that folks can start to pick it apart into separate issues. Pushing your internal commits to a public github branch might also make that review process easier.

Doug
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Nigel Daley
2011-01-17, 20:21
On Jan 17, 2011, at 12:11 PM, Doug Cutting wrote:

> On 01/12/2011 11:07 PM, Arun C Murthy wrote:
>> Thus, I think a jumbo patch should suffice. It will also ensure this can
>> be done quickly so that the community can then concentrate on 0.22 and beyond.
>>
>> However, I will (manually) ensure all relevant jiras are referenced in
>> the CHANGES.txt and Release Notes for folks to see the contents of the
>> release. This is the hardest part of the exercise. Also, this ensures
>> that we can track these jiras for 0.22 as Eli suggested.
>>
>> Does that seem like a reasonable way forward? I'm happy to brainstorm.
>
> We would not release this until each change in it has been reviewed by the community, right? Otherwise we may end up with changes in a 0.20 release that don't get approved when they're contributed to trunk and cause trunk to regress. So I don't yet see the point of committing the mega patch since the community needs to review each individual change anyway, so we might wait until each is reviewed to commit it.

Unless this is a code-only drop into a branch w/ no formal Apache release. If that's the case then I'm +1 on letting them commit in this way this one time so we can all move on to 0.22.

Nige
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler
2011-01-17, 22:56
Hi Folks,
We are very interested in sharing what we are doing with the community. I think we can separate this into multiple stages.

1) To Doug's point - Yes, absolutely, we want folks to review this. The patch is now available. Let's work together to get it formatted as folks like in subversion and reviewed. Where there are issues, let's work to resolve them. With luck folks will find this work consistent and useful. Backwards compatibility has been a goal, so with luck we will not ID regressions. As Todd mentioned earlier, a lot of this work has already been merged into CDH and all of it has been reviewed by several apache committers already.

2) This code works; it is the best hadoop we know of. If you run a business on hadoop, I think you would benefit from using it. Right now you don't have the choice of an Apache release if you are looking for a stabilized modern version of Hadoop. We would like to make apache releases based on it, source and binary, incorporating bugfixes from everyone. To do that we would of course need to follow the Apache Hadoop release process, which requires the release master to produce a release candidate and the PMC to vote on the release. Since that will require a formal future vote, no one will be surprised!

3) To Nigel's point - I don't think this should distract anyone from working on 22 or other Hadoop contributions. The 22 team will have the option of incorporating this work. We think it will be a better release if they do, but that is their choice. The majority of our effort at yahoo is not going into 0.20 (this branch); we are working on future features for hadoop. This branch is the stable code we use while we are waiting for a new release.

Thanks,

E14
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler
2011-01-17, 23:29
Hi Stack,
I feel your pain. We're running a 700 node HBASE cluster containing a HUGE collection of all web pages. Both versions of append were started by engineers working at yahoo and we've put A LOT of investment into both. I really, really want to see the append issue solved for HBASE!!

My point is simply that we need to separate our concerns. I would 300% support a community of folks building a 0.20 derived version of hadoop with append, and we know that any new release post 0.20 will contain an append solution.

This branch is more backwards facing. We are simply trying to share our last two years of 0.20 experience with the community, so that a) folks can use it if they find value in it, b) this work can be merged into future hadoop releases (that will have append). We want to share what we have tested, since we believe that the testing is a good chunk of our contribution.

Thanks,

E14
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Doug Cutting
2011-01-17, 23:49
On 01/17/2011 02:56 PM, Eric Baldeschwieler wrote:
> 1) To Doug's point - Yes, absolutely, we want folks to review this.
> The patch is now available. Let's work together to get it formatted
> as folks like in subversion and reviewed. Where there are issues,
> let's work to resolve them. With luck folks will find this work
> consistent and useful.

The question I was addressing was whether to commit the mega-patch as-is, or attempt to linearize it into a sequence of patches, one per issue addressed by the mega patch. I believe that, as-is, it is probably too big to review as a unit. I don't see that merely naming the changes in it makes it substantially easier to review. Rather, each issue probably needs to be associated with a distinct patch in order to permit independent review.

> Backwards compatibility has been a goal, so
> with luck we will not ID regressions.

My point was that, in addition to back-compatibility with prior 0.20 releases, we must also consider the forward-compatibility of each change with 0.21, 0.22 and trunk.

> As Todd mentioned earlier,
> a lot of this work has already been merged into CDH and all of
> it has been reviewed by several apache committers already.

Right, but review must be in public, so that everyone in the community has an equal chance to be involved in the development of each change.

Doug
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Chris Douglas
2011-01-18, 01:41
On Mon, Jan 17, 2011 at 12:11 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> We would not release this until each change in it has been reviewed by the
> community, right? Otherwise we may end up with changes in a 0.20 release
> that don't get approved when they're contributed to trunk and cause trunk to
> regress. So I don't yet see the point of committing the mega patch since
> the community needs to review each individual change anyway, so we might
> wait until each is reviewed to commit it.

I share this concern. Releasing an omnibus pile of commits in the 0.20 series will create an impossible situation for the mainline. Worse, the alternative sifts through this pile over months, as the refinements wrought by consensus require remerging and revalidating of each issue. Every subsequent issue must also be reconsidered. The product must then be deployed, tested, and its bugs fixed, just to get a release as battle-hardened as this one. Signing up for all this work when most every developer and user would rather see trunk proceed would be madness.

However, the status quo is also unacceptable. Running any version of Apache Hadoop is rare, when compared to the popularity of its variants. We must find a solution to that. Hadoop is not in good shape right now, and exceptional actions to correct it should not be cast off lightly by valuing consistency over its future.

To address Nigel and Doug's concerns about compatibility, we should consider a different release series. We wanted to postpone 1.0 discussions, but that would be one solution. If a secure 0.20 could be released as 1.0, then if interest in this branch persists, append could be a 1.1 release on this series,* etc. while 0.22 and its successors can be 2.0 (as a rare benefit to the project split, one could argue that "Hadoop" is the unified set, and the Common, HDFS, and MapReduce projects could continue to release on the 0.x series until we want to declare those a stable successor to 0.20). Version numbers are pretty cheap, when compared to our time and focus.

* In the interim, a 0.20-append release would make all kinds of sense, and fie on the niceties of naming.

> That said, posting the mega patch is useful, so that folks can start to pick
> it apart into separate issues. Pushing your internal commits to a public
> github branch might also make that review process easier.

Pushing to github caused this problem. CDH rebased on YDH, and today Apache Hadoop is considered less stable, less tested, and less usable than either one of them. Why one would expect things to work differently this time around is not clear. I assume we all agree it's a poor outcome.

Arun already volunteered to break up the commits and push individual patches to the repository, so the history is manageable. We allow CTR for branches, though it's predicated on the assumption that that work will be spread over weeks or months; development should not be batched this way. However, by adding obstacles to an unambiguously positive outcome, collaborators will be skeptical of engaging more deeply with this community. Let's focus on making forward progress, not on ensuring the requisite pain is felt.

-C
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Todd Papaioannou
2011-01-18, 02:55
That's only true if you plan to pull forward the changes wholesale into .21, .22 and beyond. And that is not what is being proposed.
If the plan is to just land an updated and more stable version of .20 that is completely backwards compatible, then this can be done within that code line without any impact to the end users. Any changes that the community wishes to pull forward can be identified, isolated and reviewed per the normal process. Or they can remain in the .20.100 release for eternity, without any impact on the future.

Either way, the .20 release will be more stable, performant and more useful to our users, and the community at large can focus on releasing .22, which we all believe is the right goal.

ToddP

From: Doug Cutting <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Mon, 17 Jan 2011 15:49:51 -0800
To: [EMAIL PROTECTED]
Subject: Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

Backwards compatibility has been a goal, so with luck we will not ID regressions.

My point was that, in addition to back-compatibility with prior 0.20 releases, we must also consider the forward-compatibility of each change with 0.21, 0.22 and trunk.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Jeff Hammerbacher
2011-01-18, 04:30
Hey,
We had this exact same discussion about the 0.20-append branch a few weeks ago. A few organizations have tested that code at scale and feel strongly that it's stable. We decided not to release it because it does not meet the Apache guidelines for a release. The Apache process has its pros and cons; we've all accepted them, so the community moved on and focused its energy on the 0.22 release.

A few weeks later, we now have another organization claiming that their 0.20-based branch is tested at scale and should be released. It's claimed that 0.20.100 will be "more stable, performant and more useful to our users"; the same can be said of the 0.20-append branch. Neither branch, however, is a bugfix release and thus does not meet the Apache guidelines for a release. That's too bad; we should work to avoid this situation again in the future, but let's not try to change the rules because we did a poor job in the past of getting our work released via Apache.

As Nigel mentions, and as was done with 0.20-append, I would fully support "a code-only drop into a branch w/ no formal Apache release". That's fully compliant with the Apache process.

All of these discussions will be moot once we get 0.22 out the door and stop arguing over which organization has the most magical 0.20-based bits. I'm looking forward to seeing all of the Apache Hadoop contributors working full time on that release process once these bits are committed to the 0.20.100 branch.

Thanks,
Jeff
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy
2011-01-18, 04:32
On 1/17/11 12:11 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
> We would not release this until each change in it has been reviewed by
> the community, right? Otherwise we may end up with changes in a 0.20
> release that don't get approved when they're contributed to trunk and
> cause trunk to regress. So I don't yet see the point of committing the
> mega patch since the community needs to review each individual change
> anyway, so we might wait until each is reviewed to commit it.

My take is straight-forward:

Apache Hadoop hasn't had a stable, updated release in a while. As a result there is too much confusion for the user-community. There are too many releases done by too many entities and nothing is available from Apache, for a long while now. This is a situation we need to rectify, urgently!

Engaging in community review of these patches will distract the developer community's attention from 0.22 and the future. Not to mention, it will take forever and keep users hanging.

Yes, the mechanics are important - but not more important than the end result.

IAC:
a) The vast majority of these patches are already on jira, and have been for several, several months now.
b) The vast majority of these patches have already been committed to trunk i.e. 0.22.

Sure, some patches may be missing from 0.22 or jira; my proposal is not ideal, and I don't think anyone is pretending it is. However, it does remedy the critical problem - a stable, updated Apache Hadoop release. We can remedy backward or forward compatibility by being clever with our release versions or names.

An appeal: Let's use a bit of common sense and get the project moving forward with a release. Folks are welcome to put forward an append release and an append+security release and so forth (I've strongly supported that), not to mention 0.22 and beyond. IMHO, more than one release is definitely better than none.

Let's get the ball rolling, please!

thanks,
Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Jeff Hammerbacher 2011-01-18, 04:40

> Apache Hadoop hasn't had a stable, updated release in a while.

That's what 0.22 is for?

> However, it does remedy the critical problem - a stable, updated Apache Hadoop release.

Again, isn't that what 0.22 is for?

> An appeal: Let's use a bit of common sense and get the project moving forward with a release. Folks are welcome to put forward a append release and an append+security release and so forth (I've strongly supported that), not to mention 0.22 and beyond. IMHO, more than one release is definitely better than none.

Yes, 0.22 has both append and security. 0.22 also has the nice feature of following the Apache release guidelines rather than relying on the patch set of an independent entity, whether it's Cloudera, Facebook, or Yahoo.

> Let's get the ball rolling, please!

Agreed! Nigel has done a great job getting the ball rolling on the 0.22 release. I'm looking forward to seeing everyone burn down the blockers that have been identified.

Regards,
Jeff
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Chris Douglas 2011-01-18, 05:36

On Mon, Jan 17, 2011 at 8:30 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote:
> We had this exact same discussion about the 0.20-append branch a few weeks ago. A few organizations have tested that code at scale and feel strongly that it's stable. We decided not to release it because it does not meet the Apache guidelines for a release.

It does meet the guidelines. The last validation and development of the append work wasn't done openly, but that doesn't prevent it from being contributed and released. That discussion died out because no committer stepped up, rolled a release of that branch, and called a vote. If it gets a majority of votes on the PMC, it could even be released off the 0.20 branch. Individuals may have their reservations and vote accordingly, but the PMC has the authority to release 0.20-append and 0.20.100 (or whatever).

> A few weeks later, we now have another organization claiming that their 0.20-based branch is tested at scale and should be released. It's claimed that 0.20.100 will be "more stable, performant and more useful to our users"; the same can be said of the 0.20-append branch. Neither branch, however, is a bugfix release and thus does not meet the Apache guidelines for a release. That's too bad; we should work to avoid this situation again in the future, but let's not try to change the rules because we did a poor job in the past of getting our work released via Apache.

The rules from Apache are *far* more flexible than what we've practiced. Our rigidity has contributed to the current state of the project by pushing active development behind the walls of contributing organizations. We aren't obligated to live with a fractured community, nor do some nebulous Apache guidelines prohibit us from finding a way forward.

> As Nigel mentions, and as was done with 0.20-append, I would fully support a "code-only drop into a branch w/ no formal Apache release". That's fully compliant with the Apache process.

This helps nobody except those who would cut releases outside of Apache, which is precisely what we're trying to curtail.

> All of these discussions will be moot once we get 0.22 out the door and stop arguing over which organization has the most magical 0.20-based bits. I'm looking forward to seeing all of the Apache Hadoop contributors working full time on that release process once these bits are committed to the 0.20.100 branch.

+1

I'm sure I'm not alone in turning ill at the thought of working on a 0.20 branch again. But others may have different priorities, different interests, and may actually prefer the architectural decisions in 0.20 to those made since then. There's obviously enough interest in a stable branch for all the major contributors to solve those same problems independently. Opening up space in Apache for that work is what we should have done a year ago. Since we don't yet have a credible release, it still makes sense today.

-C
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-18, 05:56

Bringing 'organizations' into this discussion is very disingenuous.

Doug, credit to him, was the first person to propose this release:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01427.html

I have supported the append-release:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg02584.html

So, stop coloring arguments in this manner.

Arun

On Jan 17, 2011, at 8:30 PM, Jeff Hammerbacher wrote:
> Hey,
>
> We had this exact same discussion about the 0.20-append branch a few weeks ago. A few organizations have tested that code at scale and feel strongly that it's stable. We decided not to release it because it does not meet the Apache guidelines for a release. The Apache process has its pros and cons; we've all accepted them, so the community moved on and focused its energy on the 0.22 release.
>
> A few weeks later, we now have another organization claiming that their 0.20-based branch is tested at scale and should be released. It's claimed that 0.20.100 will be "more stable, performant and more useful to our users"; the same can be said of the 0.20-append branch. Neither branch, however, is a bugfix release and thus does not meet the Apache guidelines for a release. That's too bad; we should work to avoid this situation again in the future, but let's not try to change the rules because we did a poor job in the past of getting our work released via Apache.
>
> As Nigel mentions, and as was done with 0.20-append, I would fully support a "a code-only drop into a branch w/ no formal Apache release". That's fully compliant with the Apache process.
>
> All of these discussions will be moot once we get 0.22 out the door and stop arguing over which organization has the most magical 0.20-based bits. I'm looking forward to seeing all of the Apache Hadoop contributors working full time on that release process once these bits are committed to the 0.20.100 branch.
>
> Thanks,
> Jeff
>
> On Mon, Jan 17, 2011 at 6:55 PM, Todd Papaioannou <toddp@yahoo-inc.com> wrote:
>
>> That's only true if you plan to pull forward the changes wholesale into .21, .22 and beyond. And that is not what is being proposed.
>>
>> If the plan is to just land an updated and more stable version of .20 that is completely backwards compatible, then this can be done within that code line without any impact to the end users. Any changes that the community wish to pull forward can be identified, isolated and reviewed per the normal process. Or they can remain in the .20.100 release for eternity, without any impact on the future.
>>
>> Either way, the .20 release will be more stable, performant and more useful to our users, and the community at large can focus on releasing .22, which we all believe is the right goal.
>>
>> ToddP
>>
>> From: Doug Cutting <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Date: Mon, 17 Jan 2011 15:49:51 -0800
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
>>
>> Backwards compatibility has been a goal, so with luck we will not ID regressions.
>>
>> My point was that, in addition to back-compatibility with prior 0.20 releases, we must also consider the forward-compatibility of each change with 0.21, 0.22 and trunk.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-18, 05:57

On Jan 17, 2011, at 8:40 PM, Jeff Hammerbacher wrote:
>> Apache Hadoop hasn't had a stable, updated release in a while.
>
> That's what 0.22 is for?

Every single Hadoop release in the recent past, and I have worked on pretty much every single Hadoop release since forever, has taken at least 3-4 months to stabilize.

So, we are, at a minimum, looking at June 2011 for 0.22.

This could be a good intermediate release, no? This need not be the only one either; please work on 20.append or 20.append+security and release it, I fully support it.

IAC, would calling this release something other than 0.20.* be ok?

Arun
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Roy T. Fielding 2011-01-18, 08:20

I thought that this discussion would have reached some sensible understanding of how Apache projects work by now, but it seems not.

On Jan 13, 2011, at 6:12 PM, Arun C Murthy wrote:
> The version I'm offering to push to the community has fixed all of them, *plus* the added benefit of several stability and performance fixes we have done since 20.104.3, almost 10 internal releases. This is a battle tested and hardened version which we have deployed on 40,000+ nodes. It is a significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure *some* users will find that valuable. ;)
>
> Also, I've offered to push individual patches as a background activity on a branch - that should suffice, no? Or, do you consider this a blocker?
>
> Again, my goal in this exercise is to get a stable, improved version of Hadoop into the hands of our users asap, and focus on 0.22 and beyond.

So, you have a bunch of changes that you want to contribute. Please do so. There are several ways:

a) break the changes down into a sequence of patches, create jira issues for each one (or append to the existing issue), and then provide the group with a list of the issue links so that people can quickly +1 each one. When it seems worthwhile to you, create a branch off of some prior Apache release point in svn and commit each patch to it until the branch is identical to (or, in your own opinion, better than) the source code that you have tested locally. Then RM a tarball and start a release vote. Since all of this is being done in jira and svn, others can help you do all but the first part (breaking down the big patch).

or

b) create a branch off of some prior Apache release point in svn and replay the internal Y! commits on that branch until the branch source code is identical to what you have tested locally. Then RM a tarball based on that branch and start a release vote. Since the history is now in svn, others could do the RM bit if you don't have time.

or

c) create a branch off of some prior Apache release point in svn and apply one big ugly patch to it. Then RM a tarball based on that branch and ask for a release vote.

You will note that none of the above requires a discussion on this list prior to the release vote, though (a) would likely result in more +1s than (b), and (b) would likely receive more +1s than (c). Regardless, the release vote is a lazy majority decision.

I do not believe that there is any rational reason to apply a single big patch. "It takes too long" is nonsense -- you have already spent far more time discussing it than would be required to do it. "It is too hard" is also nonsense -- use your version control system to extract the set of changes and just replay them (with appropriate changelog editing). "It has already been tested at Y!" is simply irrelevant -- the source code has been tested, not the order in which the patches have been applied, so all you should care about is that the final branch code is comparable to the tested source code (i.e., use diff).

Nevertheless, all contributions at Apache are voluntary. Do what you have time for, now, with the understanding that others may or may not complete the task, and may or may not vote for the release. You can make a branch, apply the big patch, and stand by while the rest of the group chooses whether to just accept it as a big change. Someone else might create a parallel branch to apply the specific changes in an orderly manner, or perhaps you'll discover an easy way to do that a few days from now. Or it might just sit there and never be released.

There is no need for the group to agree to a plan up front, just as there is no need for the group to approve a release just because someone did the work of RMing a tarball. Sure, it might save a lot of time if potential disagreements can be resolved before work is done, but the fact is that people tend to disagree less with actual work products than with abstract plans. After all, everyone has a plan. It is also far easier to convince people to fix their own problems if the problem is right in front of them.

When the release vote happens, encourage folks to test and +1 the release. If it passes, woohoo! If not, then listen to the reasons given by the other PMC members and see if you can make enough changes to the release to get those extra +1s. In other words, collaborate.

....Roy
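
The "(i.e., use diff)" check in the message above, confirming that the final branch matches the locally tested source whichever route produced it, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `tree_differences` and the directory arguments are hypothetical, and a plain recursive `diff -r` between the two checkouts would serve the same purpose.

```python
import filecmp
import os


def tree_differences(tested_dir, branch_dir, ignore=(".svn",)):
    """Return sorted relative paths that differ between two source trees.

    tested_dir and branch_dir are hypothetical local checkouts: the
    internally tested source and the new Apache branch, respectively.
    """
    differences = []

    def walk(cmp, prefix):
        # Entries that exist on one side only, or whose type differs.
        for name in cmp.left_only + cmp.right_only + cmp.common_funny:
            differences.append(os.path.join(prefix, name))
        # Compare file contents byte for byte (shallow=False) instead of
        # trusting os.stat() signatures alone.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False)
        for name in mismatch + errors:
            differences.append(os.path.join(prefix, name))
        # Recurse into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(tested_dir, branch_dir, ignore=list(ignore)), "")
    return sorted(differences)
```

An empty result would mean the two trees are identical apart from the ignored version-control metadata, which is the property options (a) and (b) ask for before a tarball is rolled.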
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-18, 09:59

Thanks for the clarifications Roy.

I considered both b) and c).

As I mentioned, the reason I think b) wasn't useful in this context is that we have, in several cases, 5-6 patches per jira (bug-fix on top of a bug-fix), and several jiras didn't make sense for trunk since the bug didn't exist in trunk etc. etc. Also, I was considering a scenario where I would squash relevant patches together to produce a minimal set of coherent patches. Then there is work to remove Yahoo! specific commits.

IAC, I agree - we've spent too much time talking and too little doing actual work. Let me get the job done and folks can then weigh in on the release at a later point; folks might be willing to consider this more positively once they see the branch, the change-log etc.

Of course we need to get the small number of remaining patches into trunk asap for 0.22 and beyond.

Arun

On Jan 18, 2011, at 12:20 AM, Roy T. Fielding wrote:
> I thought that this discussion would have reached some sensible understanding of how Apache projects work by now, but it seems not.
>
> On Jan 13, 2011, at 6:12 PM, Arun C Murthy wrote:
>> The version I'm offering to push to the community has fixed all of them, *plus* the added benefit of several stability and performance fixes we have done since 20.104.3, almost 10 internal releases. This is a battle tested and hardened version which we have deployed on 40,000+ nodes. It is a significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure *some* users will find that valuable. ;)
>>
>> Also, I've offered to push individual patches as a background activity on a branch - that should suffice, no? Or, do you consider this a blocker?
>>
>> Again, my goal in this exercise is to get a stable, improved version of Hadoop into the hands of our users asap, and focus on 0.22 and beyond.
>
> So, you have a bunch of changes that you want to contribute. Please do so. There are several ways:
>
> a) break the changes down into a sequence of patches, create jira issues for each one (or append to the existing issue), and then provide the group with a list of the issue links so that people can quickly +1 each one. When it seems worthwhile to you, create a branch off of some prior Apache release point in svn and commit each patch to it until the branch is identical to (or, in your own opinion, better than) the source code that you have tested locally. Then RM a tarball and start a release vote. Since all of this is being done in jira and svn, others can help you do all but the first part (breaking down the big patch).
>
> or
>
> b) create a branch off of some prior Apache release point in svn and replay the internal Y! commits on that branch until the branch source code is identical to what you have tested locally. Then RM a tarball based on that branch and start a release vote. Since the history is now in svn, others could do the RM bit if you don't have time.
>
> or
>
> c) create a branch off of some prior Apache release point in svn and apply one big ugly patch to it. Then RM a tarball based on that branch and ask for a release vote.
>
> You will note that none of the above requires a discussion on this list prior to the release vote, though (a) would likely result in more +1s than (b), and (b) would likely receive more +1s than (c). Regardless, the release vote is a lazy majority decision.
>
> I do not believe that there is any rational reason to apply a single big patch. "It takes too long" is nonsense -- you have already spent far more time discussing it than would be required to do it. "It is too hard" is also nonsense -- use your version control system to extract the set of changes and just replay them (with appropriate changelog editing). "It has already been tested at Y!" is simply irrelevant -- the source code has been tested, not the order in which the patches have been applied, so all you should
-
RE: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Severance, Steve 2011-01-18, 18:53

I want to thank Yahoo! for this release. At eBay we are very excited about the opportunity to test a build of Hadoop that has already been extensively field tested on large clusters. At eBay we are primarily concerned with cluster availability and throughput, so having a build like this available to the community is a huge win.

Hats off to Arun, Eric and everyone at Yahoo! for releasing this.

Steve

-----Original Message-----
From: Eric Baldeschwieler [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 14, 2011 10:25 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

Hi Ian,

Thanks for holding off on that last .5. I've been working on a big email giving more context on this. Let me preview some issues.

Our goal with this branch is twofold:
1) get the code out in a branch quickly so we can collaborate on it with the community.
2) not change the character of the code. See testing below.

We're happy to compromise on any other dimension, as long as we can do 1&2 above.

1) I agree this is not a good precedent. We don't support mega-patches in general. We are doing this as part of discontinuing the "yahoo distribution of Hadoop". We don't plan to continue doing 30 person year projects outside apache and then merging them in!!

2) append is hard. It is so hard we rewrote the entire write pipeline (5 person-years work) in trunk after giving up on the codeline you are suggesting we merge in. That work is what distinguishes all post 20 releases from 20 releases in my mind. I don't trust the 20 append code line. We've been hurt badly by it. We did the rewrite only after losing a bunch of production data a bunch of times with the previous code line. I think the various 20 append patch lines may be fine for specialized hbase clusters, but they don't have the rigor behind them to bet your business on them.

3) I think having a very stable recent codeline available for teams coming into Hadoop who want to run big business apps and contribute code back is very helpful. I've been talking to folks in other orgs and they've expressed a huge amount of interest in this work, but begged us to put it into apache, so their oversight bodies will let them use it.

4) we're happy to incorporate ideas into how to best merge the work into trunk. Let's find the most cost effective way to preserve the most devel data possible.

5) testing. Ian, I think you do us a disservice when you talk about us just testing in our environments. If you look at the history of the project, we've been the force behind every stable release of apache Hadoop. And all the non-apache Hadoop releases have been tracking this patch set. We fully support the community developing independent testing capabilities. We plan to contribute to that effort. But we are the organization with far and away the best record for testing Hadoop.

We are proud of this release, we want to share it. Help us sort out how.

Thanks!

---
E14 - via iPhone

On Jan 14, 2011, at 6:15 AM, "Ian Holsman" <[EMAIL PROTECTED]> wrote:
> (with my Apache hat on)
> I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches).
>
> for the following reasons:
>
> 1. It encourages bad behavior. We want discussion (and development) to happen on the lists, not in some office. By allowing these large code-dumps it condones this behavior, and we will likely see it again and again. Like it or not, this is not the apache model of open source governance.
>
> 2. There is a risk that some code that is not in a JIRA or separate patch creeps in unwittingly. This isn't a major deal per se, but we don't really have the proper paper trail, or the documentation on what bug it fixed etc etc.
>
> 3. Other groups (Facebook for example) are running with their own set of patches. They currently have the luxury of examining each individual patch to decide if they want to integrate it (and test it) in their environment. We are forcing them to do the work of finding the bits they want in this huge patch.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Allen Wittenauer 2011-01-18, 19:10

On Jan 17, 2011, at 2:56 PM, Eric Baldeschwieler wrote:
> Right now you don't have the choice of an Apache release if you are looking for a stabilized modern version of Hadoop.

Can we ratchet down the hyperbole to at least a point where I don't want to vomit? Thanks.

[For the record, some of us quite like our stable, almost-a-year Apache Hadoop 0.20.2 w/3 patches installations, thankyouverymuch. (Those patches are for portability and capacity scheduler fixes, since the one in 0.20.2 is completely useless.)]
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler 2011-01-18, 23:22

Sounds like you are agreeing with me in your own way Allen ;-)

We're getting > 2x better throughput and stability from the capacity scheduler in this branch. I'd love to get your feedback on that.

The more nodes and users in your deployment, the more improvements you will see.

---
E14 - via iPhone

On Jan 18, 2011, at 11:10 AM, "Allen Wittenauer" <[EMAIL PROTECTED]> wrote:
> On Jan 17, 2011, at 2:56 PM, Eric Baldeschwieler wrote:
>> Right now you don't have the choice of an Apache release if you are looking for a stabilized modern version of Hadoop.
>
> Can we ratchet down the hyperbole to at least a point where I don't want to vomit? Thanks.
>
> [For the record, some of us quite like our stable, almost-a-year Apache Hadoop 0.20.2 w/3 patches installations, thankyouverymuch. (Those patches are for portability and capacity scheduler fixes, since the one in 0.20.2 is completely useless.)]
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ian Holsman 2011-01-19, 07:49

That's what scares me, and highlights one of the points I made before.

If someone wants to just use the capacity scheduling improvements, and not other parts, they will find it hard.

I think Roy's suggestion of applying the commits individually to the branch from your current working branch would help with this.

regards
Ian

On Jan 19, 2011, at 1:22 AM, Eric Baldeschwieler wrote:
> Sounds like you are agreeing with me in your own way Allen ;-)
>
> We're getting > 2x better throughput and stability from the capacity scheduler in this branch. I'd love to get you feedback on that.
>
> The more nodes and users in your deployment, the more improvements you will see.
>
> ---
> E14 - via iPhone
>
> On Jan 18, 2011, at 11:10 AM, "Allen Wittenauer" <[EMAIL PROTECTED]> wrote:
>
>> On Jan 17, 2011, at 2:56 PM, Eric Baldeschwieler wrote:
>>> Right now you don't have the choice of an Apache release if you are looking for a stabilized modern version of Hadoop.
>>
>> Can we ratchet down the hyperbole to at least a point where I don't want to vomit? Thanks.
>>
>> [For the record, some of us quite like our stable, almost-a-year Apache Hadoop 0.20.2 w/3 patches installations, thankyouverymuch. (Those patches are for portability and capacity scheduler fixes, since the one in 0.20.2 is completely useless.)]
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Scott Carey 2011-01-19, 18:09

On 1/14/11 11:24 AM, "Dhruba Borthakur" <[EMAIL PROTECTED]> wrote:
>> 1) I agree this is not a good precedent. We don't support mega-patches in general. We are doing this as part of discontinuing the "yahoo distribution of Hadoop". We don't plan to continue doing 30 person year projects outside apache and then merging them in!!
>
> I think this is a very dangerous precedent and completely unwarranted. Mega-patches are bad and are totally not the Apache way to go. I think if you want to contribute it back to Apache, you should avoid the mega-patch completely.

The mega-patch is not being applied to Trunk, or even the common 0.20.x branch, so its danger is significantly mitigated.

If there is still a lot of worry about the mega-patch, there is one other compromise:

* Take Cloudera's linearization of Y! patches that go from 0.20.2 to 0.20.104.3 and commit them individually.
* Then take a mini-mega patch from there to the latest Y!.

That shouldn't be too hard, and meets Arun's goal of not changing the character of the code so that testing is minimized/eliminated. And it incorporates some hard work on the Cloudera side that will be useful if debugging on that branch is necessary.

I want to see as much work as possible on 0.22 -- there are major improvements there that all can share and get the community more unified again. One drawback of this release is it could encourage the community to squat on 0.20 even longer... But sharing all that work can be seen as a necessary step to being able to let go of 0.20 and move on as well.
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Konstantin Shvachko 2011-01-19, 18:12

On Tue, Jan 18, 2011 at 11:49 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
> I think Roy's suggestion of applying the commits individually to the branch from your current working branch would help with this.

I am sure this is not what Roy suggested. Ian. I think the idea is simple.

If you decide to donate to a non-profit organization you are free to choose the form of your donation. There is a tradeoff between its usability to the community and the effort put into making it such. The community can evaluate the quality of the donation and decide on how to consume it.

Also, this is different from the 0.20-append discussion, imo, as it doesn't require additional community resources. I can see many of them are dedicated to building 0.22 as we speak.

Thanks,
--Konstantin
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-22, 07:26

On Jan 18, 2011, at 1:59 AM, Arun C Murthy wrote:
> IAC, I agree - we've spent too much time talking and too little doing actual work. Let me get the job done and folks can then weigh-in on the release at later point, folks might be willing to consider this more positively once they see the branch, the change-log etc.
>
> Of course we need to get the small number of remaining patches into trunk asap for 0.22 and beyond.

FYI - I've merged changes to Common's branch-0.20-security (http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security).

Some statistics:
# Total of 441 jiras are covered across Common (164), HDFS (90) and Map-Reduce (187).
# 413 jiras out of the 441 are already committed to trunk.
# 28 open jiras: Common (6), HDFS (1), Map-Reduce (21).
# Of the 28 open jiras 7 are Patch Available.
# I've ensured all commits have a jira - I had to open 3 jiras (one each in all sub-projects), they are included in the above stats.

Arun
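
As a quick cross-check of the statistics in the message above, the per-project counts can be summed and compared against the totals. The sketch below uses only the figures quoted in that message; the function and variable names are hypothetical and chosen for illustration.

```python
def summarize(covered, open_by_project, committed_to_trunk, patch_available):
    """Cross-check the jira counts reported for branch-0.20-security."""
    total = sum(covered.values())
    open_total = sum(open_by_project.values())
    return {
        "total": total,
        "open": open_total,
        # Jiras not yet committed to trunk should equal the open count.
        "consistent": total - committed_to_trunk == open_total,
        # Open jiras that are not yet in the Patch Available state.
        "open_without_patch": open_total - patch_available,
    }


# Figures as quoted in the message (per sub-project).
covered = {"Common": 164, "HDFS": 90, "Map-Reduce": 187}
open_by_project = {"Common": 6, "HDFS": 1, "Map-Reduce": 21}

summary = summarize(covered, open_by_project,
                    committed_to_trunk=413, patch_available=7)
print(summary)
```

The numbers are internally consistent: 164 + 90 + 187 = 441, 441 - 413 = 28, and 6 + 1 + 21 = 28, leaving 21 open jiras that are not yet Patch Available.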
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ian Holsman 2011-01-22, 14:22

On Jan 19, 2011, at 1:12 PM, Konstantin Shvachko wrote:
> On Tue, Jan 18, 2011 at 11:49 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
>
>> I think Roy's suggestion of applying the commits individually to the branch from your current working branch would help with this.
>
> I am sure this is not what Roy suggested. Ian. I think the idea is simple.

To quote Roy:

b) create a branch off of some prior Apache release point in svn and replay the internal Y! commits on that branch until the branch source code is identical to what you have tested locally. Then RM a tarball based on that branch and start a release vote. Since the history is now in svn, others could do the RM bit if you don't have time.

Arun has chosen option (c), which Roy also mentioned as a valid way of doing it.

> If you decide to donate to a non-profit organization you are free to choose the form of your donation.

I think you are confusing a non-profit with a dumping ground. Any organization (non-profit or for-profit) has responsibilities, and there is always a tradeoff between features and risk. Any organization can choose to not accept a donation. It comes down to give-and-take.

As Roy also mentioned, option (c) will be harder for others to test and to reach consensus on whether it is release-worthy, let alone merge into 0.22.

> Thanks,
> --Konstantin
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler 2011-01-25, 10:05

Hi Ian,

No votes have been called for at the moment. Right now all Arun's done is create a branch and ask for feedback from folks who want to try it. Most of the forward ports are already either committed or in a patch available state, as Arun mentioned. We'll work through the others as individual JIRAs to allow everyone to kick the tires. That should avoid issues with 0.22.

I don't anticipate votes etc, unless folks do want to try it and do provide feedback. This is what runs at Yahoo at the moment. I hope people think it is worth trying.

Thanks,

E14

On Jan 22, 2011, at 6:22 AM, Ian Holsman wrote:
> On Jan 19, 2011, at 1:12 PM, Konstantin Shvachko wrote:
>
>> On Tue, Jan 18, 2011 at 11:49 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
>>
>>> I think Roy's suggestion of applying the commits individually to the branch from your current working branch would help with this.
>>
>> I am sure this is not what Roy suggested. Ian. I think the idea is simple.
>
> to Quote Roy:
> b) create a branch off of some prior Apache release point in svn and replay the internal Y! commits on that branch until the branch source code is identical to what you have tested locally. Then RM a tarball based on that branch and start a release vote. Since the history is now in svn, others could do the RM bit if you don't have time.
>
> Arun has chosen option (c), that Roy also mentioned as a valid way of doing it.
>
>> If you decide to donate to a non-profit organization you are free to choose the form of your donation.
>
> I think you are confusing a non-profit for a dumping ground. Any organization (non-profit or for-profit) has responsibilities, and there is always a tradeoff between features and risk. Any organization can choose to not accept a donation. It comes down to give-and-take
>
> As Roy also mentioned, option (c) will be harder for others to test, and get consensus about whether it is release worthy, let alone merge into 0.22.
>
>> Thanks,
>> --Konstantin
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Ian Holsman 2011-01-25, 14:51

Arun also mentioned that he has created a separate branch (http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/) for the individual patches, so he is doing both, and everyone should be happy.

--I

On Jan 25, 2011, at 2:05 AM, Eric Baldeschwieler wrote:
> Hi Ian,
>
> No votes have been called for at the moment. Right now all arun's done is create a branch and ask for feedback from folks who want to try it. Most of the forward ports are already either committed or in a patch available state, as arun mentioned. We'll work through the others as individual JIRAs to allow everyone to kick the tires. That should avoid issues with 0.22.
>
> I don't anticipate votes etc, unless folks do want to try it and do provide feedback. This is what runs at yahoo at the moment. I hope people think it is worth trying.
>
> Thanks,
>
> E14
>
> On Jan 22, 2011, at 6:22 AM, Ian Holsman wrote:
>
>> On Jan 19, 2011, at 1:12 PM, Konstantin Shvachko wrote:
>>
>>> On Tue, Jan 18, 2011 at 11:49 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
>>>
>>>> I think Roy's suggestion of applying the commits individually to the branch from your current working branch would help with this.
>>>
>>> I am sure this is not what Roy suggested. Ian. I think the idea is simple.
>>
>> to Quote Roy:
>> b) create a branch off of some prior Apache release point in svn and replay the internal Y! commits on that branch until the branch source code is identical to what you have tested locally. Then RM a tarball based on that branch and start a release vote. Since the history is now in svn, others could do the RM bit if you don't have time.
>>
>> Arun has chosen option (c), that Roy also mentioned as a valid way of doing it.
>>
>>> If you decide to donate to a non-profit organization you are free to choose the form of your donation.
>>
>> I think you are confusing a non-profit for a dumping ground. Any organization (non-profit or for-profit) has responsibilities, and their is always a tradeoff between features and risk. Any organization can choose to not accept a donation. It comes down to give-and-take
>>
>> As Roy also mentioned, option (c) will be harder for others to test, and get consensus about weather it is release worthy, let alone merge into 0.22.
>>
>>> Thanks,
>>> --Konstantin
-
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Arun C Murthy 2011-01-27, 04:11

On Jan 25, 2011, at 2:05 AM, Eric Baldeschwieler wrote:
> I don't anticipate votes etc, unless folks do want to try it and do provide feedback. This is what runs at yahoo at the moment. I hope people think it is worth trying.

I've put up the bits at http://people.apache.org/~acmurthy/hadoop-0.20.100-rc0/ for interested folks.

Please do provide feedback if you try it.

thanks,
Arun