Todd Lipcon
2010-01-05, 19:29
Allen Wittenauer
2010-01-05, 20:06
Owen O'Malley
2010-01-05, 22:06
Sanjay Radia
2010-01-06, 00:39
Doug Cutting
2010-01-06, 16:10
Jean-Daniel Cryans
2010-01-06, 18:23
Owen O'Malley
2010-01-06, 18:26
Dhruba Borthakur
2010-01-06, 18:31
Owen O'Malley
2010-01-06, 18:33
Todd Lipcon
2010-01-06, 18:54
Allen Wittenauer
2010-01-06, 20:00
Sanjay Radia
2010-01-06, 21:36
-
0.20.2 HDFS incompatible with 0.20.1 (Todd Lipcon, 2010-01-05, 19:29)
Hey all,
In a recent discussion, we noticed that the 0.20.2 HDFS client will not be wire-compatible with 0.20.0 or 0.20.1 due to the inclusion of HDFS-793 (required for HDFS-101). This raises a few questions:

1) Although we certainly do not guarantee wire compatibility between minor versions (0.20 -> 0.21), have we previously implied wire compatibility between bugfix releases?

2) Is the above something we *should* be guaranteeing already?

3) If we haven't guaranteed the above, how many users think we have? (i.e., how do we correctly call out this fact in the 0.20.2 release notes in such a way that no one gets surprised?) I can imagine plenty of organizations where a lockstep upgrade between client and server is difficult, and we should make sure that cluster operators know it will be necessary. Since it wasn't necessary between 0.20.0 and 0.20.1, or between the various 0.18 releases, people may have grown used to non-lockstep upgrades.

4) If the above are problems, would it be worth considering a patch for branch-20 that provides a client that is compatible with either, based on the datanode protocol version number of the server? It seems like a bit of scary complexity, but I wanted to throw it out there.

Thanks
-Todd
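Question 4 describes a client that selects its wire format from the version number the datanode reports. The following is a minimal Python sketch of that kind of dispatch, for illustration only: the version constants, the `Ack` type, the two field layouts, and every function name are hypothetical and do not reflect Hadoop's actual DataTransferProtocol or the specific change made in HDFS-793.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical version numbers, NOT Hadoop's real protocol constants.
LEGACY_VERSION = 1  # stands in for the 0.20.0/0.20.1 wire format
NEW_VERSION = 2     # stands in for the format after the incompatible change


class UnsupportedProtocolError(Exception):
    """Raised when the datanode reports a version this client cannot speak."""


@dataclass
class Ack:
    seqno: int
    statuses: List[int]  # one status code per reporting datanode


def decode_ack_legacy(fields: Sequence[int]) -> Ack:
    # Hypothetical legacy layout: sequence number, then a single status.
    seqno, status = fields
    return Ack(seqno, [status])


def decode_ack_new(fields: Sequence[int]) -> Ack:
    # Hypothetical new layout: sequence number, a count, then that many statuses.
    seqno, count, *statuses = fields
    if len(statuses) != count:
        raise ValueError("malformed ack: status count does not match")
    return Ack(seqno, list(statuses))


# The client keeps one decoder per wire version it understands.
DECODERS: Dict[int, Callable[[Sequence[int]], Ack]] = {
    LEGACY_VERSION: decode_ack_legacy,
    NEW_VERSION: decode_ack_new,
}


def decoder_for(server_version: int) -> Callable[[Sequence[int]], Ack]:
    """Choose a decoder based on the protocol version the datanode reports."""
    try:
        return DECODERS[server_version]
    except KeyError:
        raise UnsupportedProtocolError(
            f"datanode speaks protocol {server_version}; "
            f"this client supports {sorted(DECODERS)}"
        ) from None
```

For example, `decoder_for(2)((7, 2, 0, 0))` yields `Ack(seqno=7, statuses=[0, 0])`. Even in this toy the cost is visible: every later change to the pipeline code would have to be carried in both decoders and tested against both kinds of server, which is the "scary complexity" the question mentions.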
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Allen Wittenauer, 2010-01-05, 20:06)
On 1/5/10 11:29 AM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:
> 1) Although we certainly do not guarantee wire compatibility between minor
> versions (0.20 -> 0.21) have we previously implied wire compatibility
> between bugfix releases?

IIRC, it has been implied and was a goal, but it is not officially written anywhere public that I know of.

> 2) Is the above something we *should* be guaranteeing already?

A) From an ops perspective, the lack of compatibility between even minor releases is a pain.

B) Most folks with even slightly complex environments are likely patching Hadoop. A good chunk of those are likely doing so in ways that break compatibility. [For example, we're working on a TCP buffer patch for HDFS to fix what we suspect is a latency problem. Does it break compat? Maybe.]

> 3) If we haven't guaranteed the above, how many users think we have? (ie how
> do we correctly call out this fact in the 0.20.2 release notes in such a way
> that no one gets surprised).

I suspect most folks don't even know that micros are incompatible until they suddenly realize that distcp doesn't work.

> I can imagine plenty of organizations where a
> lockstep upgrade between client and server is difficult, and we should make
> sure that cluster operators know it will be necessary. Since it wasn't
> necessary between 0.20.0 and 0.20.1, or various 0.18 releases, people may
> have grown used to non-lockstep upgrades.

I can easily see many organizations only upgrading when something breaks, because the Hadoop binaries are spread far and wide and under the control of many different departments without any sort of centralized management. Despite the development model, I suspect few ever do just a mapred or hdfs upgrade, so any change in the stack will likely trigger a full Hadoop upgrade.

> 4) If the above are problems, would it be worth considering a patch for
> branch-20 that provides a client that is compatible with either, based on
> the datanode protocol version number of the server? It seems like a bit of
> scary complexity, but wanted to throw it out there.
Everyone knows I don't mind playing devil's advocate :), so let me ask the obvious question: Bugs are bad, etc, etc, but is it so critical that it has to be in the 0.20 branch at all? I'd rather see the community spend cycles on 0.21 than worrying about 0.20 given that we're fast approaching the 1yr birthday of 0.20.0....
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Owen O'Malley, 2010-01-05, 22:06)
On Jan 5, 2010, at 11:29 AM, Todd Lipcon wrote:

> 1) Although we certainly do not guarantee wire compatibility between minor
> versions (0.20 -> 0.21) have we previously implied wire compatibility
> between bugfix releases?

Correction. Pre-1.0, 0.N to 0.N+1 is a major upgrade. After 1.0, 1.N to 1.N+1 is a minor upgrade. In both cases, X.Y.z to X.Y.z+1 is a patch release.

I thought we had it documented somewhere, but can't find it. There is some discussion of compatibility in HADOOP-5071 that should be pulled out into a wiki page.

The standing rules are that you don't incompatibly break APIs or wire protocols in a patch release. So, this patch violates the rule and should have had a vote called before it was applied to branch-0.20. (And arguably branch-0.21, although since it hasn't been released, it isn't nearly the same level of problem.)

> 2) Is the above something we *should* be guaranteeing already?

Patch releases:
1. Must be backwards API compatible without a client recompile.
2. Must be wire compatible.

Exceptions require a vote of the committers. We should also put a notice of any exceptions at the top of the release notes.

> 4) If the above are problems, would it be worth considering a patch for
> branch-20 that provides a client that is compatible with either, based on
> the datanode protocol version number of the server? It seems like a bit of
> scary complexity, but wanted to throw it out there.

I would like Hairong to consider whether she could fix the issue in 0.20 without the incompatible change. If it cannot be done (or no one wants to do the work), we should vote on whether the change should be made.

-- Owen
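The numbering rules and the patch-release requirement above can be written down as a small classifier. This is a minimal sketch under stated assumptions, not a Hadoop tool: the function names are hypothetical, it accepts only plain numeric X.Y.z strings (not suffixes such as `-dev`), and treating a post-1.0 change of the leading number (for example 1.x to 2.0) as major is an assumption the message does not spell out.

```python
def parse_version(version: str) -> tuple:
    """Parse a plain numeric X.Y.z version string into a tuple of ints."""
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.z, got {version!r}")
    return parts


def classify_upgrade(old: str, new: str) -> str:
    """Return 'patch', 'minor' or 'major' for an upgrade from old to new."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError(f"{new} is not newer than {old}")
    if o[:2] == n[:2]:
        return "patch"  # X.Y.z -> X.Y.z+1
    if o[0] == 0 and n[0] == 0:
        return "major"  # pre-1.0: 0.N -> 0.N+1 is a major upgrade
    if o[0] == n[0]:
        return "minor"  # post-1.0: 1.N -> 1.N+1 is a minor upgrade
    return "major"      # assumption: changing the leading number is major


def requires_compatibility(old: str, new: str, exception_voted: bool = False) -> bool:
    """Patch releases must keep API and wire compatibility unless the
    committers voted an exception (to be noted atop the release notes)."""
    return classify_upgrade(old, new) == "patch" and not exception_voted
```

Under these rules 0.20.1 to 0.20.2 is a patch upgrade, so an incompatible change needs either a compatible fix or a committer vote: `requires_compatibility("0.20.1", "0.20.2")` is `True`, while `classify_upgrade("0.20.2", "0.21.0")` returns `"major"`.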
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Sanjay Radia, 2010-01-06, 00:39)
On Jan 5, 2010, at 2:06 PM, Owen O'Malley wrote:

> On Jan 5, 2010, at 11:29 AM, Todd Lipcon wrote:
>
> > 1) Although we certainly do not guarantee wire compatibility between minor
> > versions (0.20 -> 0.21) have we previously implied wire compatibility
> > between bugfix releases?
>
> Correction. Pre-1.0, the 0.N to 0.N+1 is a major upgrade. After 1.0,
> 1.N to 1.N+1 is a minor. In both cases, X.Y.z to X.Y.z+1 is a patch
> release.
>
> I thought we had it documented somewhere, but can't find it. There is
> some discussion of compatibility in HADOOP-5071 that should be pulled
> out into a wiki page.

HADOOP-5071 documents the *proposed* post-1.0 rules quite well.

> The standing rules are that you don't incompatibly break APIs or wire
> protocols in patch release. So, this patch violates the rule and
> should have had a vote called before it was applied to branch-0.20.
> (And arguably branch-0.21, although since it hasn't been released, it
> isn't nearly the same level or problem.
>
> > 2) Is the above something we *should* be guaranteeing already?
>
> Patch releases:
> 1. Must be backwards API compatible without a client recompile.
> 2. Must be on the wire compatible.
>
> Exceptions require a vote of the committers. We should also put a
> notice of any exceptions at the top of the release notes.

This is also my understanding of the current rules. For patch releases, the current pre-1.0 rules and the *proposed* post-1.0 rules in HADOOP-5071 are the same: no breakage of APIs or wire protocol in a patch release.
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Doug Cutting, 2010-01-06, 16:10)
Owen O'Malley wrote:
> Correction. Pre-1.0, the 0.N to 0.N+1 is a major upgrade. After 1.0, 1.N
> to 1.N+1 is a minor. In both cases, X.Y.z to X.Y.z+1 is a patch release.
>
> I thought we had it documented somewhere, but can't find it.

http://wiki.apache.org/hadoop/Roadmap

Doug
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Jean-Daniel Cryans, 2010-01-06, 18:23)
From the HBase point of view, we would want to include Hadoop 0.20.2-dev in HBase 0.20.3 specifically for HDFS-101 (HDFS-127 would also be nice, since we could stop patching the jar we distribute). We also share the same rules as Hadoop, and we don't want to break compatibility between point releases (we try). Our best-case scenario would be a 0.20.2 release that is backward compatible with 0.20 and includes HDFS-101. Otherwise, we will just stick with 0.20.1.

J-D

On Wed, Jan 6, 2010 at 8:10 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Owen O'Malley wrote:
>>
>> Correction. Pre-1.0, the 0.N to 0.N+1 is a major upgrade. After 1.0, 1.N
>> to 1.N+1 is a minor. In both cases, X.Y.z to X.Y.z+1 is a patch release.
>>
>> I thought we had it documented somewhere, but can't find it.
>
> http://wiki.apache.org/hadoop/Roadmap
>
> Doug
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Owen O'Malley, 2010-01-06, 18:26)
Hairong was having difficulty getting this message through the spam filters.

-- Owen

----------------Start of the message:

> I would like Hairong to consider if she could fix the issue in 0.20
> without the incompatible change.

It is possible for me to fix the issue in 0.20 without breaking compatibility. But I am worried about code stability if we take this approach. The pipeline code is a critical and fragile part of HDFS. Currently I depend on the extensive pipeline fault-injection tests created in 0.21 to build confidence that my change works. If the change in 0.20 differs from the one in 0.21 and trunk, it becomes harder for us to ensure the stability of the pipeline code.

I would like to propose that we pull out HDFS-793 and HDFS-101 from 0.20.

Hairong
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Dhruba Borthakur, 2010-01-06, 18:31)
> I would like to propose that we pull out HDFS-793 and HDFS-101 from 0.20.
+1 to pulling it out. This code is very tricky and is dangerous to change in a minor release.

thanks,
dhruba
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Owen O'Malley, 2010-01-06, 18:33)
On Jan 6, 2010, at 10:31 AM, Dhruba Borthakur wrote:

>> I would like to propose that we pull out HDFS-793 and HDFS-101 from 0.20.
>
> +1 to pulling it out. This code is very tricky and is dangerous to
> change in a minor release.

+1 to pulling it out.

-- Owen
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Todd Lipcon, 2010-01-06, 18:54)
On Wed, Jan 6, 2010 at 10:31 AM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
> > I would like to propose that we pull out HDFS-793 and HDFS-101 from 0.20.
>
> +1 to pulling it out. This code is very tricky and is dangerous to change
> in a minor release.

-0 to pulling it out. I agree that it's very tricky, but I think HDFS-101 is a pretty big bug to knowingly leave in. In my experience this has been the singular cause behind a lot of HDFS write problems when a cluster has a couple of "bad egg" nodes.

Given the number of times 0.21 has been pushed back, I think the majority of users will be on 0.20 for some time to come, so I'd love it to be free of known issues like this.

As for testing, I'm happy to devote time to testing an alternate version of HDFS-101 that doesn't break compatibility; I can reproduce that bug reliably.

That said, Hairong's opinion should carry a lot of weight, and if she thinks it's too risky, I'm totally willing to agree.

-Todd
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Allen Wittenauer, 2010-01-06, 20:00)
On 1/6/10 10:54 AM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:

> -0 to pulling it out - I agree that it's very tricky, but I think HDFS-101
> is a pretty big bug to knowingly leave in. In my experience this has been
> the singular cause behind a lot of HDFS write problems when a cluster has a
> couple of "bad egg" nodes.
>
> Given the number of times 0.21 has been pushed back, I think the majority of
> users will be on 0.20 for some time to come, so I'd love it to be free of
> known issues like this.

Due to exactly this problem (0.21 taking 4-ev-er; I think we should hold a one-year birthday party for 0.20 at the appropriate HUGs), I'd like to have a working 0.20. I'm sure I'm not the only one willing to forgo back-compat, but with a caveat: the fact that an upgrade is required needs to be obvious. Can we change the "protocol violation" message (or whatever it is) to add "Perhaps you need to upgrade your client?" or something similar?
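The request above amounts to detecting the version mismatch explicitly and naming the likely fix in the error. A minimal sketch from the client's side, assuming the client can learn the datanode's protocol version before it starts writing; the exception class, function name, and message wording are hypothetical and are not Hadoop's actual error handling.

```python
class ProtocolVersionMismatch(Exception):
    """Raised when client and datanode speak different wire protocol versions."""


def check_datanode_version(client_version: int, datanode_version: int, datanode: str) -> None:
    """Fail fast with an actionable hint instead of a bare 'protocol violation'."""
    if datanode_version == client_version:
        return  # versions agree; nothing to report
    if datanode_version > client_version:
        # The cluster has moved ahead of this client.
        hint = "Perhaps you need to upgrade your client?"
    else:
        # This client has moved ahead of the cluster.
        hint = "Perhaps this client is newer than the cluster it is talking to?"
    raise ProtocolVersionMismatch(
        f"Protocol version mismatch with datanode {datanode}: "
        f"datanode speaks {datanode_version}, client speaks {client_version}. {hint}"
    )
```

Running the check once, before any data is sent, is intended to surface the upgrade hint up front rather than as an opaque failure partway through a write.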
-
Re: 0.20.2 HDFS incompatible with 0.20.1 (Sanjay Radia, 2010-01-06, 21:36)
> ----------------Start of the message:
> > I would like Hairong to consider if she could fix the issue in 0.20
> > without the incompatible change.
>
> It is possible that I fix the issue in 0.20 without breaking the
> compatibility. But I am worried about the code stability if we take
> this approach. The pipeline code is such a critical and fragile part
> of HDFS. Currently I depend on the extensive pipeline fault
> injection tests created in 0.21 to build the confidence that my
> change works. If the change in 0.20 differs from that in 0.21 and
> the trunk, it becomes harder for us to ensure the pipeline code
> stability.
>
> I would like to propose that we pull out HDFS-793 and HDFS-101 from
> 0.20.
>
> Hairong

+1 to pulling it out. This part of the code is tricky to get right and test.

sanjay