Good discussion. +1 to shipping it as 2.0-alpha.
I think there are a number of areas, some enumerated already, where the 2 line is immature and likely to change. We should enumerate the areas where we expect change and determine which we are comfortable changing in later dot releases and which we need to make before we can declare the 2 line stable.
We should also make sure we are all on the same page in terms of rules of the road for forwards and backwards compatibility between various releases on the 2 line.
- Do 2.0 clients work on all future 2 line releases without recompilation?
- Do we know how 1.0 clients migrate to 2.0? Are there changes we can make to support that?
- What major features are coming soon that will require protocol changes?
-- Service directory / availability -- Need to be able to change the addresses of services
- Are the YARN APIs complete enough? I'd like to see a couple of new frameworks implemented to work out the APIs
-- MPI - A classic with new needs
-- Maybe supporting HBase as a service within YARN? This will test APIs
-- Headless Map-Reduce or Pig jobs? One should be able to run them without a fat client...
-- Job history tracking for arbitrary frameworks without new server code / framework?
What do folks think of this list? What would you add?
On Apr 20, 2012, at 11:41 AM, Todd Lipcon wrote:
> On Fri, Apr 20, 2012 at 7:28 AM, Daryn Sharp <[EMAIL PROTECTED]> wrote:
>> I believe it's premature to release a non-alpha. branch-2.0 does not
>> contain a full working implementation of host-based tokens that was
>> introduced in 1.x (yes, it was done out of order...). This is a very
>> important feature that prevents tokens from being invalidated when a host's
>> ip changes. The token implementation in 2.0 requires daemons to be
>> restarted and/or jobs resubmitted when the ip of an NN is changed (ex. an
>> upgrade). Host-based tokens prevent the need to restart all clusters that
>> access a remote cluster that is upgraded.
> I think blocking the release of a non-alpha on this feature is a bit nutty.
> Not to undermine the work, but it only affects users who run security and
> want to be able to move a NN/JT from one IP to another without killing
> currently running jobs. Only a small fraction of the user base enables
> security at all, and an even smaller fraction regularly wants to migrate a
> master to a new IP mid-job.
>> I've been actively working on the yarn side and am close to completion.
>> Until that's complete, I feel we should consider 2.x an alpha so there's
>> not an omission of a major feature.
> I'd call it a minor feature.
>> On Apr 19, 2012, at 1:45 PM, Arun C Murthy wrote:
>>> Yep, makes sense - I'll roll an rc0 for 2.0 after.
>>> However, we should consider whether HDFS protocols are 'ready' for us to commit to them for the foreseeable future, my sense is that it's a tad early - particularly with auto-failover not complete.
>>> Thus, we have a couple of options:
>>> a) Call the first release here as *2.0.0-alpha* version (lots of ASF projects do this).
>>> b) Just go with 2.0.0 and deem 2.0.x or 2.1.x as the first stable release and fwd-compatible release later.
>>> Given this is a major release (unlike something obscure like hadoop-0.23.0) I'm inclined to go with a) i.e. hadoop-2.0.0-alpha.
>>> On Apr 19, 2012, at 12:24 AM, Eli Collins wrote:
>>>> Hey Arun,
>>>> This vote passed a week or so ago, let's make it official?
>>>> Also, are you still planning to roll a hadoop-2.0.0-rc0 of branch-2
>>>> this week? I think we should do that soon; if you're not planning to
>>>> do this, holler and I'd be happy to. There's only 1 blocker left
>>>> (http://bit.ly/I55LAd) and it's patch available, so I think we should
>>>> roll an rc from branch-2 when it's merged.
>>>> On Thu, Mar 29, 2012 at 4:07 PM, Arun C Murthy <[EMAIL PROTECTED]>