Bigtop, mail # dev - [DISCUSS] stabilizing Hadoop releases wrt. downstream

Re: [DISCUSS] stabilizing Hadoop releases wrt. downstream
Konstantin Boudnik 2013-03-05, 06:15

First of all, I don't think anyone is trying to put the blame on
someone else. E.g. I had a similar experience with Oozie being broken
because of certain released changes upstream.

I am sure that most people in the BigTop community, especially those
who hold committer privileges in both BigTop and other upstream
projects, including Hadoop, would be happy to help with the
stabilization of the Hadoop base. The issue that a downstream
integration project is likely to have is, for one, the absence of
regularly published development artifacts. In the spirit of "it didn't
happen if there's no picture", here are a couple of examples:

  - 2.0.2-SNAPSHOT wasn't published at all; only the release 2.0.2-alpha artifacts were
  - 2.0.3-SNAPSHOT wasn't published until Feb 29, 2013 (and it happened just once)

So, technically speaking, unless an integration project is willing to
build and maintain its own artifacts, it is impossible to do any
preventive validation.
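
To make the artifact point concrete, here is a minimal sketch of how a
downstream project could check whether SNAPSHOT artifacts are being
published regularly. It assumes the repository exposes the standard
artifact-level maven-metadata.xml; the sample document, the coordinates
in it, the function names, and the 7-day threshold are all made up for
illustration and are not taken from any real Hadoop repository.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical maven-metadata.xml content; every value here is illustrative.
SAMPLE_METADATA = """<metadata>
  <groupId>org.example</groupId>
  <artifactId>example-common</artifactId>
  <versioning>
    <versions>
      <version>1.0.0-alpha</version>
      <version>1.0.1-SNAPSHOT</version>
    </versions>
    <lastUpdated>20130105120000</lastUpdated>
  </versioning>
</metadata>
"""


def snapshot_versions(metadata_xml):
    """Return the versions listed in the metadata that end in -SNAPSHOT."""
    root = ET.fromstring(metadata_xml)
    versions = root.findall("./versioning/versions/version")
    return [v.text for v in versions if v.text and v.text.endswith("-SNAPSHOT")]


def last_updated(metadata_xml):
    """Parse <lastUpdated> (yyyyMMddHHmmss) into a datetime, or None if absent."""
    root = ET.fromstring(metadata_xml)
    stamp = root.findtext("./versioning/lastUpdated")
    return datetime.strptime(stamp, "%Y%m%d%H%M%S") if stamp else None


def is_stale(metadata_xml, now, max_age_days=7):
    """True if nothing was published within max_age_days (assumed threshold)."""
    updated = last_updated(metadata_xml)
    return updated is None or (now - updated).days > max_age_days


if __name__ == "__main__":
    print(snapshot_versions(SAMPLE_METADATA))
    print(is_stale(SAMPLE_METADATA, now=datetime(2013, 1, 20)))
```

A nightly job on the downstream side could fetch the metadata for each
upstream component and raise a flag when is_stale() returns True,
instead of finding out at release-candidate time.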

Which brings me to my next question: how do you guys actually deliver
on "Integration is high on the list of *every* release"? Again, please
don't get me wrong - I am not looking to lay blame on or corner
anyone - I am really curious and would appreciate the input.

> As you yourself noted later, the pain is part of the 'alpha' status
> of the release. We are targeting one of the immediate future
> releases to be a beta and so these troubles are really only the
> short term.

I don't really want to get into a discussion about what constitutes
an alpha and how it has delayed the adoption of the Hadoop2 line.
However, I want to point out that it is especially important for an
"alpha" platform to work nicely with downstream consumers of said
platform. For quite obvious reasons, I believe.

> I think there is a fundamental problem with the interaction of
> Bigtop with the downstream projects, if nothing else, with

BigTop is as downstream as it can get, because BigTop essentially
consumes all other component releases in order to produce a viable
stack. Technicalities aside...

> Hadoop. We never formalized on the process, will BigTop step in
> after an RC is up for vote or before? As I see it, it's happening

BigTop essentially can give any component, including Hadoop, and
better yet the whole set of components, certain guarantees about
compatibility and about dependencies being included. A case in point
is the commons libraries missing from the 1.0.1 release, which
essentially prevented HBase from working properly.

> after the vote is up, so no wonder we are in this state. Shall we
> have a pre-notice to Bigtop so that it can step in before?

The above contradicts the earlier statement that "Integration is high
on the list of *every* release". If BigTop isn't used for integration
testing, then how is said integration testing performed? Is it some
sort of test-patch process, as Luke mentioned earlier? And why does
it leave room for integration issues to go uncaught? Again, I am
genuinely interested to know.

> these short term pains. I'd rather like us swim through these now
> instead of support broken APIs and features in our beta, having seen
> this very thing happen with 1.*.

I think you're mixing up integration with downstream and being in an
alpha phase of development. The former isn't about supporting "broken
APIs" - it is about being consistent and not breaking the downstream
applications without first letting said applications accommodate the
platform changes.

Changes in the API, after all, can be traced relatively easily by
integration validation - this is the whole point of integration
testing. And BigTop does the job better than anything around, simply
because there's nothing else around to do it.

If you stay for a very long time in a shape-shifting "alpha" that
doesn't integrate well, you risk losing downstream customers'
interest, because they might get tired of waiting for the next stable
API to be ready for them.

> Let's fix the way the release related communication is happening

This is a very good point indeed! Let's start a separate discussion
thread on how we can improve the release model for the coming Hadoop
releases, so that we - as the community - can provide better
guarantees of inter-component compatibility (sorry for the overused
word).


On Fri, Mar 01, 2013 at 10:58AM, Arun C Murthy wrote: