Roman Shaposhnik 2013-02-27, 01:31
Roman Shaposhnik 2013-02-27, 01:43
Arun C Murthy 2013-03-01, 18:58
Konstantin Boudnik 2013-03-05, 06:15
Robert Evans 2013-03-05, 15:18
Konstantin Boudnik 2013-03-06, 05:02
Giridharan Kesavan 2013-03-06, 06:05
Arun C Murthy 2013-03-06, 15:24
Konstantin Boudnik 2013-03-06, 18:19
Roman Shaposhnik 2013-03-08, 17:55
Matt Foley 2013-03-08, 22:16
Re: [DISCUSS] stabilizing Hadoop releases wrt. downstream
Vinod Kumar Vavilapalli 2013-03-01, 20:23
> for the past couple of releases of Hadoop 2.X code line the issue
> of integration between Hadoop and its downstream projects has
> become quite a thorny issue. The poster child here is Oozie, where
> every release of Hadoop 2.X seems to be breaking the compatibility
> in various unpredictable ways. At times other components (such
> as HBase for example) also seem to be affected.
> Now, to be extremely clear -- I'm NOT talking about the *latest* version
> of Oozie working with the *latest* version of Hadoop, instead
> my observations come from running previous *stable* releases
> of Bigtop on top of Hadoop 2.X RCs.
As you yourself noted later, the pain is part of the 'alpha' status of the release. We are targeting one of the immediate future releases to be a beta, so these troubles are really only short-term.
> Do you guys think that the project has reached a point where integration
> and compatibility issues should be prioritized really high on the list
> of things that make or break each future release?
See the other thread, where we discussed this very question of stability for our immediate future releases.
> The good news is that Bigtop's charter is in big part *exactly* about
> providing you with this kind of feedback. We can easily tell you when
> Hadoop behavior, with regard to downstream components, changes
> between a previous stable release and the new RC (or even branch/trunk).
> What we can NOT do is submit patches for all the issues. We are simply
> too small a project and we need your help with that.
I think there is a fundamental problem with how Bigtop interacts with the projects it integrates, or at the very least with Hadoop. We never formalized the process: does Bigtop step in before an RC is up for a vote, or after? As I see it, it happens after the vote has started, so it is no wonder we are in this state. Should we give Bigtop advance notice so that it can step in before the vote?
> I would argue that moving forward this is a really unfortunate
> situation that may end up undermining the long term success
> of Hadoop 2.X if we don't start addressing the problem. Think
> about it -- 90% of unit tests that run downstream on Apache
> infrastructure are still exercising Hadoop 1.X underneath.
> In fact, if you were to forcefully make, let's say, HBase's
> unit tests run on top of Hadoop 2.X, quite a few of them
> are going to fail. The Hadoop community is, in effect, cutting
> itself off from the biggest source of feedback -- its downstream
> users. This in turn:
> * leaves the Hadoop project in a perpetual state of broken
> windows syndrome.
> * leaves Apache Hadoop 2.X releases in a state considerably
> inferior to the releases *including* Apache Hadoop done by the
> vendors. The users have no choice but to align themselves
> with vendor offerings if they wish to utilize the latest Hadoop functionality.
> The artifact that is known as Apache Hadoop 2.X has stopped being
> a viable choice thus fracturing the user community and reducing
> the benefits of a commonly deployed codebase.
> * leaves downstream projects of Hadoop in a jaded state where
> they legitimately get very discouraged and frustrated and eventually
> give up thinking that -- well, we work with one release of Hadoop
> (the stable one Hadoop 1.X) and we shall wait for the Hadoop
> community to get their act together.
> It is about time the Hadoop 2.X community won back all those end users
> and downstream projects that got left behind during the alpha
> stabilization phase.
This is overblown. We've been working with various downstream projects (HBase, Hive, Pig, Oozie) to help them transition to 2.x, and I believe we've already made significant progress.
Sure enough, there are continuing pains, but these are part of the alpha status. If we really want a stable release that we can support going forward, we need to live with and push past these short-term pains. I'd rather we swim through them now than support broken APIs and features in our beta; we have seen this very thing happen with 1.*.
Let's fix the way release-related communication happens across our projects so that we can all work together and make 2.X a success.