

Re: [DISCUSS] stabilizing Hadoop releases wrt. downstream
I feel this is being blown out of proportion.

Integration is high on the list for *every* release. In the future, if anyone, Bigtop included, wants to help, running integration tests on a Hadoop RC and providing feedback would be very welcome. I'm pretty sure I will stop an RC and re-spin it if it breaks Oozie, HBase, Pig, or Hive. For example, see the recent efforts to do a 2.0.4-alpha.

With hadoop-2.0.3-alpha we discovered 3 *bugs*. Making it sound like we intentionally disregard integration issues is very harsh.

Please also see the other thread, where we discussed stabilizing APIs, protocols, etc. for the next 'beta' release.

Arun

On Feb 26, 2013, at 5:43 PM, Roman Shaposhnik wrote:

> Hi!
>
> For the past couple of releases of the Hadoop 2.X code line,
> integration between Hadoop and its downstream projects has
> become quite a thorny issue. The poster child here is Oozie, where
> every release of Hadoop 2.X seems to be breaking the compatibility
> in various unpredictable ways. At times other components (such
> as HBase for example) also seem to be affected.
>
> Now, to be extremely clear -- I'm NOT talking about the *latest* version
> of Oozie working with the *latest* version of Hadoop, instead
> my observations come from running previous *stable* releases
> of Bigtop on top of Hadoop 2.X RCs.
>
> As many of you know Apache Bigtop aims at providing a single
> platform for integration of Hadoop and Hadoop ecosystem projects.
> As such we're uniquely positioned to track compatibility between
> different Hadoop releases with regards to the downstream components
> (things like Oozie, Pig, Hive, Mahout, etc.). For every single RC
> we've been pretty diligent at trying to provide integration-level feedback
> on the quality of the upcoming release, but it seems that our efforts
> don't quite suffice in stabilizing Hadoop 2.X.
>
> Of course, one could argue that while the Hadoop 2.X code line was
> designated 'alpha', perfect integration and compatibility were
> NOT what the Hadoop community was focusing on, and expecting
> much of them was unrealistic. I can appreciate that view, but what
> I'm interested in is the future of Hadoop 2.X, not its past. Hence,
> here's my question to all of you as the Hadoop community at large:
>
> Do you guys think that the project has reached a point where integration
> and compatibility issues should be prioritized really high on the list
> of things that make or break each future release?
>
> The good news is that Bigtop's charter is in large part *exactly* about
> providing you with this kind of feedback. We can easily tell you when
> Hadoop behavior, with regard to downstream components, changes
> between a previous stable release and the new RC (or even branch/trunk).
> What we can NOT do is submit patches for all the issues. We are simply
> too small a project and we need your help with that.
>
> I truly believe that we owe it to the downstream projects, and in the
> second half of this email I will try to convince you of that.
>
> We all know that integration projects are impossible to pull off
> unless there's a general consensus between all of the projects involved
> that they indeed need to work with each other. You can NOT force
> that notion, but you can always try to influence. This relationship
> goes both ways.
>
> Consider the question in front of the downstream communities
> of whether or not to adopt Hadoop 2.X as the basis. To answer
> that question each downstream project has to be reasonably
> sure that their concerns will NOT fall on deaf ears and that
> Hadoop developers are, essentially, 'ready' for them to pick
> up Hadoop 2.X. I would argue that so far the Hadoop community
> has gone out of its way to signal that the 2.X codeline is NOT
> ready for the downstream.
>
> I would argue that moving forward this is a really unfortunate
> situation that may end up undermining the long term success
> of Hadoop 2.X if we don't start addressing the problem. Think
> about it -- 90% of unit tests that run downstream on Apache

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/