
Hadoop, mail # general - [DISCUSS] Hadoop Security Release off Yahoo! patchset


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Chris Douglas 2011-01-18, 01:41
On Mon, Jan 17, 2011 at 12:11 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> We would not release this until each change in it has been reviewed by the
> community, right?  Otherwise we may end up with changes in a 0.20 release
> that don't get approved when they're contributed to trunk and cause trunk to
> regress.  So I don't yet see the point of committing the mega patch since
> the community needs to review each individual change anyway, so we might
> wait until each is reviewed to commit it.

I share this concern. Releasing an omnibus pile of commits in the 0.20
series will create an impossible situation for the mainline. Worse,
the alternative is to sift through this pile over months, as the
refinements wrought by consensus require remerging and revalidating
each issue. Every subsequent issue must also be reconsidered. The
product must then be deployed, tested, and its bugs fixed, just to get
a release as battle-hardened as this one. Signing up for all this work
when nearly every developer and user would rather see trunk proceed
would be madness.

However, the status quo is also unacceptable. Running any Apache
release of Hadoop is rare compared to the popularity of its
variants. We must find a solution to that. Hadoop is not in good shape
right now, and exceptional actions to correct it should not be
dismissed lightly out of a preference for consistency over its future.

To address Nigel and Doug's concerns about compatibility, we should
consider a different release series. We wanted to postpone 1.0
discussions, but that would be one solution. If a secure 0.20 could be
released as 1.0, then if interest in this branch persists, append
could be a 1.1 release on this series,* etc. while 0.22 and its
successors can be 2.0 (as a rare benefit to the project split, one
could argue that "Hadoop" is the unified set, and the Common, HDFS,
and MapReduce projects could continue to release on the 0.x series
until we want to declare those a stable successor to 0.20). Version
numbers are pretty cheap, when compared to our time and focus.

* In the interim, a 0.20-append release would make all kinds of sense,
and fie on the niceties of naming.

> That said, posting the mega patch is useful, so that folks can start to pick
> it apart into separate issues.  Pushing your internal commits to a public
> github branch might also make that review process easier.

Pushing to github caused this problem. CDH rebased on YDH, and today
Apache Hadoop is considered less stable, less tested, and less usable
than either one of them. Why one would expect things to work
differently this time around is not clear. I assume we all agree it's
a poor outcome.

Arun already volunteered to break up the commits and push individual
patches to the repository, so the history is manageable. We allow CTR
(commit-then-review) for branches, though it's predicated on the
assumption that the work will be spread over weeks or months;
development should not be batched this way. However, if we add
obstacles to an unambiguously positive outcome, collaborators will be
skeptical of engaging more deeply with this community. Let's focus on
making forward progress, not on ensuring the requisite pain is felt. -C