

Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
For #2, from what I've read, we should definitely bump up the dependency
on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to
2.2.0-beta for that hadoop-2 profile.

I probably stated this before, but I'd much rather see more effort in
testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon)
against hadoop-2 (like Mike's point about HA). I'm not sure if anyone
ever tested Accumulo with the hadoop-2 features -- I seem to recall that
it was more about testing whether Accumulo runs on both hadoop 1 and 2.

If we can maintain a single artifact, that would definitely be easiest
for users, but falling back to user-built artifacts or convenience
releases isn't the end of the world.

As far as commits, I'd like to see as much separation as possible, but
it's understandable if the changes overlap and don't make sense to split
out.

On 10/14/13 12:55 PM, Sean Busbey wrote:
> Hey All,
>
> I'd like to restart the conversation from the end of July / start of
> August about Hadoop 2 support on the 1.4 branch.
>
> Specifically, I'd like to get some requirements ironed out so I can file
> one or more jiras. I'd also like to get a plan for application.
>
> =requirements
>
> Here are the requirements I have from the last thread:
>
> 1)  Maintain existing 1.4 compatibility
>
> The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4
> tag)[1]
>
> I don't see anything in the README[2] nor the user manual[3] on other
> versions being supported.
>
>
> 2) Gain Hadoop 2 support
>
> At the moment, I'm presuming this means Apache release 2.0.4-alpha since
> that's what 1.5.0 builds against for Hadoop 2.
>
> 3) Test for correctness on given versions, with >= 5 node cluster
>
> * Unit Tests
> * Functional Tests
> * 24hr continuous + verification
> * 24hr continuous + verification + agitation
> * 24hr random walk
> * 24hr random walk + agitation
>
> Keith mentioned running these against a CDH4 cluster, but I presume that
> since Apache Releases are our stated compatibilities it would actually be
> against whatever versions we list. Based on #1 and #2 above, I would expect
> that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.
>
> 4) Binary packaging
> 4a) Either source produces a single binary for all accepted versions
>
> or
>
> 4b) Instructions for building from source for each version and somehow
> flag what (if any) convenience binaries are made for the release.
>
> =application
>
> There will be many back-ported patches. Not much active development happens
> on 1.4.x now, but I presume this should still all go onto a feature branch?
>
> Is the community preference that eventually all the changes become a single
> commit (or one-per-subtask if there are multiple jiras) on the active 1.4
> development branch, or that the original patches remain broken out?
>
> For what it's worth, I'd recommend keeping them broken out. (And that's how
> the initial development against CDH4 has been done.)
>
>
> [1] http://bit.ly/1fxucMe
> [2] http://bit.ly/192zUAJ
> [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
>