Re: documentation on dealing with legacy Hadoop versions
Christopher 2014-01-03, 19:02
We've talked before about keeping the README pretty minimal. I think
we should put more focus on documenting Accumulo on the website and
the user manual (which should be hosted on the website).
As for the specific issue of missing dependent jars... we need better
startup scripts: an ACCUMULO_BOOTSTRAP_CLASSPATH or something that is
exposed directly in our configuration, with reasonable defaults, so
it's relatively obvious what we need to start (and so it's more
flexible for redistributing with dependency packaging we don't
expect... like the Fedora Hadoop RPM, which depends on the
commons-configuration RPM, rather than providing its own). I'd
strongly prefer this, rather than excessive documentation for specific
third-party packaging of various dependent packages (such as Cloudera
Hadoop, Fedora Hadoop, Fedora thrift, Fedora ZooKeeper, BigTop Hadoop,
BigTop ZooKeeper, Ubuntu Hadoop, etc.).
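As a rough sketch of the kind of startup check such a setting would allow (Python purely for illustration; ACCUMULO_BOOTSTRAP_CLASSPATH is only the name floated above and not an existing option, and the REQUIRED_JARS list and both function names are hypothetical):

```python
import os
import re

# Hypothetical list of jar name prefixes needed at startup; the real list
# would depend on the Accumulo and Hadoop versions in use.
REQUIRED_JARS = ("commons-configuration", "commons-io")


def expand_entry(entry):
    """Return the jar file names that a single classpath entry contributes."""
    if entry.endswith("*"):
        # Java-style wildcard: every .jar directly inside the directory.
        directory = entry[:-1] or "."
        if not os.path.isdir(directory):
            return []
        return [name for name in os.listdir(directory) if name.endswith(".jar")]
    if entry.endswith(".jar") and os.path.isfile(entry):
        return [os.path.basename(entry)]
    return []


def missing_jars(classpath, required=REQUIRED_JARS):
    """List the required jar prefixes that no entry in `classpath` provides."""
    present = []
    for entry in classpath.split(os.pathsep):
        if entry:
            present.extend(expand_entry(entry))
    return [
        prefix
        for prefix in required
        # Match "<prefix>-<version>...", e.g. commons-io-2.1.jar.
        if not any(re.match(re.escape(prefix) + r"-\d", name) for name in present)
    ]
```

A start script could call missing_jars(os.environ.get("ACCUMULO_BOOTSTRAP_CLASSPATH", "")) and print whatever comes back before launching anything, so the user sees the missing jar up front instead of hitting a ClassNotFoundException later.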
Christopher L Tubbs II
On Fri, Jan 3, 2014 at 10:55 AM, Sean Busbey <busbey+[EMAIL PROTECTED]> wrote:
> Earlier this week we had a user in IRC that was having difficulty running
> 1.5.0 because their classpath didn't include commons-configuration.
> In one case, they just needed to fix their accumulo-site to include Hadoop
> 2 paths. In the other, they were using Apache Hadoop 0.20.2, which has no
> Initially, the user thought they were running a CDH3 version. This turned
> out not to be the case, but it so happens that CDH3 also does not have
> commons-configuration provided by Hadoop.
> This interaction pointed out 2 issues, and I'd like some opinions on how to
> handle them before I file jiras and possibly patches.
> 1) We are not sufficiently warning people about the need for durable sync.
> Or maybe we're just not getting across when durable sync is available.
> Hadoop versions are nonsensical for most outsiders, so I think we need to
> spell it out in docs. Waiting for users to start an instance and then look
> at a log is insufficient.
> I'm thinking we need something similar to what HBase has [1].
> My question is, where should I add this? The README seems like a good
> place, since it already talks about enabling durable sync. How about the
> user manual? Both?
> 2) Should we document commons-configuration similar to commons-io?
> The README already has a section about how some older versions of Hadoop
> don't have commons-io. I think the versions given need to be tightened up
> given (1) above (since right now it implicitly refers to versions people
> should not be using).
> The only Hadoop distro I know of that both has proper append support and
> does not have commons-configuration is CDH3. In addition to being a
> vendor-specific version, it is no longer supported by said vendor.
> So would it be preferable to
> 2a) add a note after the commons-io section that gives similar
> instructions for adding commons-configuration?
> 2b) file a jira that points out that users on CDH3 won't have
> commons-configuration, document the workaround on said ticket, and close
> it as "won't fix"?
> The idea with the latter approach is that it would give searchers a chance
> to find the information and give us somewhere to point people, while not
> adding to our long-term documentation baggage. The downside is that this
> won't be as accessible to users, so it will be more painful for them
> (especially if they don't have regular internet access).
> [1]: http://hbase.apache.org/book/configuration.html#hadoop.older.versions