Accumulo >> mail # dev >> MiniCluster and "provided" scope dependencies


Re: MiniCluster and "provided" scope dependencies
+1

I remember having a similar discussion in June because I wanted to be
able to run the minicluster as a single-node Accumulo using the start
package.

I like this approach better. 1.6.0 provides a main method for firing up the
minicluster, and having the dependencies in the pom will allow developers to
fire it up without needing Hadoop/ZooKeeper installed.

ACCUMULO-1405 <https://issues.apache.org/jira/browse/ACCUMULO-1405>

On Tue, Sep 24, 2013 at 12:48 PM, Josh Elser <[EMAIL PROTECTED]> wrote:

> On Tue, Sep 24, 2013 at 12:31 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
> > On Tue, Sep 24, 2013 at 11:57 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
> >
> >> I'm curious to hear what people think on this.
> >>
> >> I'm a really big fan of spinning up a minicluster instance to do some
> >> "more real" testing of software as I write it.
> >>
> >> With 1.5.0, it's a bit more painful because I have to add a bunch more
> >> dependencies to my project (which previously would only have to depend
> >> on the accumulo-minicluster artifact). The list includes, but is
> >> likely not limited to, commons-io, commons-configuration,
> >> hadoop-client, zookeeper, log4j, slf4j-api, slf4j-log4j12.
> >>
> >> As best I understand it, the intent of this was that Hadoop will
> >> typically provide these artifacts at runtime, and therefore Accumulo
> >> doesn't need to re-bundle them itself, which I'd agree with (not
> >> getting into that whole issue about the Hadoop "ecosystem"). However,
> >> I would think that the minicluster should have non-provided scope
> >> dependencies declared on these, as there is no Hadoop installation --
> >>
> >
> > Would this require declaring dependencies on a particular version of
> > hadoop in the minicluster pom?  Or could the minicluster pom have
> > profiles for different hadoop versions?  I do not know enough about
> > maven to know if you can use profiles declared in a dependency (e.g. if
> > a user depends on minicluster, can they activate profiles in it?)
>
> The actual dependency in minicluster is against Apache Hadoop but
> that's beside the point.
>
> Marking the hadoop-client dependency as provided means that
> Hadoop's dependencies are *not* included at runtime (because hadoop is
> provided, and, as such, so are its dependencies). In other words, this
> is independent of what's actually included in a distribution of
> Hadoop when you download and install it.
>
> Apache Hadoop has dependencies we need to run minicluster. By marking
> the hadoop-client artifact as 'provided', we do not get its
> dependencies and the minicluster fails to run. I think this is easy
> enough to work around by overriding the dependencies we need to run
> the minicluster in the minicluster module (e.g. make the hadoop-client
> not 'provided' in the minicluster module). Thus, as we add more things
> to the minicluster that require other libraries, we control the
> dependency mgmt instead of forcing that onto the user.
>
> >
> >
> >> there's just the minicluster. As such, this would relieve users of
> >> having to dig into our dependency management or use trial and error
> >> to figure out what "extra" dependencies they have to include in their
> >> project to actually make it work.
> >>
> >> Thoughts?
> >>
> >> - Josh
> >>
>
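
As a rough illustration of the 1.5.0 situation described in the thread, a
project that depends on accumulo-minicluster might have to declare the extra
artifacts itself, along these lines. The artifactIds come from Josh's list;
the groupIds are the usual Maven Central coordinates, while the version
properties and the choice of test scope are assumptions made for this sketch
and do not come from the thread.

```xml
<!-- Sketch of a user's <dependencies> section under 1.5.0.
     All ${...version} properties are placeholders (assumed, not from the thread). -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-minicluster</artifactId>
    <version>${accumulo.version}</version>
    <scope>test</scope>
  </dependency>

  <!-- The artifacts below are provided-scope on the Accumulo side, so Maven
       does not pull them in transitively; the user has to list them. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.zookeeper</groupId>
    <artifactId>zookeeper</artifactId>
    <version>${zookeeper.version}</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>${commons-io.version}</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>commons-configuration</groupId>
    <artifactId>commons-configuration</artifactId>
    <version>${commons-configuration.version}</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>${log4j.version}</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>${slf4j.version}</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>${slf4j.version}</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```

If the minicluster module declared these with a non-provided scope, as
proposed in the thread, only the first dependency should be needed in the
user's pom.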