Accumulo >> mail # dev >> MiniCluster and "provided" scope dependencies

Josh Elser 2013-09-24, 15:57
Keith Turner 2013-09-24, 16:31
Josh Elser 2013-09-24, 16:48
Keith Turner 2013-09-24, 16:58
Josh Elser 2013-09-24, 17:20
Steve Loughran 2013-09-24, 17:55
Josh Elser 2013-09-24, 18:14
Eric Newton 2013-09-24, 18:20
Re: MiniCluster and "provided" scope dependencies
I don't think we should do that. Artifacts shouldn't be deployed
multiple times with different POMs for different dependencies. (I'm
100% positive we'd get a scolding from Benson for that.)

The point of MAC (MiniAccumuloCluster) is to test Accumulo, not Hadoop, and the additional
classifiers add a lot of complexity to the build. I think some of
this could be improved via the accumulo-maven-plugin. You can
manipulate plugin dependencies easily enough in Maven right now, and
it would be trivial for users to override the a-m-p dependency on
hadoop-client. (http://blog.sonatype.com/people/2008/04/how-to-override-a-plugins-dependency-in-maven/)
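
A minimal sketch of that override in a user's POM follows. It assumes the plugin is published as org.apache.accumulo:accumulo-maven-plugin; those coordinates and the 1.6.0 and 2.2.0 versions are illustrative assumptions, not taken from this thread. A dependency listed inside a plugin's own <dependencies> element is placed on that plugin's classpath ahead of the version the plugin itself declares. This is the technique described in the linked Sonatype post.

```xml
<build>
  <plugins>
    <plugin>
      <!-- Assumed coordinates and version for the accumulo-maven-plugin -->
      <groupId>org.apache.accumulo</groupId>
      <artifactId>accumulo-maven-plugin</artifactId>
      <version>1.6.0</version>
      <!-- Executions/goals omitted: configure them per the plugin's own documentation -->
      <dependencies>
        <!-- Plugin-level dependency: replaces the plugin's declared hadoop-client version -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <!-- Illustrative version only -->
          <version>2.2.0</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```

Because the override lives in the user's build rather than in Accumulo's deployed POMs, this approach avoids publishing one artifact several times with different dependency sets.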

Christopher L Tubbs II
On Tue, Sep 24, 2013 at 1:20 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> Oh, I see your point now. For hadoop 1 vs. hadoop 2, we would just use
> the same profiles that we have in place. We could look into using a
> classifier when deploying these artifacts so users can pull down a
> version of minicluster that is compatible with hadoop2 without forcing
> them to build it themselves.
> Given that we already *have* hadoop-1.x listed as the default
> dependency, I don't really see that as being an issue.
> On Tue, Sep 24, 2013 at 12:58 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>> On Tue, Sep 24, 2013 at 12:48 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>> On Tue, Sep 24, 2013 at 12:31 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>> > On Tue, Sep 24, 2013 at 11:57 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> I'm curious to hear what people think on this.
>>> >>
>>> >> I'm a really big fan of spinning up a minicluster instance to do some
>>> >> "more real" testing of software as I write it.
>>> >>
>>> >> With 1.5.0, it's a bit more painful because I have to add a bunch more
>>> >> dependencies to my project (which previously only had to depend
>>> >> on the accumulo-minicluster artifact). The list includes, but is
>>> >> likely not limited to, commons-io, commons-configuration,
>>> >> hadoop-client, zookeeper, log4j, slf4j-api, slf4j-log4j12.
>>> >>
>>> >> Best as I understand it, the intent of this was that Hadoop will
>>> >> typically provide these artifacts at runtime, and therefore Accumulo
>>> >> doesn't need to re-bundle them itself which I'd agree with (not
>>> >> getting into that whole issue about the Hadoop "ecosystem"). However,
>>> >> I would think that the minicluster should have non-provided scope
>>> >> dependencies declared on these, as there is no Hadoop installation --
>>> >>
>>> >
>>> > Would this require declaring dependencies on a particular version of
>>> > hadoop in the minicluster pom? Or could the minicluster pom have profiles
>>> > for different hadoop versions? I do not know enough about maven to know if
>>> > you can use profiles declared in a dependency (e.g. if a user depends on
>>> > minicluster, can they activate profiles in it?)
>>> The actual dependency in minicluster is against Apache Hadoop, but
>>> that's beside the point.
>>> Marking the hadoop-client dependency as provided means that
>>> Hadoop's dependencies are *not* included at runtime (because hadoop is
>>> provided, and, as such, so are its dependencies). In other words, this
>>> is completely beside the point of what's actually included in a
>>> distribution of Hadoop when you download and install it.
>>> Apache Hadoop has dependencies we need to run minicluster. By marking
>>> the hadoop-client artifact as 'provided', we do not get its
>>> dependencies and the minicluster fails to run. I think this is easy
>>> enough to work around by overriding the dependencies we need to run
>>> the minicluster in the minicluster module (e.g. make the hadoop-client
>>> not 'provided' in the minicluster module). Thus, as we add more things
>> So if we mark hadoop-client as not provided, then we have to choose a
>> version?  How easy will it be for a user to choose a different version of
>> hadoop for their testing?  I am trying to understand what impact this would
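
On the version question above, here is a minimal sketch of how a downstream project could pick its own Hadoop version. It is hypothetical: it assumes accumulo-minicluster declared hadoop-client in compile scope as proposed, rather than 'provided'. Maven's dependency mediation chooses the declaration nearest to the project, so a hadoop-client declared directly in the user's POM wins over the transitive one. The versions shown are illustrative. This only changes which hadoop-client jar is resolved. It does not make an Accumulo build compiled against Hadoop 1 run on Hadoop 2, which is the problem the classifier idea was meant to address.

```xml
<dependencies>
  <!-- Hypothetical: assumes this artifact declares hadoop-client in compile scope -->
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-minicluster</artifactId>
    <version>1.5.0</version>
    <scope>test</scope>
  </dependency>
  <!-- Direct declaration: "nearest wins" mediation prefers this over the transitive version -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- Illustrative Hadoop 1.x version -->
    <version>1.2.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```
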
Corey Nolet 2013-09-24, 16:55
Christopher 2013-09-24, 18:20
Josh Elser 2013-09-24, 18:25