-
Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Jonathan Hsieh 2012-05-16, 18:24
Hey Devs,
I've gotten pinged by folks working on Apache Flume, a project that depends directly upon hbase and hadoop hdfs jars, about how to get the proper hbase jars that work against hadoop 1.0 and hadoop 0.23/2.0. Unfortunately, the transition from hadoop 1.0.0 to hadoop 0.23.x/2.0 requires hbase to be recompiled to run against the different hadoop version ("compile compatible" but not "binary compatible").
Currently, we build and publish hbase jars compiled against hadoop 1.0.x.
What is the right way to publish poms/jars for those who want to use hbase jars compiled against hadoop 0.23/2.0? Is there a right way?
Jon.
--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Andrew Purtell 2012-05-16, 18:27
On Wed, May 16, 2012 at 11:24 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> I've gotten pinged by folks working on Apache Flume, a project that depends
> directly upon hbase and hadoop hdfs jars, about how to get the proper hbase
> jars that work against hadoop 1.0 and hadoop 0.23/2.0.
> Unfortunately, the transition from hadoop 1.0.0 to hadoop 0.23.x/2.0
> requires hbase to be recompiled to run against the different hadoop
> version ("compile compatible" but not "binary compatible").
>
> Currently, we build and publish hbase jars compiled against hadoop 1.0.x.
>
> What is the right way to publish poms/jars for those who want to use hbase
> jars compiled against hadoop 0.23/2.0? Is there a right way?
Would this require that we add a version suffix for the Hadoop version used during the build?
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Jonathan Hsieh 2012-05-16, 18:35
Andy,
Ah, ok that sounds reasonable. So this would be similar to how the security build used to have a "-security" suffix, but for hadoop2 we'd have something like a "-hadoop2" suffix instead.
Jon.
--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Gary Helmling 2012-05-16, 19:45
Maven's support for "classifiers" in dependencies seems to be targeted at this kind of case: http://maven.apache.org/pom.html#Dependencies
I'm not sure how exactly that works with publishing artifacts though. It may just amount to appending the "classifier" as a suffix anyway. But it may be worth looking at in more detail.
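As a rough illustration of the classifier idea, here is a sketch of how a downstream project such as Flume might declare the dependency. The `hadoop2` classifier and the `0.94.0` version are hypothetical placeholders chosen for the example, not coordinates that are actually published.

```xml
<!-- Hypothetical consumer POM fragment. The classifier name "hadoop2" and the
     version number are illustrative assumptions, not published coordinates. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.0</version>
    <!-- Selects the variant built against Hadoop 0.23/2.0 -->
    <classifier>hadoop2</classifier>
  </dependency>
</dependencies>
```

In a Maven repository a classified artifact is stored as <artifactId>-<version>-<classifier>.jar, so this would resolve to hbase-0.94.0-hadoop2.jar, which is effectively the suffix form. One caveat worth checking: all classifiers of a given version share a single POM, so the variants could not declare different transitive Hadoop dependencies this way.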
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Alejandro Abdelnur 2012-05-16, 19:50
A while ago I raised this issue in Pig.
This is an issue that most if not all projects (hbase, pig, sqoop, hive, oozie, ...) based on Hadoop will face.
It would be great if all these projects came up with a consistent way of doing this.
Any idea how to tackle it? Start the discussion on all the dev aliases?
thx
-- Alejandro
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Roman Shaposhnik 2012-05-16, 20:00
+Bigtop
On Wed, May 16, 2012 at 12:50 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> A while ago I raised this issue in Pig.
>
> This is an issue that most if not all projects (hbase, pig, sqoop,
> hive, oozie, ...) based on Hadoop will face.
>
> It would be great if all these projects come up with a consistent way
> of doing this.
>
> Any idea how to tackle it? Starting the discussion on all dev aliases?
This is something we've pondered in Bigtop. Our current thinking is that while it is probably OK to lean on the "leaf-node" (think Pig, Hive, to some extent HBase) projects to at least take Hadoop compatibility into account, the full problem is going to combinatorially explode pretty soon.
Take Hive as an example -- for that project just taking care of Hadoop is not enough. If there are incompatibilities between HBase releases, Hive needs to publish an HxB matrix of artifacts, where H is the # of incompatible Hadoop versions and B is the # of incompatible HBase versions. And that doesn't take into account the fact that Hive might be interested in publishing different artifacts to begin with (think -security artifacts in HBase). This gets pretty ugly pretty quickly.
Oh, and don't forget that somebody has to test all of the above.
Now, it seems like in Bigtop we're going to soon expose the Maven repo with all of the Maven artifacts constituting a particular Bigtop "stack". You could think of it as a transitive closure of all of the deps. built against each other. This, of course, will not tackle an issue of a random combination of components (we only support the versions of components as specified in our own BOM for each particular Bigtop release) but it will provide a pretty stable body of Maven artifacts that are KNOWN (as in tested) to be compiled against each other.
If this sounds interesting and useful for upstream projects -- I'd invite the continuation of this discussion to happen on bigtop-dev@.
Thanks, Roman.
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Andrew Purtell 2012-05-16, 21:09
On Wed, May 16, 2012 at 1:00 PM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> Now, it seems like in Bigtop we're going to soon expose the Maven repo
> with all of the Maven artifacts constituting a particular Bigtop "stack". You
> could think of it as a transitive closure of all of the deps. built against
> each other. This, of course, will not tackle an issue of a random combination
> of components (we only support the versions of components as
> specified in our own BOM for each particular Bigtop release) but it will
> provide a pretty stable body of Maven artifacts that are KNOWN (as
> in tested) to be compiled against each other.
I think HBase should consider having a single blessed set of dependencies and only one build for a given release, but also several Jenkins projects set up to ensure that the release also builds against some larger set of additional dependencies according to contributor needs. Otherwise the user is welcome to mvn -Ddependency.version=foo. A project like Bigtop could separately handle a broader set of combinations according to "distribution consumer" demand; we could point potential users at that if it's an option.
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Jesse Yates 2012-05-16, 22:22
Comments inline.
TL;DR +1 on a small number of supported versions with different classifiers that only span a limited api skew to avoid a mountain of reflection. Along with that, support for the builds via jenkins testing.
Any further dependency resolution should be considered 'external projects' and handled via their own maven settings.xml, which can be in external repos maintained by people who want hbase to support other versions of our dependencies (and possibly have a branch of hbase with the appropriate modifications). Any new dependency versions we want to support should be heavily vetted for ease of integration and stability.
-1 on keeping code in the POMs for things we don't directly release, as that means more potential maintenance for things we (as a community) don't care that much about (a la current Avro support).
-------------------
Jesse Yates
@jesse_yates
jyates.github.com

On Wed, May 16, 2012 at 2:09 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> On Wed, May 16, 2012 at 1:00 PM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> > Now, it seems like in Bigtop we're going to soon expose the Maven repo
> > with all of the Maven artifacts constituting a particular Bigtop "stack". You
> > could think of it as a transitive closure of all of the deps. built against
> > each other. This, of course, will not tackle an issue of a random combination
> > of components (we only support the versions of components as
> > specified in our own BOM for each particular Bigtop release) but it will
> > provide a pretty stable body of Maven artifacts that are KNOWN (as
> > in tested) to be compiled against each other.
>
> I think HBase should consider having a single blessed set of
> dependencies and only one build for a given release,

This would be really nice, but seems a bit unreasonable given that we are the "hadoop database" (if not in name, at least by connotation). I think limiting our support to the latest X versions (2-3?) is reasonable given consistent APIs - we should be very careful in picking which new versions we support and when. A lot of the pain with the hadoop distributions has been the wildly shifting APIs, making a lot of work painful for handling different versions (distcp/backup situations come to mind here, among other things).

> but also several
> Jenkins projects set up to ensure that release also builds against
> some larger set of additional dependencies according to contributor
> needs,

Definitely a necessity if we support more than 1 version. Only problem here is that we then have to worry about multiple builds, which seemed to be a problem in the past. If we are going to support more than 1 version, we need to have full support for that version/permutation of options (e.g. Hadoop X with Zookeeper Y).

> and otherwise the user is welcome to mvn
> -Ddependency.version=foo.

I'd prefer not to have pieces in the code that are not being regularly tested/used.
If we find we have a lot of people using a given version and willing to support it, then we should roll it in (like with other external dependencies, like the Avro stuff that we are stuck with).
The mvn command you recommend above is quite close to what we are already doing, which is just specifying the hadoop version as a profile, e.g. (or close enough) -Dhadoop.version=0.23
+1 on the idea of having classifiers for the different versions we actually release as proper artifacts, and it should be completely reasonable to enable them via profiles. I'd have to double check as to _how_ people would specify that classifier/version of hbase from the maven repo, but it seems entirely possible (my worry here is about the collision with the -tests and -sources classifiers, which are standard mvn conventions for different builds). Otherwise, with maven it is very reasonable to have people hosting profiles for versions that they want to support - generally, this means just another settings.xml file that includes another profile that people can activate on their own, when they want to build against their own version.
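A minimal sketch of what such an externally hosted settings.xml could look like. The profile id, the property name `hadoop.version`, and the version value are assumptions made for illustration; whether the property has any effect depends on the HBase POM actually reading it.

```xml
<!-- Hypothetical settings.xml maintained outside the HBase source tree. -->
<settings>
  <profiles>
    <profile>
      <!-- Profile id is made up for this example -->
      <id>my-hadoop</id>
      <properties>
        <!-- Assumes the project POM resolves its Hadoop dependency from this property -->
        <hadoop.version>0.23.1</hadoop.version>
      </properties>
    </profile>
  </profiles>
</settings>
```

It would be activated explicitly, for example with mvn -s my-settings.xml -Pmy-hadoop package, so nothing specific to that Hadoop version has to live in the released POM.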
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Andrew Purtell 2012-05-16, 23:06
[cc bigtop-dev]
On Wed, May 16, 2012 at 3:22 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:
> +1 on a small number of supported versions with different classifiers that
> only span a limited api skew to avoid a mountain of reflection. Along with
> that, support for the builds via jenkins testing.
and
>> I think HBase should consider having a single blessed set of
>> dependencies and only one build for a given release,
>
> This would be really nice, but seems a bit unreasonable given that we are
> the "hadoop database" (if not in name, at least by connotation). I think
> limiting our support to the latest X versions (2-3?) is reasonable given
> consistent APIs
I was talking release mechanics not source/compilation/testing level support. Hence the suggestion for multiple Jenkins projects for the dependency versions we care about. That care could be scoped like you suggest.
I like what Bigtop espouses: carefully constructed snapshots of the world, well tested in total. Seems easier to manage than laying out various planes from increasingly higher dimensional spaces. If they get traction we can act as a responsible upstream project. As for our official release, we'd have maybe two, I'll grant you that, Hadoop 1 and Hadoop 2.
X=2 will be a challenge. It's not just the Hadoop version that could change, but the versions of all of its dependencies, SLF4J, Guava, JUnit, protobuf, etc. etc. etc.; and that could happen at any time on point releases. If we are supporting the whole series of 1.x and 2.x releases, then that could be a real pain. Guava is a good example, it was a bit painful for us to move from 9 to 11 but not so for core as far as I know.
> - we should be very careful in picking which new versions
> we support and when. A lot of the pain with the hadoop distributions has
> been the wildly shifting APIs, making a lot of work painful for handling
> different versions (distcp/backup situations come to mind here, among other
> things).
We also have test dependencies on interfaces that are LimitedPrivate at best. It's a source of friction.
> +1 on the idea of having classifiers for the different versions we actually
> release as proper artifacts, and it should be completely reasonable to enable
> them via profiles. I'd have to double check as to _how_ people would specify
> that classifier/version of hbase from the maven repo, but it seems entirely
> possible (my worry here is about the collision with the -tests and -sources
> classifiers, which are standard mvn conventions for different builds).
> Otherwise, with maven it is very reasonable to have people hosting profiles
> for versions that they want to support - generally, this means just another
> settings.xml file that includes another profile that people can activate on
> their own, when they want to build against their own version.
This was a question I had, maybe you know. What happens if you want to build something like <artifact>-<version>-<classifier>-tests or -sources? Would that work? Otherwise we'd have to add a suffix using property substitutions in profiles, right?
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Alejandro Abdelnur 2012-05-16, 23:17
On Wed, May 16, 2012 at 4:06 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> +1 on the idea of having classifiers for the different versions we actually
>> release as proper artifacts, and should be completely reasonable to enable
>> via profiles. I'd have to double check as to _how_ people would specify
>> that classifier/version of hbase from the maven repo, but it seems entirely
>> possible (my worry here is about the collision with the -tests and -sources
>> classifiers, which are standard mvn conventions for different builds).
>> Otherwise, with maven it is very reasonable to have people hosting profiles
>> for versions that they want to support - generally, this means just another
>> settings.xml file that includes another profile that people can activate on
>> their own, when they want to build against their own version.
>
> This was a question I had, maybe you know. What happens if you want to
> build something like <artifact>-<version>-<classifier>-tests or
> -source? Would that work? Otherwise we'd have to add a suffix using
> property substitutions in profiles, right?

Well, we'd have to test whether using <classifier> and <type> (http://maven.apache.org/guides/mini/guide-attached-tests.html) works as expected.

Otherwise (and it may be easier/cleaner) just use 2 different versions for the hbase JARs, one for Hadoop1 and one for Hadoop2 (i.e. embedding h1 & h2 in the version). This may be easier and less error prone for users.

Whatever we do should not be based on profiles, as (AFAIK) the published POMs cannot be consumed by activating/deactivating profiles.

And again, it would be great if all projects affected by this end up using an identical solution.

thx

-- Alejandro
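To make the second option concrete, here is a hedged sketch of a consumer POM where the Hadoop line is embedded in the version. The `-hadoop2` version suffix and the `0.94.0` base version are hypothetical; no such artifacts are implied to exist.

```xml
<!-- Hypothetical consumer POM fragment; version strings are illustrative only. -->
<dependencies>
  <!-- Main jar built against Hadoop 2 -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.0-hadoop2</version>
  </dependency>
  <!-- Attached test jar of the same build; the test-jar type maps to the "tests" classifier -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.0-hadoop2</version>
    <type>test-jar</type>
    <scope>test</scope>
  </dependency>
</dependencies>
```

Because each version has its own POM, the Hadoop 1 and Hadoop 2 builds could declare different Hadoop dependencies, and the standard -tests and -sources classifiers stay free for their usual purpose.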
-
Re: Publishing jars for hbase compiled against hadoop 0.23.x/hadoop 2.0.x
Konstantin Boudnik 2012-05-17, 06:25
See my comments inlined...
On Wed, May 16, 2012 at 04:06PM, Andrew Purtell wrote:
> [cc bigtop-dev]
>
> On Wed, May 16, 2012 at 3:22 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:
> > +1 on a small number of supported versions with different classifiers that
> > only span a limited api skew to avoid a mountain of reflection. Along with
> > that, support for the builds via jenkins testing.
>
> and
>
> >> I think HBase should consider having a single blessed set of
> >> dependencies and only one build for a given release,
> >
> > This would be really nice, but seems a bit unreasonable given that we are
> > the "hadoop database" (if not in name, at least by connotation). I think
> > limiting our support to the latest X versions (2-3?) is reasonable given
> > consistent APIs
>
> I was talking release mechanics not source/compilation/testing level
> support. Hence the suggestion for multiple Jenkins projects for the
> dependency versions we care about. That care could be scoped like you
> suggest.
>
> I like what Bigtop espouses: carefully constructed snapshots of the
> world, well tested in total. Seems easier to manage than laying out
> various planes from increasingly higher dimensional spaces. If they
> get traction we can act as a responsible upstream project. As for our
> official release, we'd have maybe two, I'll grant you that, Hadoop 1
> and Hadoop 2.
>
> X=2 will be a challenge. It's not just the Hadoop version that could
> change, but the versions of all of its dependencies, SLF4J, Guava,
> JUnit, protobuf, etc. etc. etc.; and that could happen at any time on
> point releases. If we are supporting the whole series of 1.x and 2.x
> releases, then that could be a real pain. Guava is a good example, it
> was a bit painful for us to move from 9 to 11 but not so for core as
> far as I know.
One of the by-design advantages of the stack-assembly-validation automation approach (that BigTop incidentally took ;) is that it allows stack updates to be created with relatively little effort when one or more dependencies change. Yes, it requires a certain upfront time investment to make the first base stack definition. From there it should be pretty much downhill.
We applied a similar approach to the creation of x86 Solaris based stacks for Sun Microsystems' rack-mount servers; it was a hoot and saved us a tremendous amount of money back then (not that it helped Sun in the long run).
> > - we should be very careful in picking which new versions
> > we support and when. A lot of the pain with the hadoop distributions has
> > been the wildly shifting APIs, making a lot of work painful for handling
> > different versions (distcp/backup situations come to mind here, among other
> > things).
>
> We also have test dependencies on interfaces that are LimitedPrivate
> at best. It's a source of friction.
>
> > +1 on the idea of having classifiers for the different versions we actually
> > release as proper artifacts, and should be completely reasonable to enable
> > via profiles. I'd have to double check as to _how_ people would specify
> > that classifier/version of hbase from the maven repo, but it seems entirely
> > possible (my worry here is about the collision with the -tests and -sources
> > classifiers, which are standard mvn conventions for different builds).
> > Otherwise, with maven it is very reasonable to have people hosting profiles
> > for versions that they want to support - generally, this means just another
> > settings.xml file that includes another profile that people can activate on
> > their own, when they want to build against their own version.
>
> This was a question I had, maybe you know. What happens if you want to
> build something like <artifact>-<version>-<classifier>-tests or
> -source? Would that work? Otherwise we'd have to add a suffix using
> property substitutions in profiles, right?
*-tests artifacts in maven are somewhat special animals and can't be depended upon in the usual way. This was actually the reason that BigTop chose to make/use regular binary jar artifacts and use a name designator for their test-related nature.
With regards, Cos