Hadoop, mail # dev - [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Chris Nauroth 2012-11-21, 21:03
I worked on some of the Python build scripting that currently resides in
branch-trunk-win.  Initially, my goal was to keep a "pure" Maven
implementation to the greatest degree possible without external scripting,
but I encountered a few problems:

1. One approach is to try to express all of the build logic with existing
Maven plugins.  This turned out to be infeasible in some cases.  I don't
know of an existing plugin that does anything like the logic in
saveVersion.sh/.py for walking the source tree and checksumming the files.
 For protoc, I saw a proposed plugin in open source, but it hadn't reached
release status yet.  For creation of the distribution tarballs, the Maven
Ant Plugin (and actually the underlying Ant tool) cannot preserve file
permissions or symlinks.

2. Considering that the first approach isn't possible, another possibility
is to write custom Maven plugins.  This would require significantly more
engineering time to write and test the code.  I think there are some
legitimate concerns too about supportability, because this approach would
put significant build logic into Maven plugin code instead of something
more easily visible to release engineers, like pom.xml and external
scripts.  Also, I'm actually not sure that we can implement everything with
a Maven plugin.  For example, I mentioned the problem of preserving file
permissions and symlinks in the distribution tarballs.  Ant hasn't been
able to fix that problem due to a Java limitation, so our Maven plugins
coded in Java (or another JVM language) likely would suffer the same fate.
 We might be stuck with some amount of external scripting no matter what.
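
To make the two tasks above concrete, here is a minimal Python sketch of what the external scripting could look like: walking the source tree to checksum the files, and building a distribution tarball that keeps file permissions and symlinks. It is not the actual saveVersion.py from the branch. The function names, the ".java" filter, the choice of MD5, and hashing the concatenated file contents are all assumptions made for illustration.

```python
import hashlib
import os
import tarfile


def checksum_tree(root, suffix=".java"):
    """Return one MD5 hex digest over every file under root ending in suffix.

    Hypothetical helper: the suffix filter and MD5 are illustrative choices.
    """
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the walk order is deterministic
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                # Read in chunks so large files are not loaded into memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()


def make_tarball(source_dir, tar_path):
    """Pack source_dir into a gzip tarball (hypothetical helper).

    tarfile records each member's mode bits, and because dereference
    defaults to False, symlinks are stored as links rather than copies.
    """
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(tar_path, "w:gz") as archive:
        archive.add(source_dir, arcname=top_level)
```

Calling checksum_tree("src") returns a single hex string that changes whenever any matching file changes, and make_tarball("target/dist", "dist.tar.gz") writes the archive; both paths are placeholders.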

Thank you,
On Wed, Nov 21, 2012 at 12:00 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:

> I like Alejandro's idea about Maven for a few reasons:
>   - bringing in a scripting environment which is known for its
>     inter-version idiosyncrasies just because Windows can't handle
>     trivial shell scripting looks like overkill to me
>   - related to the above, there's a chance that the Python prerequisites
>     used in Hadoop might conflict with some other components in the
>     stack. This would be a nightmare for integrator projects such as
>     Bigtop
>   - Maven is the de facto standard for Java stacks
>   - Maven has a built-in scripting language (Groovy) if some plugins
>     aren't sufficient for achieving whatever goals
>
> Addressing Matt's later point about the non-Mavenized Hadoop-1 line: it
> uses Maven stuff such as deploy/install via custom Ant tasks. The same
> approach would work for saveVersion.sh and others, I am sure.
>
> Cos
> On Wed, Nov 21, 2012 at 11:25AM, Alejandro Abdelnur wrote:
> > Hey Matt,
> >
> > We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
> > its way out with the move of docs to APT)
> >
> > Why not do a maven-plugin to do that?
> >
> > Colin already has something to simplify all the cmake calls from the
> > builds using a maven-plugin
> > (https://issues.apache.org/jira/browse/HADOOP-8887)
> >
> > We could do the same with protoc, thus simplifying the POMs.
> >
> > The saveVersion.sh seems like another prime candidate for a maven plugin,
> > and in this case it would not require external tools.
> >
> > Does this make sense?
> >
> > Thx
> >
> > On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <[EMAIL PROTECTED]> wrote:
> >
> > > This discussion started in HADOOP-8924
> > > <https://issues.apache.org/jira/browse/HADOOP-8924>, where it was
> > > proposed to replace the build-time utility "saveVersion.sh" with a
> > > Python script.  This would require Python as a build-time
> > > dependency.  Here's the background:
> > >
> > > Those of us involved in the branch-1-win port of Hadoop to Windows
> > > without use of Cygwin have faced the issue of frequent use of shell
> > > scripts throughout the system, both at build time (e.g., the utility
> > > "saveVersion.sh") and at run time (config files like "hadoop-env.sh"
> > > and the start/stop