Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

On Wed, Nov 21, 2012 at 01:14PM, Matt Foley wrote:
> Cos,
> Please see in-line.
> On Wed, Nov 21, 2012 at 12:00 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:
> > I like Alejandro's idea about Maven for a few reasons:
> >   - bringing in a scripting environment which is known for its
> >   inter-version idiosyncrasies just because Windows can't handle trivial
> >   shell scripting looks like an overkill to me
> Excuse me?  Can we at least try not to belittle other people's platforms on
> a public Apache forum?  There's nothing trivial about implementing shell on
> Windows, as cygwin regrettably proved.

Belittle? Hardly ;) Because we all know very well why shell is so awkward to
implement on any Windows system.

> >   - relative to above, there's a chance that Python's pre-requisites used
> >   in Hadoop might get into a conflict with some other components in the
> >   stack.  This will be a nightmare for the integrator projects i.e. Bigtop
> Said Bigtop project actually uses python, does it not?

It does, Matt. The main concern I have is that at some point Hadoop's Python
might all of a sudden be a different version than the one in BigTop. And all
hell will break loose, compatibility-wise. What would be the solution then?

> >   - Maven is de-facto standard for Java stacks
> >
> Sure -- except for when Ant was the de-facto standard for Java stacks.  And

Arguable. Yet beside the point.

> let's remember what maven and ant are/were the de-facto standard for:
>  Doing builds.  Not scripting everything that needs scripting.

Arguable as well, due to the very definition of a build system.

> >   - Maven has built-in scripting language (Groovy) if some plugins aren't
> >     sufficient for achieving whatever goals
> Are you proposing Groovy as a better scripting language than Python?

I am proposing that Groovy is a better language than Python. Because, in part,
it goes far beyond scripting. And it doesn't have permanent runtime backward
compatibility issues. When was the last time the JDK had backward compatibility
issues?

> > Addressing Matt's later point about non-Mavenized Hadoop-1 line: it uses
> > Maven stuff such as deploy/install via custom ant tasks. Same approach
> > would work for saveVersion.sh and others, I am sure.
> Current ant scripts in Hadoop seem to use maven only for artifact
> management via the maven repository.  If I'm missing something, please
> point it out.  The ant build task currently calls out to saveVersion.sh.
> Having it call out to maven, which then calls out to a plug-in and/or a
> Groovy script, doesn't sound like an improvement to me.  And it's a way

At least it is guaranteed to work everywhere. And all we need in this case is
an extra jar file that can be pulled down through the same ivy/maven
dependency mechanism.

In the case of Python you'd have to make sure that you have the right version
of the interpreter and runtime. And you would have to do it manually or have an
extra requirement expressed via a system maintenance DSL.
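
For illustration only, here is a minimal sketch of the kind of manual
interpreter check described above. The minimum version and the function name
are assumptions made up for this example; nothing in this thread specifies
them.

```python
import sys

# Hypothetical minimum version; the thread does not name one.
REQUIRED = (2, 6)


def check_interpreter(version_info=sys.version_info, required=REQUIRED):
    """Return True if the given interpreter version is at least `required`."""
    # Compare only (major, minor), e.g. (2, 5) < (2, 6) <= (3, 0).
    return tuple(version_info[:2]) >= tuple(required)


if __name__ == "__main__":
    if not check_interpreter():
        sys.stderr.write("Python %d.%d or newer is required\n" % REQUIRED)
        sys.exit(1)
```

Every script entry point would need a guard like this, or the same
requirement would have to be declared in whatever provisioning tool manages
the hosts.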

> different use of maven than currently in the Hadoop-1 line, not a
> continuation of established practice.

The main point of my argument, expressed in fewer than 100 words: adding
Python, which is inconsistent across different Linux distros and has a history
of backward incompatibilities (2.6 vs 2.5, 3.0 vs earlier, etc.), doesn't seem
to be justified by the benefit of a somewhat easier build on Windows.

Perhaps we can do a more formal benefit analysis by just comparing the
number of Hadoop installations on MS Windows vs. Unix.


> > On Wed, Nov 21, 2012 at 11:25AM, Alejandro Abdelnur wrote:
> > > Hey Matt,
> > >
> > > We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
> > > its way out with the move of docs to APT)
> > >
> > > Why not do a maven-plugin to do that?
> > >
> > > Colin already has something to simplify all the cmake calls from the
> > > builds using a maven-plugin
> > > (https://issues.apache.org/jira/browse/HADOOP-8887)