Hadoop, mail # dev - [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Konstantin Boudnik 2012-11-21, 20:00
I like Alejandro's idea about Maven for a few reasons:
  - bringing in a scripting environment that is known for its inter-version
    idiosyncrasies, just because Windows can't handle trivial shell scripting,
    looks like overkill to me
  - related to the above, there's a chance that the Python prerequisites used
    in Hadoop will conflict with other components in the stack. That would be
    a nightmare for integrator projects, e.g. Bigtop
  - Maven is the de facto standard for Java stacks
  - Maven offers a scripting language (Groovy, through plugins) if the
    existing plugins aren't sufficient for a given goal

Addressing Matt's later point about the non-Mavenized Hadoop-1 line: it uses
Maven functionality such as deploy/install via custom Ant tasks. The same
approach would work for saveVersion.sh and others, I am sure.


On Wed, Nov 21, 2012 at 11:25 AM, Alejandro Abdelnur wrote:
> Hey Matt,
> We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
> its way out with the move of docs to APT).
> Why not write a Maven plugin to do that?
> Colin already has something to simplify all the cmake calls from the builds
> using a Maven plugin (https://issues.apache.org/jira/browse/HADOOP-8887).
> We could do the same with protoc, thus simplifying the POMs.
> The saveVersion.sh script seems like another prime candidate for a Maven
> plugin, and in this case it would not require external tools.
> Does this make sense?
> Thx
> On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <[EMAIL PROTECTED]> wrote:
> > This discussion started in HADOOP-8924
> > <https://issues.apache.org/jira/browse/HADOOP-8924>, where it was proposed
> > to replace the build-time utility "saveVersion.sh" with a Python script.
> > This would require Python as a build-time dependency.  Here's the
> > background:
> >
> > Those of us involved in the branch-1-win port of Hadoop to Windows, without
> > use of Cygwin, have faced the issue of frequent use of shell scripts
> > throughout the system, both at build time (e.g., the utility
> > "saveVersion.sh") and at run time (config files like "hadoop-env.sh" and the
> > start/stop scripts in "bin/*").  Similar usages exist throughout the Hadoop
> > stack, in all projects.
> >
> > The vast majority of these shell scripts do not do anything platform
> > specific; they can be expressed in a POSIX-conforming way.  Therefore, it
> > seems to us that it makes sense to start using a cross-platform scripting
> > language, such as Python, in place of shell for these purposes.  For those
> > rare occasions where platform-specific functionality really is needed,
> > Python also supports quite a lot of platform-specific functionality on both
> > Linux and Windows; but where that is inadequate, one could still
> > conditionally invoke a platform-specific module written in shell (for
> > Linux/*nix) or PowerShell or bat (for Windows).
> >
> > The primary motive for moving to a cross-platform scripting language is
> > maintainability.  The alternative would be to maintain two complete suites
> > of scripts, one for Linux and one for Windows (and perhaps others in the
> > future).  We want to avoid the need to update dual modules in two different
> > languages when functionality changes, especially given that many Linux
> > developers are not familiar with PowerShell or bat, and many Windows
> > developers are not familiar with shell or bash.
> >
> > Regarding the choice of Python:
> >
> >    - There are already a few instances of Python usage in Hadoop, such as
> >    the utility (currently broken) "relnotes.py", and massive usage of Python
> >    in the examples/ and contrib/ directories.
> >    - Python is also used at build time in Bigtop.
> >    - The Python language is available for free on essentially all
> >    platforms, under an Apache-compatible license
> >    <http://www.apache.org/legal/resolved.html>.
> >    - It is supported in Eclipse and similar IDEs.
> >    - Most importantly, it is widely accepted as a reasonably good OO