Hadoop >> mail # dev >> [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack


Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
I like Alejandro's idea about Maven for a few reasons:
  - bringing in a scripting environment that is known for its inter-version
    idiosyncrasies, just because Windows can't handle trivial shell scripting,
    looks like overkill to me
  - related to the above, there's a chance that the Python prerequisites used
    in Hadoop will conflict with those of other components in the stack.
    This would be a nightmare for integrator projects such as Bigtop
  - Maven is the de facto standard for Java stacks
  - Maven has a built-in scripting language (Groovy) in case some plugins
    aren't sufficient for achieving a given goal

Addressing Matt's later point about the non-Mavenized Hadoop-1 line: it uses
Maven features such as deploy/install via custom Ant tasks. The same approach
would work for saveVersion.sh and others, I am sure.

Cos

On Wed, Nov 21, 2012 at 11:25AM, Alejandro Abdelnur wrote:
> Hey Matt,
>
> We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
> its way out with the move of docs to APT)
>
> Why not do a maven-plugin to do that?
>
> Colin already has something to simplify all the cmake calls from the builds
> using a maven-plugin (https://issues.apache.org/jira/browse/HADOOP-8887)
>
> We could do the same with protoc, thus simplifying the POMs.
>
> The saveVersion.sh seems like another prime candidate for a maven plugin,
> and in this case it would not require external tools.
>
> Does this make sense?
>
> Thx
>
> On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <[EMAIL PROTECTED]> wrote:
>
> > This discussion started in HADOOP-8924
> > <https://issues.apache.org/jira/browse/HADOOP-8924>, where it was
> > proposed to replace the build-time utility "saveVersion.sh" with a python
> > script.  This would require Python as a build-time dependency.  Here's
> > the background:
> >
> > Those of us involved in the branch-1-win port of Hadoop to Windows without
> > use of Cygwin have faced the issue of frequent use of shell scripts
> > throughout the system, both at build time (e.g., the utility
> > "saveVersion.sh") and at run time (config files like "hadoop-env.sh" and
> > the start/stop scripts in "bin/*").  Similar usages exist throughout the
> > Hadoop stack, in all projects.
> >
> > The vast majority of these shell scripts do not do anything platform
> > specific; they can be expressed in a posix-conforming way.  Therefore, it
> > seems to us that it makes sense to start using a cross-platform scripting
> > language, such as python, in place of shell for these purposes.  For those
> > rare occasions where platform-specific functionality really is needed,
> > python also supports quite a lot of platform-specific functionality on both
> > Linux and Windows; but where that is inadequate, one could still
> > conditionally invoke a platform-specific module written in shell (for
> > Linux/*nix) or powershell or bat (for Windows).
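
For illustration, the dispatch described above could look roughly like the
following. This is only a minimal sketch: the helper base name and the
.sh/.cmd naming convention are assumptions made for the example, not anything
that exists in Hadoop today.

```python
import os
import platform
import subprocess


def helper_command(name, system=None):
    """Return the argv list for a platform-specific helper script.

    `name` is a hypothetical helper base name; the .sh/.cmd naming
    convention is an assumption made for this sketch.
    """
    system = system or platform.system()
    if system == "Windows":
        # cmd.exe runs .cmd/.bat files; /c means "run the command and exit".
        return ["cmd.exe", "/c", name + ".cmd"]
    # Linux, macOS and other *nix systems: run the POSIX shell version.
    return ["sh", name + ".sh"]


def java_classpath(entries):
    """Join classpath entries with the current platform's path separator."""
    return os.pathsep.join(entries)


def run_helper(name):
    """Run the platform-specific helper and return its exit code."""
    return subprocess.call(helper_command(name))
```

Everything that is not platform specific (such as joining a classpath with
os.pathsep) stays in the one Python file, and only the rare platform-specific
step is delegated to a shell or bat helper.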
> >
> > The primary motive for moving to a cross-platform scripting language is
> > maintainability.  The alternative would be to maintain two complete suites
> > of scripts, one for Linux and one for Windows (and perhaps others in the
> > future).  We want to avoid the need to update dual modules in two different
> > languages when functionality changes, especially given that many Linux
> > developers are not familiar with powershell or bat, and many Windows
> > developers are not familiar with shell or bash.
> >
> > Regarding the choice of python:
> >
> >    - There are already a few instances of python usage in Hadoop, such as
> >    the utility (currently broken) "relnotes.py", and massive usage of
> >    python in the examples/ and contrib/ directories.
> >    - Python is also used in Bigtop build-time.
> >    - The Python language is available for free on essentially all
> >    platforms, under an Apache-compatible license
> >    <http://www.apache.org/legal/resolved.html>.
> >
> >    - It is supported in Eclipse and similar IDEs.
> >    - Most importantly, it is widely accepted as a reasonably good OO