Hadoop, mail # dev - [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Alejandro Abdelnur 2012-11-21, 19:25
Hey Matt,

We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
its way out with the move of the docs to APT).

Why not write a Maven plugin to do that?

Colin already has something that simplifies all the cmake calls in the builds
by using a Maven plugin (https://issues.apache.org/jira/browse/HADOOP-8887).

We could do the same with protoc, thus simplifying the POMs.

saveVersion.sh seems like another prime candidate for a Maven plugin,
and in this case it would not require any external tools.
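For a sense of what such a version-stamping step involves, here is a minimal sketch in Python (the language proposed in the quoted message below) of collecting build metadata and writing it to a generated file. It is not a port of saveVersion.sh: the function names, the `key=value` output format, and the use of `git rev-parse HEAD` for the revision are assumptions made for illustration. The same few steps would map onto a Maven plugin goal written in Java just as well.

```python
import getpass
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def git_revision(src_dir: str) -> str:
    """Return the current commit hash, or "Unknown" if it cannot be determined.

    Assumption: the revision comes from git; the real script may differ.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=src_dir, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git missing, src_dir missing, or src_dir not a repository
        return "Unknown"


def current_user() -> str:
    """Return the login name, falling back when it cannot be looked up."""
    try:
        return getpass.getuser()
    except (KeyError, OSError):
        return "unknown"


def collect_version_info(version: str, src_dir: str = ".") -> dict:
    """Gather build metadata with the standard library only (no shell)."""
    return {
        "version": version,
        "revision": git_revision(src_dir),
        "user": current_user(),
        "date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def write_version_info(info: dict, out_file: Path) -> None:
    """Write sorted key=value lines (hypothetical output format)."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}={value}" for key, value in sorted(info.items())]
    out_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `write_version_info(collect_version_info("1.2.3-SNAPSHOT"), Path("target/version-info.properties"))` would produce the file on Linux and Windows alike; the version string and output path are placeholders.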

Does this make sense?

Thx

On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <[EMAIL PROTECTED]> wrote:

> This discussion started in HADOOP-8924
> <https://issues.apache.org/jira/browse/HADOOP-8924>, where it was proposed
> to replace the build-time utility "saveVersion.sh" with a Python script.
> This would require Python as a build-time dependency.  Here's the
> background:
>
> Those of us involved in the branch-1-win port of Hadoop to Windows without
> the use of Cygwin have faced the issue of frequent use of shell scripts
> throughout the system, both at build time (e.g., the utility
> "saveVersion.sh") and at run time (config files like "hadoop-env.sh" and
> the start/stop scripts in "bin/*").  Similar usages exist throughout the
> Hadoop stack, in all projects.
>
> The vast majority of these shell scripts do not do anything
> platform-specific; they can be expressed in a POSIX-conforming way.
> Therefore, it seems to us that it makes sense to start using a
> cross-platform scripting language, such as Python, in place of shell for
> these purposes.  For those rare occasions where platform-specific
> functionality really is needed, Python also supports quite a lot of
> platform-specific functionality on both Linux and Windows; where that is
> inadequate, one could still conditionally invoke a platform-specific
> module written in shell (for Linux/*nix) or PowerShell or bat (for
> Windows).
>
> The primary motive for moving to a cross-platform scripting language is
> maintainability.  The alternative would be to maintain two complete suites
> of scripts, one for Linux and one for Windows (and perhaps others in the
> future).  We want to avoid having to update parallel modules in two
> different languages whenever functionality changes, especially given that
> many Linux developers are not familiar with PowerShell or bat, and many
> Windows developers are not familiar with shell or bash.
>
> Regarding the choice of Python:
>
>    - There are already a few instances of Python usage in Hadoop, such as
>    the (currently broken) utility "relnotes.py", and extensive use of
>    Python in the examples/ and contrib/ directories.
>    - Python is also used at build time in Bigtop.
>    - The Python language is available for free on essentially all
>    platforms, under an Apache-compatible license
>    <http://www.apache.org/legal/resolved.html>.
>    - It is supported in Eclipse and similar IDEs.
>    - Most importantly, it is widely accepted as a reasonably good OO
>    scripting language, and it is easily learned by anyone who already knows
>    shell, Perl, or other common scripting languages.
>    - On the TIOBE index of programming language popularity
>    <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
>    which seeks to measure the relative number of software engineers who
>    know and use each language, Python far exceeds Perl and Ruby.  The only
>    scripting languages that rank higher are PHP and Visual Basic, neither
>    of which seems a prime candidate for this use.
>
> For build-time usage, I think we should immediately approve Python as a
> build-time dependency and allow anyone motivated to do so to open JIRAs
> for migrating existing build-time shell scripts to Python.
>
> For run time, there is likely to be a lot more discussion.  Lots of folks,
> including me, aren't really happy with the use of active scripts for
> configuration, and various others, including, I believe, some of the
> Bigtop folks, have issues with the way the start/stop scripts work.
> Nevertheless,
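The fallback described in the quoted proposal, where common logic stays in Python and only genuinely platform-specific work is handed off to a shell script on Linux/*nix or a bat script on Windows, could be sketched as follows. The helper names and the convention of keeping `<name>.sh` and `<name>.cmd` side by side are hypothetical; only the `sys.platform` check and `subprocess.run` are standard Python APIs.

```python
import subprocess
import sys


def build_command(helper_base: str, platform: str = sys.platform) -> list:
    """Choose the platform-specific helper script for a base name.

    Hypothetical convention: <name>.sh for Linux/*nix, <name>.cmd for Windows.
    """
    if platform.startswith("win"):
        # cmd.exe runs .cmd/.bat files; /c makes it exit when the script ends.
        return ["cmd", "/c", helper_base + ".cmd"]
    # Any other platform is treated as POSIX and the helper is run with sh.
    return ["sh", helper_base + ".sh"]


def run_platform_helper(helper_base: str, *args: str) -> int:
    """Run the helper for the current platform and return its exit code."""
    cmd = build_command(helper_base) + list(args)
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_platform_helper("start-helper", "--daemon")` would run `sh start-helper.sh --daemon` on Linux and `cmd /c start-helper.cmd --daemon` on Windows (the helper name is a placeholder). Only the dispatch lives in the small per-platform scripts; everything else stays in one shared Python codebase.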

Alejandro