Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop, mail # dev - [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack


Copy link to this message
-
RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Mahadevan Venkatraman 2012-11-30, 02:07
+1, +1, +1 (non-binding)

Supporting Comments:

Build-time scripts: Using a platform independent language such as python (or maven in certain cases) will greatly help in reducing build breaks and improve on build script maintainability.

Run-time scripts: Most run-time scripts are end-user visible and are scripts that are needed to be run by admin such as starting/stop Hadoop cluster (hadoop-daemons) or by developers submitting a job (hadoop.cmd). There seem to be two types of script files:
    - Scripts intended for a cluster admin or an IT admin:
        - It is desirable to use a common set of python scripts that work across all platforms. However, in a Windows enterprise environment IT admins won't like it if they have to run python scripts to start/stop a cluster. So for these, there should be a PowerShell interface wrapper that can accept the right parameters and pass it down to the python script. Hopefully, the power-shell layer can be a simple pass-thru. This way the python scripts is like any other Java code hidden behind a well-known API surface. IT Admins can't debug it or modify it easily, but this is fine since for scripts like the aforementioned there isn't a requirement that IT Admins should be able to easily be able to view/modify the underlying code.
       - For Windows specific things not supported by Python natively, such as setting ACLs, starting/stopping windows services it should be possible to re-factor the code appropriately. But a little bit of powershell/cmd for these call outs would be unavoidable.

    - Scripts intended for developers/cluster users:
      - Most of these scripts (e.g. hadoop.cmd) would be behind other API surface such as WebHDFS, ODBC, JDBC, Templeton etc. So the advantage of having a common script across platforms outweighs the use of cmd/powershell as a native windows feature. Again, it should also be possible to provide simple powershell wrappers for a windows environment.

Thanks, Mahadevan.

-----Original Message-----
From: Ivan Mitic [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 29, 2012 3:41 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

+1, +1, +1 (some comments inline)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Matt Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: [EMAIL PROTECTED]
Subject: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1&2 in combination make a total sense. I ported a few scripts to Python, and thus far, it showed to be up to the task and satisfy the cross-platform requirements. In my option, it is also important to agree on the version, as I've run into some breaking changes in version 3+.
3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be a real challenge.
Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,