RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
+1, +1, +1 (non-binding)

Supporting Comments:

Build-time scripts: Using a platform-independent language such as Python (or Maven in certain cases) will greatly help reduce build breaks and improve build-script maintainability.

Run-time scripts: Most run-time scripts are end-user visible: they are run either by admins, for example to start/stop a Hadoop cluster (hadoop-daemons), or by developers submitting a job (hadoop.cmd). There seem to be two types of script files:
    - Scripts intended for a cluster admin or an IT admin:
        - It is desirable to use a common set of Python scripts that work across all platforms. However, in a Windows enterprise environment, IT admins won't like having to run Python scripts to start/stop a cluster. So for these, there should be a PowerShell wrapper that accepts the right parameters and passes them down to the Python script. Hopefully, the PowerShell layer can be a simple pass-through. This way the Python scripts are like any other Java code hidden behind a well-known API surface. IT admins can't easily debug or modify them, but this is fine, since for scripts like the aforementioned there is no requirement that IT admins be able to easily view or modify the underlying code.
        - For Windows-specific things not supported by Python natively, such as setting ACLs or starting/stopping Windows services, it should be possible to refactor the code appropriately. But a little bit of PowerShell/cmd for these call-outs would be unavoidable.

    - Scripts intended for developers/cluster users:
      - Most of these scripts (e.g. hadoop.cmd) would sit behind another API surface such as WebHDFS, ODBC, JDBC, Templeton, etc. So the advantage of having a common script across platforms outweighs the benefit of using cmd/PowerShell as a native Windows feature. Again, it should also be possible to provide simple PowerShell wrappers for a Windows environment.
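To make the pass-through idea concrete, here is a minimal sketch of what a common Python entry point might look like. It is only an illustration: the function names, the `./daemon.sh` helper, and the daemon names are hypothetical and do not correspond to existing Hadoop scripts. The only platform-specific part is the small call-out for Windows service control (here assumed to go through the standard `sc` command).

```python
import argparse
import platform


def build_command(action, daemon, system=None):
    """Return the argv list that would be executed for this action.

    The command names are hypothetical placeholders, not real Hadoop scripts.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows-specific call-out: delegate service control to a native tool.
        return ["sc", action, daemon]
    # Other platforms: delegate to a (hypothetical) shell helper.
    return ["./daemon.sh", action, daemon]


def parse_args(argv):
    """Parse the same two parameters on every platform."""
    parser = argparse.ArgumentParser(description="start/stop a daemon (sketch)")
    parser.add_argument("action", choices=["start", "stop"])
    parser.add_argument("daemon")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    command = build_command(args.action, args.daemon)
    # A real script would execute the command, e.g. with subprocess.run(command, check=True).
    # This sketch only prints it.
    print(" ".join(command))
    return 0
```

With an interface like this, a PowerShell wrapper would only need to forward its parameters unchanged, for example by calling the equivalent of `main(["start", "namenode"])`.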

Thanks, Mahadevan.

-----Original Message-----
From: Ivan Mitic [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 29, 2012 3:41 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

+1, +1, +1 (some comments inline)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Matt Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: [EMAIL PROTECTED]
Subject: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1 & 2 in combination make total sense. I ported a few scripts to Python, and thus far it has proved up to the task and satisfies the cross-platform requirements. In my opinion, it is also important to agree on the version, as I've run into some breaking changes in version 3+.

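One way a script could enforce an agreed-upon version is a small guard at startup. The sketch below is an assumption-laden illustration: the thread does not fix any version range, so the bounds are passed in as parameters and the helper name `is_supported` is hypothetical.

```python
import sys


def is_supported(version, minimum, maximum_exclusive):
    """Return True if minimum <= version < maximum_exclusive.

    All arguments are tuples of integers such as (2, 6) or (3, 0);
    only the first two components of `version` are compared.
    """
    major_minor = tuple(version[:2])
    return tuple(minimum) <= major_minor < tuple(maximum_exclusive)


def require_version(minimum, maximum_exclusive, version=None):
    """Raise RuntimeError if the interpreter is outside the agreed range."""
    version = tuple(sys.version_info[:3]) if version is None else version
    if not is_supported(version, minimum, maximum_exclusive):
        raise RuntimeError(
            "unsupported Python version %s" % ".".join(str(part) for part in version)
        )
```

A script would call `require_version(...)` once at startup with whatever range the project agrees on, so an incompatible interpreter fails early with a clear message.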
3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be a real challenge.
Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12

Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,