Re: Python Fork
On 03/27/2013 06:54 AM, Philip Herron wrote:
> Hey all
> I am new to Bigtop and I would like to get feedback from the
> community on this initial concept.
> I have created a fork of bigtop in my own time at:
> http://git.buildy.org/?p=bigtop.git;a=shortlog;h=refs/heads/python-fork
>    $ git clone git://buildy.org/bigtop.git
>    $ git checkout --track -b python-fork origin/python-fork
> This fork reimplements Bigtop's top-level Make/Bash logic in Python,
> which has a number of benefits for the whole Bigtop community,
> particularly for its most important function, namely packaging.
> 1 - Bigtop's logic is handled within GNU Make and Bash, which is
> detrimental to the long-term maintainability of Bigtop, as it may turn
> away developers.
> 2 - Make is hard to debug when used this way, especially given the
> need for \ line continuations.
> 3 - The new directory structure already feels more maintainable, given
> that Bigtop's main function is to generate packages for Hadoop stacks
> on different platforms. Other functions, such as bigtop-deploy and the
> test framework, don't seem to be actively maintained, at least judging
> by their documentation.
> 4 - Using Python we can actually take advantage of the system, using
> threads to build different components at the same time. Yes, I know we
> could probably hack something together with make -j n, but that would
> make things even harder to maintain.
> 5 - The bigtop.mk BOM file was difficult to read and to find errors
> in, with eval and ?=, as well as having archives set in the Makefile
> and package.mk doing a lot of tricky Bash to get things going.
> Using Python's ConfigParser we can have a single BOM file for all the
> dynamic data we care about, instead of spreading it over several
> makefiles.
> ### bigtop.BOM ###
> APACHE_MIRROR = http://apache.osuosl.org
> APACHE_ARCHIVE = http://archive.apache.org/dist
> BOM = hadoop
> [hadoop]
> NAME = hadoop
> BASE_VERSION = 2.0.3-alpha
> PKG_VERSION = 2.0.3
> SRC = %(NAME)s-%(BASE_VERSION)s-src.tar.gz
> DST = %(NAME)s-%(BASE_VERSION)s.tar.gz
> DOWNLOAD_PATH = /hadoop/core/%(NAME)s-%(BASE_VERSION)s
> ### EOF ###
> I need to spend time documenting this, but I think it looks very
> similar, only cleaner. We have a DEFAULTS section to specify our
> archives, then a BOM section listing all the components we want in the
> stack.
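
[Editor's sketch, not part of the original message.] The BOM file quoted above can be read with the standard library alone. This is a minimal sketch that makes three assumptions: it uses Python 3's `configparser` (the 2013 fork would have used the Python 2 `ConfigParser` module); the three top-level keys are placed under a `[DEFAULT]` header, because the parser rejects keys that appear before any section; and the helper names `load_bom` and `component_info` are hypothetical, not taken from the fork.

```python
import configparser

# The BOM from the message, with a [DEFAULT] header added (an assumption:
# configparser raises MissingSectionHeaderError for keys outside a section).
BOM_TEXT = """
[DEFAULT]
APACHE_MIRROR = http://apache.osuosl.org
APACHE_ARCHIVE = http://archive.apache.org/dist
BOM = hadoop

[hadoop]
NAME = hadoop
BASE_VERSION = 2.0.3-alpha
PKG_VERSION = 2.0.3
SRC = %(NAME)s-%(BASE_VERSION)s-src.tar.gz
DST = %(NAME)s-%(BASE_VERSION)s.tar.gz
DOWNLOAD_PATH = /hadoop/core/%(NAME)s-%(BASE_VERSION)s
"""


def load_bom(text):
    """Parse BOM text; %(KEY)s references are expanded on lookup."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def component_info(parser, name):
    """Return the interpolated tarball names and path for one component."""
    section = parser[name]
    return {
        "src": section["SRC"],
        "dst": section["DST"],
        "download_path": section["DOWNLOAD_PATH"],
    }


bom = load_bom(BOM_TEXT)
# Option names are lower-cased by the parser, hence "bom" rather than "BOM".
components = bom.defaults()["bom"].split()
hadoop = component_info(bom, "hadoop")
print(components, hadoop["src"])
```

Keys in `[DEFAULT]` are visible from every component section, which is what would let a single file carry both the archive locations and the per-component data.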
> The Hadoop component is the only one I have tested with, but I don't
> see any problems adding others. The LOC field in the hadoop component
> is interesting, as we can specify the ARCHIVES and it will look in all
> of them for the tarball we want: if the first is down, it will go on
> to the next one, if you have several specified.
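
[Editor's sketch, not part of the original message.] The archive fallback described above could look roughly like the following. The message does not show how the fork implements it, so everything here is an assumption: the function names `fetch_url` and `fetch_from_archives` are hypothetical, and the download function is passed in as a parameter so the fallback logic can be exercised without network access.

```python
import urllib.request


def fetch_url(url, timeout=30):
    """Download a URL and return its bytes (default fetcher)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_from_archives(archives, path, fetch=fetch_url):
    """Try each archive base URL in order; return (url, data) from the
    first one that answers, and raise if none of them does."""
    errors = []
    for base in archives:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:
            # urllib's URLError and HTTPError, and socket timeouts, are
            # all OSError subclasses, so a dead mirror lands here.
            errors.append((url, exc))
    raise RuntimeError("no archive served %s: %r" % (path, errors))
```

A call such as `fetch_from_archives([mirror, archive], download_path + "/" + src)` would then try the mirror first and only fall back to the archive if the mirror fails.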
> ----
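
[Editor's sketch, not part of the original message.] Point 4 above mentions building several components at the same time with threads. A minimal way to do that with stock Python is `concurrent.futures.ThreadPoolExecutor`. This is an illustration under assumptions, not the fork's code: `build_all` and `fake_build` are hypothetical names, and the second component name is made up for the demo. A real build function would most likely launch an external packaging command, in which case threads are sufficient because the interpreter is idle while the subprocess runs.

```python
from concurrent.futures import ThreadPoolExecutor


def build_all(components, build, max_workers=4):
    """Run build(name) for every component concurrently and collect
    either its result or the exception it raised."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(build, name) for name in components}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:
                # One failed component should not hide the others.
                results[name] = ("failed", exc)
    return results


def fake_build(name):
    """Stand-in for a real package build (hypothetical)."""
    if name == "broken":
        raise ValueError("build failed")
    return name + ".pkg"


results = build_all(["hadoop", "broken"], fake_build)
```

Collecting failures per component, instead of stopping at the first exception, keeps one broken package from masking the status of the rest of the stack.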
> Overall, I could go on about more issues and things that need looking
> at, and why this is a good step for Bigtop to consider taking, but I
> think feedback from the community is more important now.
> One thing to note is that this adds no extra dependency, as it is just
> stock Python code; there is no need to pip install or easy_install any
> modules. And almost all Linux/Unix distros come with Python, so I
> think it's fine.
> Thanks for reading this! I will aim to complete the BOM against trunk
> Bigtop and see if I run into anything.
> --Phil
Hi Philip,

I like the direction this is going. But why not use scons? It's in
Python, it's available everywhere and already has support for parallel