Bigtop, mail # dev - Python Fork


Re: Python Fork
Bruno Mahé 2013-03-29, 08:42
On 03/27/2013 06:54 AM, Philip Herron wrote:
> Hey all
>
> I am new to Bigtop and would like to get the community's feedback on
> this initial concept.
>
> I have created a fork of bigtop in my own time at:
>
> http://git.buildy.org/?p=bigtop.git;a=shortlog;h=refs/heads/python-fork
>
>    $ git clone git://buildy.org/bigtop.git
>    $ git checkout --track -b python-fork origin/python-fork
>
> This fork reimplements Bigtop's top-level Make/Bash logic in Python,
> which has a number of benefits for the whole Bigtop community in its
> most important function, namely packaging.
>
> 1 - Bigtop's logic is currently handled in GNU Make and Bash, which is
> detrimental to the long-term maintainability of Bigtop, as it may turn
> away developers.
>
> 2 - Make is hard to debug when used this way, especially given the
> need for \ line continuations.
>
> 3 - The new directory structure already feels more maintainable for
> Bigtop's main function: generating packages for Hadoop stacks on
> different platforms. Other functions, such as bigtop-deploy and the
> test framework, don't seem to be actively maintained, judging by their
> documentation.
>
> 4 - Using Python we can actually take advantage of the system, for
> example using threads to build different components at the same time.
> Yes, I know we could probably hack together make -j N, but that would
> make things even harder to maintain.
>
> 5 - The bigtop.mk BOM file was difficult to read and to find errors
> in, with its eval and ?= constructs, archives being set in the
> Makefile, and package.mk doing a lot of tricky Bash to get things
> going.
>
> Using Python's ConfigParser we can keep all the dynamic data we care
> about in a single BOM file instead of spreading it over several
> makefiles.
>
> ### bigtop.BOM ###
> [DEFAULT]
> APACHE_MIRROR = http://apache.osuosl.org
> APACHE_ARCHIVE = http://archive.apache.org/dist
> ARCHIVES = %(APACHE_MIRROR)s %(APACHE_ARCHIVE)s
>
> [BIGTOP]
> BOM = hadoop
>
> [hadoop]
> NAME = hadoop
> BASE_VERSION = 2.0.3-alpha
> PKG_VERSION = 2.0.3
> RELEASE = 1
> SRC_TYPE = TARGZ
> SRC = %(NAME)s-%(BASE_VERSION)s-src.tar.gz
> DST = %(NAME)s-%(BASE_VERSION)s.tar.gz
> LOC = %(ARCHIVES)s
> DOWNLOAD_PATH = /hadoop/core/%(NAME)s-%(BASE_VERSION)s
> ### EOF ###
>
> I still need to spend time on documentation, but I think this looks
> very similar to the old format, only cleaner. We have a DEFAULT
> section to specify our archives, then a BOM entry listing all the
> components we want in the stack.
>
> The Hadoop component is the only one I have tested with, but I don't
> see any problems adding others. The LOC field in the hadoop component
> is interesting: we can point it at ARCHIVES, and if several archives
> are specified it will try each in turn for the tarball we want,
> falling back to the next one when the first is down.
>
> ----
>
> Overall, I could go on about more issues that need looking at and why
> this is a good step for Bigtop to take, but I think feedback from the
> community is more important right now.
>
> One thing to note: this adds no extra dependency, as it is just stock
> Python code, with no need to pip install or easy_install any modules.
> And almost all Linux/Unix distros ship with Python, so I think it's
> fine.
>
> Thanks for reading this! I will aim to complete the BOM against trunk
> Bigtop and see if I run up against anything.
>
> --Phil
>
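The proposed BOM format above can be parsed with the stock library
roughly as follows (a sketch only; in Python 3 the module is named
configparser, and the section and option names are taken from the
sample BOM in the message):

```python
# Sketch: reading the proposed bigtop.BOM with stock ConfigParser.
# %(...)s interpolation resolves NAME/BASE_VERSION per section, and
# values in [DEFAULT] (like ARCHIVES) are visible from every section.
import configparser

BOM_TEXT = """
[DEFAULT]
APACHE_MIRROR = http://apache.osuosl.org
APACHE_ARCHIVE = http://archive.apache.org/dist
ARCHIVES = %(APACHE_MIRROR)s %(APACHE_ARCHIVE)s

[BIGTOP]
BOM = hadoop

[hadoop]
NAME = hadoop
BASE_VERSION = 2.0.3-alpha
SRC = %(NAME)s-%(BASE_VERSION)s-src.tar.gz
LOC = %(ARCHIVES)s
"""

parser = configparser.ConfigParser()
parser.read_string(BOM_TEXT)

# The BOM entry lists every component in the stack.
components = parser.get("BIGTOP", "BOM").split()

for name in components:
    src = parser.get(name, "SRC")            # interpolation applied
    # LOC expands to the space-separated archive list; a downloader
    # would try each mirror in order until the tarball is found.
    mirrors = parser.get(name, "LOC").split()
    print(name, src, mirrors)
```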
Hi Philip,

I like the direction this is going. But why not use SCons? It's written
in Python, it's available everywhere, and it already has support for
parallel builds.
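To be fair, the parallel-build piece alone is only a few lines of stock
Python (a rough sketch; build_component here is a hypothetical stand-in
for the real per-component packaging step):

```python
# Rough sketch of parallel component builds using only the standard
# library (concurrent.futures ships with Python 3.2+).
from concurrent.futures import ThreadPoolExecutor, as_completed

def build_component(name):
    # A real build would invoke the packaging toolchain here.
    return "%s: ok" % name

components = ["hadoop", "zookeeper", "hbase"]

# Build up to four components concurrently, collecting results as
# each build finishes.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(build_component, c): c for c in components}
    results = {futures[f]: f.result() for f in as_completed(futures)}

for name in components:
    print(results[name])
```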
Thanks,
Bruno