Python Fork
Philip Herron 2013-03-27, 13:54
Hey all

I am new to Bigtop and would like to get feedback from the community on
this initial concept.

I have created a fork of bigtop in my own time at:

http://git.buildy.org/?p=bigtop.git;a=shortlog;h=refs/heads/python-fork

  $ git clone git://buildy.org/bigtop.git
  $ git checkout --track -b python-fork origin/python-fork

This fork reimplements Bigtop's top-level Make/Bash logic in Python,
which I think has a number of benefits for the whole community,
particularly for Bigtop's most important function, namely packaging.

1 - Bigtop's logic is currently handled in GNU Make and Bash, which is
detrimental to the long-term maintainability of the project and may
turn away developers.

2 - Make is hard to debug when used this way, especially with the
backslash continuations needed for multi-line rules.

3 - The new directory structure already feels more maintainable and
keeps the focus on Bigtop's main function: generating packages for the
Hadoop stack on different platforms. Other pieces, such as
bigtop-deploy and the test framework, do not appear to be actively
maintained, at least judging by their documentation.

4 - Using Python we can actually take advantage of the system, for
example using threads to build several components at the same time
(see the sketch after this list). Yes, I know we could probably hack
together make -j N, but that would make things even harder to maintain.

5 - The bigtop.mk BOM file was difficult to read and to debug, with
its use of eval and ?=, the archive URLs being set in the top-level
Makefile, and package.mk doing a lot of tricky Bash to get things
going.
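
Just to illustrate what I mean in point 4, here is a rough sketch of
parallel builds using only the stock threading module; build_component
is a hypothetical stand-in for whatever actually runs the per-component
build, not something that exists in the fork yet:

### parallel_build.py (sketch) ###
import threading

def build_component(name):
    # Hypothetical placeholder for the real download/patch/package steps
    # of a single component.
    print("building %s" % name)

def build_all(components):
    # One worker thread per component; start them all, then wait.
    threads = [threading.Thread(target=build_component, args=(name,))
               for name in components]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

build_all(["hadoop", "zookeeper", "hbase"])
### EOF ###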

Using Python's ConfigParser we can keep all the dynamic data we care
about in a single BOM file instead of spreading it over several
makefiles.

### bigtop.BOM ###
[DEFAULT]
APACHE_MIRROR = http://apache.osuosl.org
APACHE_ARCHIVE = http://archive.apache.org/dist
ARCHIVES = %(APACHE_MIRROR)s %(APACHE_ARCHIVE)s

[BIGTOP]
BOM = hadoop

[hadoop]
NAME = hadoop
BASE_VERSION = 2.0.3-alpha
PKG_VERSION = 2.0.3
RELEASE = 1
SRC_TYPE = TARGZ
SRC = %(NAME)s-%(BASE_VERSION)s-src.tar.gz
DST = %(NAME)s-%(BASE_VERSION)s.tar.gz
LOC = %(ARCHIVES)s
DOWNLOAD_PATH = /hadoop/core/%(NAME)s-%(BASE_VERSION)s
### EOF ###

I still need to spend time on documentation, but I think this looks
very similar to the old BOM, only cleaner. The DEFAULT section
specifies our archives, the BOM entry lists all the components we want
in the stack, and each component gets its own section.
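
To give an idea of how little code this needs, here is a rough sketch
of reading the BOM back with the stock ConfigParser module (spelled
configparser on Python 3); load_bom is just an illustrative name, not
necessarily what the fork ends up using:

### load_bom.py (sketch) ###
import ConfigParser

def load_bom(path="bigtop.BOM"):
    config = ConfigParser.ConfigParser()
    config.optionxform = str   # keep option names upper-case as written
    config.read(path)
    bom = {}
    # The BOM key in [BIGTOP] lists every component section in the stack.
    for name in config.get("BIGTOP", "BOM").split():
        # %(NAME)s style references are expanded by ConfigParser, so SRC,
        # DST and DOWNLOAD_PATH come back fully interpolated.
        bom[name] = dict(config.items(name))
    return bom

for name, fields in load_bom().items():
    print("%s -> %s" % (name, fields["SRC"]))
### EOF ###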

The Hadoop component is the only one I have tested with, but I don't
see any problems adding others. The LOC field in the hadoop component
is interesting: we can point it at ARCHIVES and it will look through
each of them for the tarball we want, so if the first archive is down
it moves on to the next one you have specified.
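
Very roughly, the download step could then walk the archives like this
(fetch_tarball is only an illustrative name; urllib2 is stock on
Python 2, urllib.request on Python 3):

### fetch.py (sketch) ###
import urllib2

def fetch_tarball(archives, download_path, src, dest):
    # Try each archive in turn; the first one that serves the tarball wins.
    for base in archives.split():
        url = base + download_path + "/" + src
        try:
            response = urllib2.urlopen(url)
        except urllib2.URLError:
            continue   # archive down or file missing, try the next one
        with open(dest, "wb") as out:
            out.write(response.read())
        return url
    raise RuntimeError("no archive had %s" % src)

# e.g. for the hadoop section above:
#   fetch_tarball(fields["LOC"], fields["DOWNLOAD_PATH"],
#                 fields["SRC"], fields["DST"])
### EOF ###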

----

Overall I could go on about more issues, things that need looking at,
and why I think this is a good step for Bigtop to take, but feedback
from the community is more important at this point.

One thing to note is that this doesn't add any extra dependency: it is
just stock Python, with no need to pip install or easy_install any
modules. Almost all Linux/Unix distros ship with Python, so I think it
is fine.

Thanks for reading this! I will aim to complete the BOM against trunk
Bigtop and see if I run into anything.

--Phil