Re: Bigtop contributions [Was: [VOTE] - Release 2.0.5-beta]
On Sunday, June 9, 2013, Roman Shaposhnik wrote:

> On Wed, Jun 5, 2013 at 12:37 PM, Andrew Purtell <[EMAIL PROTECTED]>
> wrote:
> > Additional to code contributions, do you guys have any thoughts about or
> > interest in infrastructure donations? (Would be EC2 based.)
> That would be so helpful you won't even believe it! If you have any
> practical ideas in that area -- please let me know (on list, off-list or
> in person ;-)).

Over at we set up a Jenkins instance backed by a
dynamic EC2 slave pool, using a custom AMI based on Amazon Linux with some
preconfigurations and dependency preloads for HBase unit testing. This was
when ASF Jenkins was having some issues. Now that those are resolved we are
going to scale this back. That would free up some resources for Jenkins jobs
for Bigtop.

However, what I was thinking (it would take a bit more time to set up) is
something like running "nightlies" (maybe 3 times a week?) against a fully
provisioned Bigtop cluster, maybe 10x m2.4xlarge or the SSD-backed high
I/O type. I think the Hadoop, HBase, Pig, Hive, Giraph, and Phoenix tests can
take advantage of this right away and can submit or expand their executions
to a full cluster. I haven't looked at the others. So this might look
like: launch a transient package build host (an m1.large maybe), build the
Bigtop packages, start up the test cluster, deploy the packages from a Yum
or Apt repo on the build host, configure the test cluster (using the Puppet
stuff?), run the full suite of integration tests, then collect the test
output and all of the log files up to an S3 bucket for posterity, shut
everything down, post-process the results, and finally kick off emails to

After getting something like the above working I can imagine a process of
refining the post processing over time to get better about flagging errors
and anomalies.
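
As a rough illustration of the nightly sequence described above, here is a
minimal Python sketch. It is not taken from any existing setup: the function
run_nightly, the STEPS and CLEANUP lists, and the no-op actions are all
hypothetical placeholders. The only point it makes is the ordering, plus the
idea that log collection and shutdown should still happen when an earlier
stage fails, so a broken run does not leave EC2 instances up.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_nightly(steps: List[Step], cleanup: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order, stopping at the first failure.

    The cleanup steps always run afterwards, whether or not a step failed.
    Returns a list of (step name, status) pairs for later post-processing.
    """
    results = []
    try:
        for name, action in steps:
            try:
                action()
            except Exception as exc:
                results.append((name, "FAILED: %s" % exc))
                break  # skip the remaining steps, go straight to cleanup
            results.append((name, "ok"))
    finally:
        for name, action in cleanup:
            try:
                action()
                results.append((name, "ok"))
            except Exception as exc:
                results.append((name, "FAILED: %s" % exc))
    return results


def noop() -> None:
    """Placeholder; a real action would call out to EC2, Puppet, and so on."""


# Hypothetical stage list mirroring the sequence in the message.
STEPS = [
    ("launch build host", noop),
    ("build Bigtop packages", noop),
    ("start test cluster", noop),
    ("deploy packages from repo on build host", noop),
    ("configure test cluster", noop),
    ("run integration tests", noop),
]
CLEANUP = [
    ("collect test output and logs", noop),
    ("shut everything down", noop),
]

if __name__ == "__main__":
    for name, status in run_nightly(STEPS, CLEANUP):
        print("%-45s %s" % (name, status))
```

Post-processing and notification would consume the returned list once
everything is shut down; they are left out here because they do not need the
cluster.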

To be a useful resource for Hadoop integration testing we could consider
a variation where the current Bigtop BOM is used for everything except
Hadoop; for Hadoop, the version in bigtop.mk could instead be "branch-2",
with the GitHub mirror as the site.
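
To make the variation concrete, here is a small Python sketch of "same BOM,
one component overridden". It does not reflect the actual bigtop.mk syntax or
variable names: the dictionary layout, the with_override helper, and the
placeholder strings in angle brackets are assumptions made purely for
illustration.

```python
import copy


def with_override(bom, component, **fields):
    """Return a copy of bom with the given fields replaced for one component."""
    if component not in bom:
        raise KeyError(component)
    updated = copy.deepcopy(bom)  # leave the release BOM untouched
    updated[component].update(fields)
    return updated


# Hypothetical BOM entries; real versions and sites live in bigtop.mk.
BASE_BOM = {
    "hadoop": {"version": "<release version>", "site": "<release mirror>"},
    "hbase": {"version": "<release version>", "site": "<release mirror>"},
}

# Same BOM, but Hadoop is taken from branch-2 on the GitHub mirror.
BRANCH2_BOM = with_override(
    BASE_BOM, "hadoop", version="branch-2", site="<github mirror>"
)
```

Every other component keeps its release entry, so a failure in the branch-2
run points at Hadoop rather than at a change elsewhere in the stack.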

Perhaps some vendors might consider donating to this after something is off
the ground.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)