
Re: adding HA monitoring to bigtop
On Thu, Oct 11, 2012 at 1:34 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> Anyway... It seems
>> like this monitoring is very Hadoop HA specific,
> It could actually monitor any service with one or more of
>  pid
>  port
>  URL
> The Hadoop-ness currently comes from
>  1. specific probes for HDFS and JT
>  2. use of hadoop XML config for settings (trivial fix)
>  3. probe order fixed in source
>  4. no current support for adding new probes just by putting them on the
> classpath and declaring them
>  5. an installation that goes under /usr/lib/hadoop and picks up the hadoop
> classpath and native lib so its hadoop probes are always in sync with the
> runtime.
> I'd fix 2 & 3 by having a better config language that lets you specify an
> order of operations
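
To make the pid/port/URL idea concrete, here is a minimal sketch in Python.
It is not the monitoring code under discussion: the function names
(probe_pid, probe_port, probe_url, run_probes) and the ordered list that
stands in for a "config language" are assumptions made for illustration.
The probes run in the order given and the first failure is reported.

```python
import os
import socket
import urllib.error
import urllib.request


def probe_pid(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def probe_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_url(url, timeout=2.0):
    """Return True if an HTTP GET on the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False


def run_probes(probes):
    """Run (name, check) pairs in order; return the first failing name or None."""
    for name, check in probes:
        if not check():
            return name
    return None
```

A caller would pass an ordered list such as
[("pid", lambda: probe_pid(1234)), ("port", lambda: probe_port("localhost", 8020))],
where the pid and port values are placeholders, so the order of operations
lives in data rather than in source.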

This sounds pretty interesting. I remember Jos wanting to make
his monitoring code (I believe based on daemontools) available under:
there's a more generic JIRA as well:

Feel free to put more concrete proposals there. Also, if Jos could
comment -- that'll be terrific!

>> I would say that it is better
>> kept in Hadoop in one form or another - hadoop/contrib seems like a good
>> place to start. In other words, I don't think this is generic enough
>> monitoring software to be included in BigTop.
> OK

But again -- it all depends on where we draw the line. Generic
functionality that makes it easier to use Bigtop is always

>> To summarize: I'm rather negative on keeping the monitoring software as
>> a part of BigTop, and I am quite positive on bringing the testing lib in
>> as a part of iTest.
> I'll have a look at iTest and see where it fits in, then we can start
> thinking about what a good test framework for triggering infrastructures
> would be. I think what I've got is just a starting point. FWIW jclouds is
> looking at vbox integration too, via its Web Service API; it could be used
> to trigger VM death in any virtual infrastructure, we'd just need to add
> back ends for physical infrastructures (for now: dialog boxes & fencing
> scripts), and the code to cause trouble inside the VM itself.
> BTW, one thing you can do with virtual infrastructure is forced volume
> unmounts, umount -f, which could be used to simulate disk, disk controller
> or disk driver problems. Something like that would be really good for
> generating stress tests of all the storage layers.
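
For illustration only, the forced-unmount idea could be wrapped as a small
fault-injection helper. This is a sketch under assumptions: the helper name
force_unmount and its dry_run switch are hypothetical, and it relies only on
the standard `umount -f` command mentioned above. By default it just returns
the command it would run, so nothing is unmounted by accident.

```python
import subprocess


def force_unmount(mount_point, dry_run=True):
    """Build, and optionally run, a forced unmount of mount_point.

    With dry_run=True (the default) the command is only returned, not run.
    Running it for real needs root and will disrupt anything using the volume.
    """
    cmd = ["umount", "-f", mount_point]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

A stress test could call force_unmount(path, dry_run=False) inside a
disposable VM and then check how the storage layers above it react.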

Please let us know if you need any help with the iTest documentation, etc.
It is really quite terse at the moment.