Bigtop >> mail # dev >> adding HA monitoring to bigtop

Re: adding HA monitoring to bigtop
On 11 October 2012 00:13, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:

> Steve,
>
> great stuff. Here's my initial feedback:
>
> 1. I am not passing judgement about how the monitoring is done, although
> something like Nagios would fill the bill good enough, IMO.
Nagios can over-react and send out 4000 emails when a service isn't
responding, without noting that it's down for another reason - and it's
biased towards those email notifications.
> Anyway... It seems
> like this monitoring is very Hadoop HA specific,

It could actually monitor any service with one or more of
 pid
 port
 URL
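
To make those three checks concrete, here is a rough Python sketch. It is
not the monitor's actual code; the function names are invented for
illustration, and the pid check assumes a POSIX system.

```python
import http.client
import os
import socket
import urllib.request


def pid_alive(pid: int) -> bool:
    """True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def url_ok(url: str, timeout: float = 2.0) -> bool:
    """True if an HTTP GET on the URL returns a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses; ValueError covers
        # malformed URLs.
        return False
```

None of these is Hadoop-specific: a service only has to expose a pid, a
listening port, or an HTTP endpoint.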

The Hadoop-ness currently comes from
 1. specific probes for HDFS and JT
 2. use of hadoop XML config for settings (trivial fix)
 3. probe order fixed in source
 4. no current support for adding new probes just by putting them on the
classpath and declaring them
 5. an installation that goes under /usr/lib/hadoop and picks up the hadoop
classpath and native lib so its hadoop probes are always in sync with the
runtime.

I'd fix 2 & 3 by having a better config language that lets you specify an
order of operations.
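
As a sketch of what "specify an order of operations" could look like: the
config lists the probes in the order they should run, and the runner stops
at the first failure. The JSON format, the keys, the probe names and the
stop-at-first-failure rule are all assumptions made up for this example,
not the existing config.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical config: probes run top to bottom.
CONFIG = """
{"probes": [
  {"name": "namenode-pid",  "type": "pid"},
  {"name": "namenode-port", "type": "port"},
  {"name": "namenode-web",  "type": "url"}
]}
"""

Probe = Callable[[], bool]


def run_in_order(config_text: str, registry: Dict[str, Probe]) -> Optional[str]:
    """Run probes in declared order.

    Returns the name of the first failing probe, or None if all pass.
    """
    config = json.loads(config_text)
    for entry in config["probes"]:
        probe = registry[entry["type"]]  # look up the implementation by type
        if not probe():
            return entry["name"]  # later probes are not run
    return None


if __name__ == "__main__":
    # Stub probes standing in for real pid/port/URL checks.
    stubs = {"pid": lambda: True, "port": lambda: False, "url": lambda: True}
    print(run_in_order(CONFIG, stubs))
```

Reordering the probes then means editing the config rather than the source.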
> I would say that it is better
> kept in Hadoop in one form or another - hadoop/contrib seems like a good
> place to start. In other words, I don't think this is generic enough
> monitoring software to be included in BigTop.
OK
> Say, I'd be happy to
> include Ganglia or some Nagios hooks for the same purposes.  Packaging for
> this monitoring software can be of course added to the BigTop stack like we
> are doing this for many other components - it looks very reasonable
> approach.
>
> 2. The failure inducing library seems like a great addition to the iTest.
> In fact, if I were doing Hadoop fault injection again I would certainly go
> with MOP'ping and Groovy-based framework, instead of AspectJ boredom. So,
> I like the idea and it seems to fit very well with the original design
> ideas of the iTest - let's add the library to the BigTop. There are things
> to look at and discuss of course but I like the overall idea!
>

OK - this bit of it is very immature and might ultimately go into its own
module, so that Hadoop HA tests can use it too.

>
> To summarize: I'm rather negative on keeping the monitoring software as a
> part of the BigTop; and I am quite positive on bring the testing lib as a
> part of the iTest.
>

I'll have a look at iTest and see where it fits in, then we can start
thinking about what a good test framework for triggering infrastructure
failures would be. I think what I've got is just a starting point. FWIW
jclouds is looking at vbox integration too, via its Web Service API - it
could be used to trigger VM death in any virtual infrastructure. We'd just
need to add back ends for physical infrastructures (for now: dialog boxes &
fencing scripts), and the code to cause trouble inside the VM itself.
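
A rough sketch of how the back ends could sit behind one interface, in
Python for brevity. The class and method names are hypothetical, and the
virtual back end is left out on purpose rather than guessing at the jclouds
or VirtualBox APIs; only the two "physical" options mentioned above are
shown.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Callable, List


class FailureTrigger(ABC):
    """Hypothetical interface: one way of taking a host down."""

    @abstractmethod
    def kill_host(self, host: str) -> None:
        ...


class FencingScriptTrigger(FailureTrigger):
    """Physical back end: pass the host name to an operator-supplied script."""

    def __init__(self, script: str, dry_run: bool = True) -> None:
        self.script = script
        self.dry_run = dry_run
        self.commands: List[List[str]] = []  # record of what was (or would be) run

    def kill_host(self, host: str) -> None:
        cmd = [self.script, host]
        self.commands.append(cmd)
        if not self.dry_run:
            subprocess.run(cmd, check=True)


class ManualTrigger(FailureTrigger):
    """Physical back end: ask a human (a stand-in for the dialog box)."""

    def __init__(self, prompt: Callable[[str], object] = input) -> None:
        self.prompt = prompt

    def kill_host(self, host: str) -> None:
        self.prompt(f"Power off {host}, then press Enter: ")
```

A test would only see `FailureTrigger`, so the same test could run against
a virtual or a physical cluster by swapping the back end.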

BTW, one thing you can do with virtual infrastructure is forced volume
unmounts, umount -f, which could be used to simulate disk, disk controller
or disk driver problems. Something like that would be really good for
generating stress tests of all the storage layers.
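
For illustration, a minimal wrapper around the forced unmount. The function
names and the dry-run default are my own invention; actually running it
needs root, and what `umount -f` does varies by OS and filesystem, so treat
this as a sketch only.

```python
import subprocess
from typing import List


def forced_unmount_command(mount_point: str) -> List[str]:
    """Build the `umount -f` command line for an absolute mount point."""
    if not mount_point.startswith("/"):
        raise ValueError("mount point must be an absolute path")
    return ["umount", "-f", mount_point]


def force_unmount(mount_point: str, dry_run: bool = True) -> List[str]:
    """Return the command; execute it only when dry_run is False."""
    cmd = forced_unmount_command(mount_point)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

With the default `dry_run=True` nothing is unmounted; the call just returns
the command it would have run.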

-steve