Bigtop >> mail # dev >> adding HA monitoring to bigtop


Re: adding HA monitoring to bigtop
On 11 October 2012 00:13, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:

> Steve,
>
> great stuff. Here's my initial feedback:
>
> 1. I am not passing judgement about how the monitoring is done, although
> something like Nagios would fill the bill good enough, IMO.
Nagios can over-react and send out 4000 emails when a service isn't
responding, without noting that it's down for another reason, and it's
biased towards those email notifications.
> Anyway... It seems
> like this monitoring is very Hadoop HA specific,

It could actually monitor any service with one or more of
 pid
 port
 URL
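
As a rough illustration of those three checks, here is a minimal sketch in
Python. It is not the monitor's actual code: the function names pid_alive,
port_open and url_ok are hypothetical, and the pid check assumes a POSIX
system.

```python
import os
import socket
import urllib.error
import urllib.request


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def url_ok(url, timeout=2.0):
    """Return True if fetching the URL yields a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Each probe returns a plain boolean, so none of them needs to know anything
about Hadoop.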

The Hadoop-ness currently comes from
 1. specific probes for HDFS and JT
 2. use of hadoop XML config for settings (trivial fix)
 3. probe order fixed in source
 4. no current support for adding new probes just by putting them on the
classpath and declaring them
 5. an installation that goes under /usr/lib/hadoop and picks up the hadoop
classpath and native lib so its hadoop probes are always in sync with the
runtime.

I'd fix 2 & 3 by having a better config language that lets you specify an
order of operations.
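
A sketch of what a configurable probe order could look like, in Python. This
is only an assumption about the shape such a config might take, not the
existing code: run_probes, the registry dict and the probe names are all
hypothetical.

```python
def run_probes(order, registry):
    """Run the named probes in the configured order.

    Returns the name of the first probe that fails, or None if all pass.
    Probes listed after a failure are not run.
    """
    for name in order:
        probe = registry.get(name)
        if probe is None:
            raise KeyError("no probe registered under name: " + name)
        if not probe():
            return name
    return None


# Hypothetical usage: the order comes from configuration, not from source.
registry = {
    "pid": lambda: True,
    "port": lambda: True,
    "url": lambda: False,
}
first_failure = run_probes(["pid", "port", "url"], registry)
```

Keeping the order as data means a new probe only has to be registered and
named in the config.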
> I would say that it is better
> kept in Hadoop in one form or another - hadoop/contrib seems like a good
> place to start. In other words, I don't think this is generic enough
> monitoring software to be included in BigTop.
OK
> Say, I'd be happy to
> include Ganglia or some Nagios hooks for the same purposes.  Packaging for
> this monitoring software can of course be added to the BigTop stack, as we
> are doing for many other components - it looks like a very reasonable
> approach.
>
> 2. The failure inducing library seems like a great addition to the iTest.
> In fact, if I were doing Hadoop fault injection again I would certainly go
> with MOP'ping and a Groovy-based framework, instead of AspectJ boredom. So,
> I like the idea and it seems to fit very well with the original design
> ideas of the iTest - let's add the library to the BigTop. There are things
> to look at and discuss of course but I like the overall idea!
>

OK, this bit of it is very immature and might ultimately go into its own
module, so that Hadoop HA tests can use it too.

>
> To summarize: I'm rather negative on keeping the monitoring software as a
> part of the BigTop; and I am quite positive on bringing the testing lib in
> as a part of the iTest.
>

I'll have a look at iTest and see where it fits in, then we can start
thinking about what a good test framework for triggering infrastructure
failures would be. I think what I've got is just a starting point. FWIW
jclouds is looking at vbox integration too, via its Web Service API. It could
be used to trigger VM death in any virtual infrastructure; we'd just need to
add back ends for physical infrastructures (for now: dialog boxes & fencing
scripts), and the code to cause trouble inside the VM itself.

BTW, one thing you can do with virtual infrastructure is forced volume
unmounts, umount -f, which could be used to simulate disk, disk controller
or disk driver problems. Something like that would be really good for
generating stress tests of all the storage layers.

-steve