Accumulo >> mail # dev >> GSOC: Monitor Improvements


Re: GSOC: Monitor Improvements
Adding on to the comment about summaries, averages, and outliers: if, for
some reason, you end up with a bimodal (two-hump) population, then simply
showing averages will mask the split and lose a lot of valuable
information. It is often valuable to know that a particular set of users
or servers is experiencing degraded performance while the rest of the
ecosystem is healthy.
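
To make the masking effect concrete, here is a minimal sketch with made-up
numbers (the latencies and the summarize helper are hypothetical, not taken
from the monitor). A small degraded group barely moves the mean, and the
median does not move at all, but a high percentile exposes the group
immediately.

```python
import statistics


def summarize(latencies_ms):
    """Return mean, median, and 99th percentile for a list of latencies."""
    # quantiles(n=100) yields 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(latencies_ms, n=100)[98]
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p99": p99,
    }


# Hypothetical data: 95 healthy requests at 10 ms, 5 degraded ones at 500 ms.
latencies = [10.0] * 95 + [500.0] * 5
summary = summarize(latencies)
print(summary)
```

With this input the mean lands between the two groups and describes neither
of them, which is why a single average per server can hide the second hump.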

This isn't something that shows up in a regular time series because the
secondary population is usually very small compared to the total
population. I once saw a graph of request latency for a service that I
really wish I could find again (maybe somebody on the list will be able to
chime in). It had timestamps on the x-axis, latency on the y-axis, and each
(x,y) point was colored on a gradient representing how many requests were
fulfilled at time x with latency y. This chart made it immediately easy to
see that most data points fit a normal distribution with a low mean, but
there was also a cluster at the top for some reason.
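
The data behind that kind of chart is just a two-dimensional histogram. A
rough sketch of the binning, assuming samples arrive as (timestamp in
seconds, latency in ms) pairs; the function names, bucket sizes, and the
ASCII shading are all illustrative choices, not anything the monitor
provides today:

```python
from collections import Counter


def latency_heatmap(samples, time_bucket_s=60, latency_bucket_ms=50):
    """Count requests per (time bucket, latency bucket) cell.

    samples: iterable of (timestamp_seconds, latency_ms) pairs.
    Returns a Counter keyed by (time_bucket_start, latency_bucket_start).
    """
    counts = Counter()
    for ts, latency in samples:
        cell = (
            int(ts // time_bucket_s) * time_bucket_s,
            int(latency // latency_bucket_ms) * latency_bucket_ms,
        )
        counts[cell] += 1
    return counts


def render(counts, shades=" .:*#"):
    """Draw the cells as text: time runs left to right, latency top down.

    Only latency buckets that contain data get a row, so empty bands
    between the two clusters are not drawn in this simple version.
    """
    if not counts:
        return ""
    times = sorted({t for t, _ in counts})
    lats = sorted({lat for _, lat in counts}, reverse=True)
    peak = max(counts.values())
    lines = []
    for lat in lats:
        row = ""
        for t in times:
            n = counts.get((t, lat), 0)
            # Empty cells stay blank; busier cells get darker characters.
            idx = 0 if n == 0 else 1 + (n - 1) * (len(shades) - 2) // max(peak - 1, 1)
            row += shades[idx]
        lines.append(f"{lat:>6} ms |{row}")
    return "\n".join(lines)


# Hypothetical samples: mostly fast requests plus one slow one.
samples = [(0, 10), (30, 20), (61, 12), (65, 480)]
counts = latency_heatmap(samples)
print(render(counts))
```

A real version would replace the character ramp with a color gradient, but
the per-cell counts are the part that makes the small slow cluster visible.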

I'd love to see that type of chart show up for tablet servers (probably not
as useful for tables).

Mike
On Mon, Apr 22, 2013 at 9:05 AM, Eric Newton <[EMAIL PROTECTED]> wrote:

> Another thing to consider is scale.  On large clusters (many hundreds of
> nodes), more data is not helpful for visualization.  Instead, summaries,
> averages and outliers are important.
>
> For example, if one node is consistently slow, it is better to know that
> than to see one graph with low numbers in a sea of graphs.
>
> If the monitor collects information using JMX, collection time for each
> node would be a good thing to know, too.
>
> -Eric
>
>
> On Sun, Apr 21, 2013 at 10:00 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
>
> > Supun,
> >
> > Yup, very much so. Having a way to consume any and all metrics via JMX
> > would simplify things for any consumers (internal or external).
> >
> >
> >
> > On 04/21/2013 02:15 PM, Supun Kamburugamuva wrote:
> >
> >> Hi Josh,
> >>
> >> Thanks for the suggestions. I'll incorporate these into the proposal.
> >>
> >> Another area I would like to work on is JMX. There is a Jira issue to
> >> switch the Monitor calls from Thrift to JMX (ACCUMULO-694). Do you think
> >> this is a good addition to the Monitor?
> >>
> >> Thanks,
> >> Supun..
> >>
> >>
> >> On Sun, Apr 21, 2013 at 1:45 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> >>
> >>> Supun,
> >>>
> >>> Looks good! Can I make some suggestions/comments?
> >>>
> >>> For: "Per table plots: ACCUMULO-594", I'd also like to see minor
> >>> compactions, major compactions, index cache hit rate, and data cache hit
> >>> rate per table (the same graphs that are displayed system-wide when you
> >>> visit http://${MONITOR_HOST}:50095/).
> >>>
> >>> For "Per tablet [server] plots", it would be neat if you could also
> >>> extract some general statistics like top N least performing, top N
> >>> highest performing, etc. tablet servers. Ideally, this could correlate
> >>> with servers that may be having problems :).
> >>>
> >>> Do you see these proposed changes as being sufficient for 3-4 months of
> >>> 40hrs/week work? If you plan to really dig into these changes (perhaps
> >>> reworking components of the monitor itself), I could perhaps see this. Do
> >>> you have any ideas for more lofty goals that you could pursue as well? I
> >>> don't want you/us to get one month into things and see you complete
> >>> everything we initially planned to accomplish :)
> >>>
> >>> - Josh
> >>>
> >>>
> >>> On 04/21/2013 10:37 AM, Supun Kamburugamuva wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I would like to start writing the proposal for GSoC. I've put together
> >>>> some initial high-level goals for the project. Please let me know what I
> >>>> can improve.
> >>>>
> >>>> Per table plots: ACCUMULO-594
> >>>> ---------------------
> >>>>
> >>>> The goal of this is to display plots that explain the various
> >>>> activities that happen per table. When we go to the tables page of the
> >>>> monitor