-
high-percentile latency metrics, aka HBASE-6261
Andrew Wang 2012-06-27, 01:35
Hi all,
I'm looking into some better ways of estimating high-percentile latency, since I don't think the existing sampling-based method does a good job. I wrote up a document and put it on HBASE-6261 that outlines what I think are the available options; I'd encourage any resident stats experts / people interested in metrics to take a look.
I wanted to ask off JIRA though about what would be useful in practice. I think it'd be nice to see, for example, accurate 90th and 99th percentile latency over recent 10s, 1m, 5m, and 15m time windows. I found some nice algos to do this, I think at the cost of MBs of memory.
If this sounds like overkill though, there are even cheaper algos that provide a more qualitative feeling of how the latency distribution is changing over time. You give up one or more of bounded error, exact percentiles, or time-based windows, but you do get the general feeling of up vs. down.
So, is the "full" solution compelling enough to proceed? Anything missing/extraneous?
Thanks, Andrew
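As a point of reference for the message above, here is a minimal sketch of what "90th and 99th percentile latency over recent 10s, 1m, 5m, and 15m windows" means when computed exactly. It is not one of the algorithms from the HBASE-6261 write-up: it keeps every sample from the last 15 minutes, so memory grows with traffic, which is the cost the streaming algorithms are meant to avoid. The class name, method names, and the nearest-rank percentile definition are all assumptions made for illustration.

```python
from collections import deque

# Window lengths in seconds, matching the windows proposed in the thread.
WINDOWS_S = {"10s": 10, "1m": 60, "5m": 300, "15m": 900}


class NaiveWindowedLatency:
    """Hypothetical baseline: stores every (timestamp, latency) pair."""

    def __init__(self):
        self.samples = deque()  # ordered by arrival time

    def record(self, now_s, latency_ms):
        self.samples.append((now_s, latency_ms))
        # Evict anything older than the largest window.
        horizon = now_s - max(WINDOWS_S.values())
        while self.samples and self.samples[0][0] <= horizon:
            self.samples.popleft()

    def percentile(self, now_s, window, pct):
        """Exact nearest-rank percentile (pct is an integer, 1..100)."""
        cutoff = now_s - WINDOWS_S[window]
        values = sorted(v for t, v in self.samples if t > cutoff)
        if not values:
            return None
        rank = (pct * len(values) + 99) // 100  # ceil(pct * n / 100)
        return values[rank - 1]


# One request per second for 100 seconds, with latencies 1..100 ms.
tracker = NaiveWindowedLatency()
for t in range(100):
    tracker.record(t, float(t + 1))

p90_1m = tracker.percentile(99, "1m", 90)
p99_1m = tracker.percentile(99, "1m", 99)
p90_10s = tracker.percentile(99, "10s", 90)
```

Sorting on every query is fine for a sketch but would be too slow and too memory-hungry for a busy region server, which is why a bounded-error summary is attractive.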
-
Re: high-percentile latency metrics, aka HBASE-6261
Stack 2012-06-28, 22:31
On Tue, Jun 26, 2012 at 6:35 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:
> I wanted to ask off JIRA though about what would be useful in practice. I
> think it'd be nice to see, for example, accurate 90th and 99th percentile
> latency over recent 10s, 1m, 5m, and 15m time windows. I found some nice
> algos to do this, I think at the cost of MBs of memory.
Agree.
How many MBs?

> So, is the "full" solution compelling enough to proceed? Anything
> missing/extraneous?
What's going on is a critical focus going forward, so I'd say 'full' unless the cost is obscene.
St.Ack
-
Re: high-percentile latency metrics, aka HBASE-6261
Andrew Wang 2012-06-28, 22:53
I put this on the jira too, but the algo I found whittled a stream of 10 million items down to ~19.5k samples. With each sample at ~36 B, that's ~685 KiB. There's a bit more from using a LinkedList and general bookkeeping.
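The memory figure above can be checked with a few lines of arithmetic. The sample count and per-sample size are the approximate values quoted in the message; the variable names are illustrative only.

```python
# Approximate figures quoted in the message above.
SAMPLES = 19_500        # ~19.5k retained samples
BYTES_PER_SAMPLE = 36   # ~36 B per sample

total_bytes = SAMPLES * BYTES_PER_SAMPLE
total_kib = total_bytes / 1024

print(f"{total_bytes} bytes = {total_kib:.1f} KiB")  # 702000 bytes = 685.5 KiB
```

This excludes the LinkedList node overhead and bookkeeping mentioned in the message, so the real footprint would be somewhat higher.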
Since the estimator is reset every O(minutes) window, and I doubt very many metrics see more than 10 million items in O(minutes), it seems lightweight enough to keep going.
I'm planning on doing this in hadoop-common's metrics2 since HDFS is also interested, backporting to 1.x and 2.x. This would thus depend on the metrics2 conversion (HBASE-4050) going through too.
Thanks, Andrew
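The per-window reset described in the message above might look roughly like the sketch below. It assumes a tumbling (non-overlapping) window and uses a plain list as a stand-in for the bounded-memory summary, so it illustrates only the reset behaviour, not the sample-reduction algorithm itself. All names here are hypothetical and do not come from metrics2.

```python
class TumblingEstimator:
    """Hypothetical sketch: collect for one window, publish, then reset."""

    def __init__(self, window_s, start_s=0):
        self.window_s = window_s
        self.window_start = start_s
        self.values = []           # stand-in for a bounded quantile summary
        self.last_snapshot = None  # percentiles of the last completed window

    def record(self, now_s, value):
        self._maybe_roll(now_s)
        self.values.append(value)

    def _maybe_roll(self, now_s):
        elapsed = now_s - self.window_start
        if elapsed >= self.window_s:
            # Publish the finished window, then start from empty state.
            self.last_snapshot = self._summarise()
            self.values = []
            # Simplification: idle windows are skipped, not published.
            self.window_start += (elapsed // self.window_s) * self.window_s

    def _summarise(self):
        if not self.values:
            return None
        ordered = sorted(self.values)
        n = len(ordered)
        # Nearest-rank percentile: index is ceil(pct * n / 100) - 1.
        return {pct: ordered[(pct * n + 99) // 100 - 1] for pct in (90, 99)}


est = TumblingEstimator(window_s=60)
for t in range(60):
    est.record(t, t + 1)   # values 1..60 fall in the first window
est.record(60, 1000)       # first sample of the next window triggers the reset
```

Because state is discarded at each rollover, memory use is bounded by what a single window can accumulate, which is the basis of the "lightweight enough" argument in the message.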
-
Re: high-percentile latency metrics, aka HBASE-6261
Stack 2012-06-29, 17:07
On Fri, Jun 29, 2012 at 12:53 AM, Andrew Wang <[EMAIL PROTECTED]> wrote:
> I put this on the jira too, but the algo I found whittled down a stream of
> 10 million items down to ~19.5k samples. With each sample at ~36B, that's
> ~685KiB. There's a bit more from using a LinkedList and general bookkeeping.
>
> Since the estimator is reset every O(minutes) window, and I doubt very many
> metrics see more than 10 million items in O(minutes), it seems lightweight
> enough to keep going.
>
> I'm planning on doing this in hadoop-common's metrics2 since HDFS is also
> interested, backporting to 1.x and 2.x. This would thus depend on the
> metrics2 conversion (HBASE-4050) going through too.

Sounds great Andrew.
St.Ack