+ You measure 99th percentile. Did you take measure of average/mean response times doing your blockcache comparison? (Our LarsHofhansl had it that on average reads out of bucket cache were a good bit slower). Or is this a TODO?
+ We should just remove slabcache because bucket cache is consistently better and why have two means of doing same thing? Or, do you need more proof bucketcache subsumes slabcache?
Another quick question: the trend lines drawn on the graphs seem to be based on some assumption that there is an exponential scaling pattern. In practice I would think it would be sigmoid -- while the dataset size is smaller than cache capacity, changing the dataset size should have little to no effect on the latency (since you'd get 100% hit rate). As soon as it starts to be larger than the cache capacity, you'd expect the hit rate to be on average equal to (size of cache / size of data). The average latency, then, should be just about equal to the cache miss latency multiplied by the cache miss ratio. That is to say, as the dataset gets larger, the latency will level out as a flat line, not continue to grow as your trend lines are showing.
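The argument above can be sketched as a small model. This is only an illustration under stated assumptions: reads are uniformly random over the dataset, the hit and miss latencies (`hit_ms`, `miss_ms`) are made-up constants rather than measurements, and it models the mean latency, not the 99th percentile analyzed in the post. The function names are hypothetical.

```python
def expected_hit_ratio(cache_gb: float, dataset_gb: float) -> float:
    """Hit ratio under uniform random reads: 1.0 while the data fits, then cache/data."""
    if dataset_gb <= cache_gb:
        return 1.0
    return cache_gb / dataset_gb


def expected_mean_latency_ms(cache_gb: float, dataset_gb: float,
                             hit_ms: float, miss_ms: float) -> float:
    """Mean latency as a hit/miss weighted average of two fixed latencies."""
    hit_ratio = expected_hit_ratio(cache_gb, dataset_gb)
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


if __name__ == "__main__":
    # Hypothetical numbers: 20 GB cache, 0.1 ms per hit, 10 ms per miss.
    for dataset_gb in (5, 10, 20, 40, 80, 160, 320):
        latency = expected_mean_latency_ms(20, dataset_gb, hit_ms=0.1, miss_ms=10.0)
        print(f"{dataset_gb:>4} GB -> {latency:.2f} ms")
```

With these assumptions the curve is flat while the dataset fits in the cache, rises once it no longer fits, and approaches `miss_ms` as the dataset grows, rather than growing without bound.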
-Todd
On Fri, Apr 4, 2014 at 9:40 PM, Stack <[EMAIL PROTECTED]> wrote:
Todd Lipcon
Software Engineer, Cloudera
Did you take measure of average/mean response times doing your blockcache
Yes, in total I collected the mean as well as the 50%, 95%, 99%, and 99.9% latency values. I only performed the analysis over the 99% in the post. I also looked briefly at the 99.9%, but that wasn't immediately relevant to the context of the experiment. All of these data are included in the "raw results" csv I uploaded and linked from the "Showdown" post.
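For anyone who wants to recompute the same summary statistics from a list of latency samples, here is a minimal sketch. It uses the nearest-rank percentile definition, which is an assumption; the harness may compute percentiles differently. The function names and sample values are hypothetical and not taken from the raw results csv.

```python
import math


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending list, for 0 < pct <= 100."""
    rank = math.ceil(pct * len(sorted_samples) / 100.0)
    return sorted_samples[max(rank, 1) - 1]


def summarize(samples):
    """Return the mean plus the 50%, 95%, 99%, and 99.9% latency values."""
    ordered = sorted(samples)
    summary = {"mean": sum(ordered) / len(ordered)}
    for pct in (50, 95, 99, 99.9):
        summary[f"p{pct}"] = percentile(ordered, pct)
    return summary


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds, with one slow outlier.
    latencies_ms = [1.0] * 98 + [5.0, 250.0]
    print(summarize(latencies_ms))
```

A single slow outlier moves the mean and the top percentiles while leaving the median untouched, which is why looking at the mean alongside the 99% gives a more complete picture.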
do you need more proof bucketcache subsumes slabcache?
I'd like more vetting, yes. As you alluded to in the previous question, a more holistic view of response times would be good, and I'd also like to see how they perform with a mixed workload. Next step is probably to exercise them with some YCSB workloads of varying RAM:DB ratios.
the trend lines drawn on the graphs seem to be based on some assumption
Which charts are you specifically referring to? Indeed, the trend lines were generated rather casually with Excel and may be misleading. Perhaps a more responsible representation would be to simply connect each data point with a line to aid visibility.
In practice I would think it would be sigmoid [...] As soon as it starts to
When decoupling cache size from database size, you're presumably correct. I believe that's what's shown in the figures in perfeval_blockcache_v1.pdf, especially as total memory increases. The plateau effect is suggested in the 20G and 50G charts in that book. This is why I included the second set of charts in perfeval_blockcache_v2.pdf. The intention is to couple the cache size to dataset size and demonstrate how an implementation performs as the absolute values increase. That is, assuming the hit and eviction rates remain roughly constant, how well does an implementation "scale up" to a larger memory footprint?
And yep, I think straight lines between the points (or just the points themselves) might be more accurate.
Hmm... in "v2.pdf" here you're looking at different ratios of DB size to cache size, but there's also the secondary cache on the system (the OS block cache), right? So when you say only 20GB "memory under management", in fact you're still probably getting 100% hit rate on the case where the DB is bigger than RAM, right?
I guess I just find the graphs a little hard to understand what they're trying to demonstrate. Maybe would be better to have each graph show the different cache implementations overlaid, rather than the different ratios overlaid? That would better differentiate the scaling behavior of the implementations vs each other. As you've got it, the results seem somewhat obvious ("as the hit ratio gets worse, it gets slower").
On Mon, Apr 14, 2014 at 10:12 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
Yes, this is true.
So when you say only 20GB "memory under management", in fact you're still
I can speculate that it's likely true, but I don't know this for certain. At the moment, the only points of instrumentation in the harness are in the HBase client. The next steps include pushing down into the RS, the DN, and further down to the OS itself.
Maybe would be better to have each graph show the different cache
I did experiment with that initially. I found the graphs became dense and unreadable. I need to spend more time studying Tufte to present all these data points in a single figure. The data is all included, so please, by all means have a crack at it. Maybe you'll see something I didn't.
As you've got it, the results seem somewhat obvious ("as the hit ratio
Yes, that's true. Of interest in this particular experiment was the relative performance of different caches under identical workloads.
Here is a follow-up to Nick's blockcache 101 that compares a number of deploys x loadings and makes recommendations: https://blogs.apache.org/hbase/
St.Ack
On Fri, Apr 4, 2014 at 9:22 PM, Stack <[EMAIL PROTECTED]> wrote: