Another quick question: the trend lines drawn on the graphs seem to be
based on some assumption that there is an exponential scaling pattern. In
practice I would think it would be sigmoid -- while the dataset size is
smaller than cache capacity, changing the dataset size should have little
to no effect on the latency (since you'd get 100% hit rate). As soon as it
starts to be larger than the cache capacity, you'd expect the hit rate to
be on average equal to (size of cache / size of data). The average latency,
then, should be roughly the cache miss latency multiplied by the cache
miss ratio. That is to say, as the dataset gets larger, the latency should
level out, asymptotically approaching the miss latency, rather than
continuing to grow the way your trend lines show.
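A tiny sketch of that model, with made-up numbers (the hit/miss latencies and cache capacity here are illustrative, not taken from the graphs):

```python
# Simple model of average read latency vs. dataset size.
# Below cache capacity the hit rate is ~100%; above it, the hit rate
# falls off as (cache size / dataset size), so the average latency
# climbs toward the miss latency and then flattens out.

HIT_LATENCY_MS = 0.1    # hypothetical in-cache read latency
MISS_LATENCY_MS = 10.0  # hypothetical cache-miss read latency
CACHE_GB = 8            # hypothetical cache capacity

def avg_latency_ms(dataset_gb):
    hit_rate = min(1.0, CACHE_GB / dataset_gb)
    return hit_rate * HIT_LATENCY_MS + (1 - hit_rate) * MISS_LATENCY_MS

for gb in (1, 4, 8, 16, 64, 256, 1024):
    print(f"{gb:5d} GB dataset -> {avg_latency_ms(gb):6.2f} ms avg")
```

Plotting that function gives the sigmoid-ish shape described above: flat near the hit latency while the data fits in cache, a knee at the cache capacity, then an asymptote at the miss latency, not unbounded growth.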

On Fri, Apr 4, 2014 at 9:40 PM, Stack <[EMAIL PROTECTED]> wrote:
Todd Lipcon
Software Engineer, Cloudera