Accumulo user mailing list: Accumulo Caching for benchmarking


Re: Accumulo Caching for benchmarking
William Slacum 2012-08-04, 00:55
Steve, I'm a little confused. The RFile block cache is tied to a TServer,
so if you kill a node, its cache should go away. Are you querying for the
same data after you kill the node that hosted the tablet containing that
data? Also, between runs you could stop and restart everything, thereby
eliminating the cache.
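
If you do go the full-restart route, here is a minimal sketch using the
standard control scripts shipped in $ACCUMULO_HOME/bin (assuming the
1.4-era script names; adjust for your install):

    # Stop every Accumulo process on the cluster, then bring it back up.
    # The block cache lives in each TServer's JVM heap, so a full restart
    # discards all cached RFile data and index blocks.
    $ACCUMULO_HOME/bin/stop-all.sh
    $ACCUMULO_HOME/bin/start-all.sh

A cleaner option might be to take the cache out of the picture entirely
by disabling it for the table under test from the Accumulo shell
(property names assumed from the 1.4 documentation; "rdf_table" is just
a placeholder):

    # Inspect the TServer cache sizing, then turn off the data block
    # cache for the benchmark table.
    config -f tserver.cache
    config -t rdf_table -s table.cache.block.enable=false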

On Fri, Aug 3, 2012 at 5:50 PM, Steven Troxell <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I am running a benchmarking project on Accumulo, looking at RDF queries
> on clusters of different node counts. While I intend to look at caching
> when optimizing each individual run, I do NOT want caching to interfere
> between runs, for example between runs using 10 and 8 tablet servers.
>
> Up to now I'd just been killing nodes via the bin/stop-here.sh script,
> but I realize that may have allowed caching from previous runs with
> different node counts to influence my results. It seemed odd, for
> example, that dropping nodes actually increased performance (as measured
> by query return times) in some cases (though I acknowledge the code I'm
> working with has some serious issues with how ineffectively it uses
> Accumulo; that's an issue I intend to address later).
>
> I suppose one way would be to stop and restart ALL nodes between changes
> of node count (as opposed to what I'd been doing: just killing 2 nodes,
> for example, when transitioning from a 10-node to an 8-node test). Will
> this be sure to clear the influence of caching across runs, and is there
> any cleaner way to do it?
>
> thanks,
> Steve
>