Accumulo >> mail # user >> Accumulo Caching for benchmarking


Steven Troxell 2012-08-03, 21:50
William Slacum 2012-08-04, 00:55
Josh Elser 2012-08-04, 00:59
Re: Accumulo Caching for benchmarking
Steve-

I would probably design the experiment so that the tests at different
cluster sizes are completely independent. That means taking the entire
thing down and bringing it back up again (possibly even rebooting the
boxes and/or re-initializing the cluster at the new size). I'd also do
several runs while it is up at a particular cluster size, to capture
any performance difference between the first run and later runs due to
OS or TServer caching, for analysis later.

Essentially, when in doubt, take more data...
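
As a rough illustration of that design, here is a minimal Python sketch.
It is only a sketch under assumptions: restart_cluster and run_query are
hypothetical hooks you would supply yourself (they are not Accumulo APIs),
and the number of runs per size is arbitrary. It restarts everything before
each cluster size, times several runs, and keeps the first (cold) run
separate from the later (warm) ones.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark_cluster_sizes(
    sizes: List[int],
    restart_cluster: Callable[[int], None],
    run_query: Callable[[], None],
    runs_per_size: int = 5,
) -> Dict[int, List[float]]:
    """Time run_query several times at each cluster size.

    restart_cluster and run_query are hypothetical, caller-supplied hooks.
    """
    results: Dict[int, List[float]] = {}
    for size in sizes:
        # Full stop/start at the new size so no cache survives from the
        # previous size (how that is done is up to the caller).
        restart_cluster(size)
        timings: List[float] = []
        for _ in range(runs_per_size):
            start = time.perf_counter()
            run_query()
            timings.append(time.perf_counter() - start)
        results[size] = timings
    return results


def summarize(timings: List[float]) -> Dict[str, float]:
    """Report the first (cold) run separately from the later (warm) runs."""
    cold = timings[0]
    warm = timings[1:]
    return {
        "cold": cold,
        "warm_median": statistics.median(warm) if warm else cold,
    }
```

Keeping every raw timing, rather than only an average, leaves the cold
versus warm comparison available for analysis later.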

--L
On Fri, Aug 3, 2012 at 5:50 PM, Steven Troxell <[EMAIL PROTECTED]> wrote:
> Hi  all,
>
> I am running a benchmarking project on Accumulo, looking at RDF queries on
> clusters with different numbers of nodes.  While I intend to look at caching
> for optimizing each individual run, I do NOT want caching to interfere
> between runs, for example between runs using 10 and 8 tablet servers.
>
> Up to now I'd just been killing nodes via the bin/stop-here.sh script, but I
> realize that may have allowed caching from previous runs with different node
> counts to influence my results.  It seemed weird to me, for example, when I
> realized dropping nodes actually increased performance (as measured by query
> return times) in some cases (though I acknowledge the code I'm working with
> has some serious issues with how ineffectively it is actually utilizing
> Accumulo, but that's an issue I intend to address later).
>
> I suppose one way would be, between changes of node count, to stop and
> restart ALL nodes (as opposed to what I'd been doing, just killing 2 nodes,
> for example, in transitioning from a 10 to 8 node test).  Will this be sure
> to clear the influence of caching across runs, and is there any cleaner way
> to do this?
>
> thanks,
> Steve
Eric Newton 2012-08-04, 11:19
Steven Troxell 2012-08-04, 17:21
Steven Troxell 2012-08-06, 18:41
Steven Troxell 2012-08-07, 14:57
Eric Newton 2012-08-07, 16:47
Steven Troxell 2012-08-07, 16:53