-
Accumulo Caching for benchmarking
Steven Troxell 2012-08-03, 21:50
Hi all,

I am running a benchmarking project on Accumulo, looking at RDF queries on clusters with different numbers of nodes. While I intend to look at caching when optimizing each individual run, I do NOT want caching to interfere between runs, for example between runs using 10 and 8 tablet servers.

Up to now I'd just been killing nodes via the bin/stop-here.sh script, but I realize that may have allowed caching from previous runs with different node counts to influence my results. It seemed weird to me, for example, when I realized that dropping nodes actually increased performance (as measured by query return times) in some cases (though I acknowledge the code I'm working with has some serious issues with how ineffectively it uses Accumulo, but that's an issue I intend to address later).

I suppose one way would be to stop and restart ALL nodes between changes of node count (as opposed to what I'd been doing, which was just killing 2 nodes, for example, when transitioning from a 10 to an 8 node test). Will this be sure to clear the influence of caching across runs, and is there any cleaner way to do this?

thanks,
Steve
-
Re: Accumulo Caching for benchmarking
William Slacum 2012-08-04, 00:55
Steve, I'm a little confused. The RFile block cache is tied to a TServer, so if you kill a node, its cache should go away. Are you querying for the same data after you kill the node that hosted the tablet which contained the data?

Also, between runs, you could stop and restart everything, thereby eliminating the cache.
-
Re: Accumulo Caching for benchmarking
Josh Elser 2012-08-04, 00:59
I remember listening to a presentation by Keith about testing the multi-level RFile index, which was introduced in 1.4.0. You also want to think about caching at the operating system level. I'm not entirely positive what Keith did to try to mitigate this, but I imagine writing a bunch of garbage from /dev/urandom out to disk should work. That, or you could actually reboot the nodes.
-
Re: Accumulo Caching for benchmarking
Christopher Tubbs 2012-08-04, 01:41
Steve-

I would probably design the experiment to test different cluster sizes as completely independent. That means taking the entire thing down and back up again (possibly even rebooting the boxes, and/or re-initializing the cluster at the new size). I'd also do several runs while it is up at a particular cluster size, to capture any performance difference between the first and a later run due to OS or TServer caching, for analysis later.

Essentially, when in doubt, take more data...

--L
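
One way to capture the first-run versus later-run difference at a given cluster size is a small timing harness. What follows is only a sketch under stated assumptions: `time_runs`, `summarize`, and the `query` callable are hypothetical names, not part of Accumulo, and `query` stands in for whatever client code issues one RDF query.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(query: Callable[[], object],
              repeats: int = 5,
              clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Run the same query several times and return each wall-clock duration.

    `query` is a hypothetical stand-in for the benchmark's client call.
    `clock` is injectable so the harness can be checked without real waits.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = clock()
        query()
        durations.append(clock() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Separate the first (likely cold-cache) run from the later (warm) runs."""
    first, later = durations[0], durations[1:]
    return {
        "cold_first_run": first,
        # Median of the later runs; NaN when only one run was made.
        "warm_median": statistics.median(later) if later else float("nan"),
    }
```

Keeping the first duration separate from the rest means the cold-cache cost stays visible in the data rather than being averaged away, which leaves the OS-versus-TServer caching question open for later analysis.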
-
Re: Accumulo Caching for benchmarking
Eric Newton 2012-08-04, 11:19
You can drop the OS caches between runs:

# echo 1 > /proc/sys/vm/drop_caches
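
For scripting this between runs, a minimal Python sketch follows. It assumes Linux and root privileges, and the function name `drop_os_caches` is made up for illustration. The kernel documents three values for this file: 1 frees the page cache, 2 frees dentries and inodes, and 3 frees both. Only clean pages are dropped, so the sketch calls sync first.

```python
import os


def drop_os_caches(mode: int = 1,
                   path: str = "/proc/sys/vm/drop_caches") -> None:
    """Flush dirty pages, then ask the Linux kernel to drop clean caches.

    mode: 1 = page cache, 2 = dentries and inodes, 3 = both.
    path: overridable only so the function can be exercised without root.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2, or 3")
    # drop_caches frees only clean pages, so write dirty ones out first.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

This would need to run as root on every node that serves data, not just the node where the benchmark client runs, since each machine has its own page cache.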
-
Re: Accumulo Caching for benchmarking
Steven Troxell 2012-08-04, 17:21
Thanks everyone, that should definitely help me out. While I feel silly for ignoring this issue at first, it should be interesting to see how much this influences the results.
-
Re: Accumulo Caching for benchmarking
Steven Troxell 2012-08-06, 18:41
For anyone else curious about this, it seems the OS caching played a much larger role for me than TServer caching. I actually measured a performance increase after just stopping/restarting TServers to clear the cache (this could also have been biased by being a weekend run on the cluster).

However, I noticed an immediate difference when clearing the OS cache with Eric's command: the first few queries, which had generally been returning in tenths of a second, were now up in the minutes range.
-
Re: Accumulo Caching for benchmarking
Steven Troxell 2012-08-07, 14:57
Are there other considerations I should be aware of to ensure independent runs, beyond stopping/restarting tablet servers and clearing the OS cache?

I ran a test with 2 tablet servers active and got 1 query to come back in 10 hours. I then ran bin/stop-all.sh and bin/start-all.sh to get a comparison test with 10 tservers, cleared the cache using Eric's command on the 2 tablet servers I had used for the first run, and now I already had 4 queries return in under 2 minutes.

This could be an awesome performance gain, but I'm a bit skeptical, especially considering the client code isn't even using batch scans (as well as assorted other inefficiencies).

Is there some other dependency between the tests I haven't accounted for?
-
Re: Accumulo Caching for benchmarking
Eric Newton 2012-08-07, 16:47
Index caching is on by default in 1.4, and it's not particularly large. So, if your index suddenly fit entirely in cache with 10 servers, you would see much better performance.

-Eric
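
The effect of the index suddenly fitting in cache can be illustrated with some rough arithmetic. This is a sketch under loud assumptions: the function name and all sizes are hypothetical, the index is assumed to be spread evenly across tablet servers, and the per-server cache size is whatever the cluster is configured with (I believe the relevant property is tserver.cache.index.size, but that is worth checking against the 1.4 documentation).

```python
def index_fits_in_cache(total_index_mb: float,
                        tservers: int,
                        cache_mb_per_tserver: float) -> bool:
    """Return True if each server's share of the index fits in its index cache.

    Assumes the index is spread evenly across tablet servers, which a real
    cluster only approximates.
    """
    if tservers <= 0:
        raise ValueError("tservers must be positive")
    index_mb_per_tserver = total_index_mb / tservers
    return index_mb_per_tserver <= cache_mb_per_tserver


# Hypothetical numbers: a 2000 MB index and a 512 MB cache per server.
two_servers = index_fits_in_cache(2000, 2, 512)    # 1000 MB per server
ten_servers = index_fits_in_cache(2000, 10, 512)   # 200 MB per server
```

With these made-up sizes, each of 2 servers would hold 1000 MB of index against a 512 MB cache, while each of 10 servers would hold only 200 MB. A step change in query time between the two cluster sizes would then come from the aggregate cache growing with the node count rather than from anything carried over between runs.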
-
Re: Accumulo Caching for benchmarking
Steven Troxell 2012-08-07, 16:53
And I imagine index caching would be cleared out by stopping/starting tservers? I ask because some of those near-instant hits disappeared when I went from 10 nodes to 2, but reappeared when I went from 2 to 10.

So perhaps it is a legitimate performance increase, if I'm understanding you correctly (as opposed to a dependency between runs with different numbers of nodes that I missed)?

I would also then assume that increasing the amount of ingested data would be relevant for further testing, past the scale where index caching suddenly accounts for the increase?