Re: Performance Testing
Matt Corgan 2012-06-21, 22:05
just brainstorming =)
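A minimal sketch of the workflow this message describes: enumerate every combination of input parameters, run one benchmark per combination, log each result as it finishes, then log all results again at the end as CSV. Everything here is hypothetical and written only for illustration: the class and method names, the parameter names and values, and the placeholder workload are assumptions, not code from the linked hbase-prefix-trie repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BenchmarkSketch {

    /** One output row: the parameters used plus the measured time (hypothetical type). */
    static final class Result {
        final Map<String, String> params;
        final long nanos;

        Result(Map<String, String> params, long nanos) {
            this.params = params;
            this.nanos = nanos;
        }

        /** Naive CSV row; assumes values contain no commas or quotes. */
        String toCsv() {
            return String.join(",", params.values()) + "," + nanos;
        }
    }

    /** Expands {name -> [values]} into every combination (cartesian product). */
    static List<Map<String, String>> combinations(Map<String, List<String>> space) {
        List<Map<String, String>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>());
        for (Map.Entry<String, List<String>> entry : space.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : combos) {
                for (String value : entry.getValue()) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    /** Stand-in for a single benchmark run; a real one would exercise seeks. */
    static Result runSingle(Map<String, String> params) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < 100_000; i++) {
            sink += i; // placeholder workload
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unreachable; keeps the loop result in use");
        }
        return new Result(params, elapsed);
    }

    public static void main(String[] args) {
        // Assumed parameter names and values, for illustration only.
        Map<String, List<String>> space = new LinkedHashMap<>();
        space.put("blockSize", Arrays.asList("4096", "65536"));
        space.put("compression", Arrays.asList("NONE", "GZ"));
        space.put("encoding", Arrays.asList("NONE", "PREFIX"));

        List<Result> results = new ArrayList<>();
        for (Map<String, String> params : combinations(space)) {
            Result result = runSingle(params);
            System.out.println(result.toCsv()); // log each result as it completes
            results.add(result);
        }

        // Log everything again at the end, with a header, for pasting into a spreadsheet.
        System.out.println(String.join(",", space.keySet()) + ",nanos");
        for (Result result : results) {
            System.out.println(result.toCsv());
        }
    }
}
```

Keeping the parameter space in an ordered map means the CSV columns come out in the same order the parameters were declared, which makes the rows line up with the header when sorting or filtering in a spreadsheet.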
Some of those are motivated by the performance tests i wrote for data block encoding: Link <https://github.com/hotpads/hbase-prefix-trie/tree/master/test/org/apache/hadoop/hbase/cell/pt/test/performance/seek>. In that directory:

* SeekBenchmarkMain gathers all of the test parameters. Perhaps we could have a test configuration input file format where standard test configs are put in source control
* For each combination of input parameters it runs a SingleSeekBenchmark
* As it runs, the SingleSeekBenchmark adds results to a SeekBenchmarkResult
* Each SeekBenchmarkResult is logged after each SingleSeekBenchmark, and all of them are logged again at the end for pasting into a spreadsheet

They're probably too customized to my use case, but maybe we can draw ideas from the structure/workflow and make it applicable to more use cases.

On Thu, Jun 21, 2012 at 2:47 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> Concur. That's ambitious!
>
> On Thu, Jun 21, 2012 at 1:57 PM, Ryan Ausanka-Crues <[EMAIL PROTECTED]> wrote:
> > Thanks Matt. These are great!
> > ---
> > Ryan Ausanka-Crues
> > CEO
> > Palomino Labs, Inc.
> > [EMAIL PROTECTED]
> > (m) 805.242.2486
> >
> > On Jun 21, 2012, at 12:36 PM, Matt Corgan wrote:
> >
> >> These are geared more towards development than regression testing, but here
> >> are a few ideas that I would find useful:
> >>
> >> * Ability to run the performance tests (or at least a subset of them) on a
> >> development machine would help people avoid committing regressions and
> >> would speed development in general
> >> * Ability to test a single region without heavier weight servers and
> >> clusters
> >> * Letting the test run with multiple combinations of input parameters
> >> (block size, compression, blooms, encoding, flush size, etc, etc).
> >> Possibly many combinations that could take a while to run
> >> * Output results to a CSV file that's importable to a spreadsheet for
> >> sorting/filtering/charting.
> >> * Email the CSV file to the user notifying them the tests have finished.
> >> * Getting fancier: ability to specify a list of branches or tags from git
> >> or subversion as inputs, which would allow the developer to tag many
> >> different performance changes and later figure out which combination is the
> >> best (all before submitting a patch)
> >>
> >>
> >> On Thu, Jun 21, 2012 at 12:13 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
> >>
> >>> I actually think that more measurements are needed than just per release.
> >>> The best I could hope for would be a four node+ cluster(One master and
> >>> three slaves) that for every check in on trunk run multiple different perf
> >>> tests.
> >>>
> >>>
> >>> - All Reads (Scans)
> >>> - Large Writes (Should test compactions/flushes)
> >>> - Read Dominated with 10% writes
> >>>
> >>> Then every checkin can be evaluated and large regressions can be treated as
> >>> bugs. And with that we can see the difference between the different
> >>> versions as well. http://arewefastyet.com/ is kind of the model that I
> >>> would love to see. And I'm more than willing to help where ever needed.
> >>>
> >>> However in reality every night will probably be more feasible. And Four
> >>> nodes is probably not going to happen either.
> >>>
> >>> On Thu, Jun 21, 2012 at 11:38 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> On Wed, Jun 20, 2012 at 10:37 PM, Ryan Ausanka-Crues
> >>>> <[EMAIL PROTECTED]> wrote:
> >>>>> I think it makes sense to start by defining the goals for the
> >>>>> performance testing project and then deciding what we'd like to accomplish.
> >>>>> As such, I start by soliciting ideas from everyone on what they would like
> >>>>> to see from the project. We can then collate those thoughts and prioritize
> >>>>> the different features. Does that sound like a reasonable approach?
> >>>>
> >>>> In terms of defining a goal, the fundamental need I see for us as a