Re: Performance Testing
just brainstorming =)

Some of those are motivated by the performance tests I wrote for data block
encoding: Link<https://github.com/hotpads/hbase-prefix-trie/tree/master/test/org/apache/hadoop/hbase/cell/pt/test/performance/seek>.
In that directory:

* SeekBenchmarkMain gathers all of the test parameters.  Perhaps we could
have a test configuration input file format where standard test configs are
put in source control
* For each combination of input parameters it runs a SingleSeekBenchmark
* As it runs, the SingleSeekBenchmark adds results to a SeekBenchmarkResult
* Each SeekBenchmarkResult is logged after each SingleSeekBenchmark, and
all of them are logged again at the end for pasting into a spreadsheet

They're probably too customized to my use case, but maybe we can draw ideas
from the structure/workflow and make it applicable to more use cases.
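
As a rough sketch of that workflow (not the actual code in the linked repo; the
class names echo the ones above, but the fields, parameter choices, and timing
below are placeholders):

// Sketch only: loosely modeled on the SeekBenchmarkMain workflow described
// above, not the real hbase-prefix-trie code.
import java.util.ArrayList;
import java.util.List;

public class SeekBenchmarkSketch {

  // One combination of test parameters (block size, encoding, ...).
  static class SeekBenchmarkParams {
    final int blockSizeBytes;
    final boolean blockEncoding;
    SeekBenchmarkParams(int blockSizeBytes, boolean blockEncoding) {
      this.blockSizeBytes = blockSizeBytes;
      this.blockEncoding = blockEncoding;
    }
  }

  // Accumulates the numbers from one run of one combination.
  static class SeekBenchmarkResult {
    final SeekBenchmarkParams params;
    long seeks;
    long elapsedNanos;
    SeekBenchmarkResult(SeekBenchmarkParams params) { this.params = params; }
    @Override public String toString() {
      return params.blockSizeBytes + "\t" + params.blockEncoding
          + "\t" + seeks + "\t" + elapsedNanos;
    }
  }

  // Stand-in for SingleSeekBenchmark: run one combination, fill in a result.
  static SeekBenchmarkResult runSingleSeekBenchmark(SeekBenchmarkParams p) {
    SeekBenchmarkResult result = new SeekBenchmarkResult(p);
    long start = System.nanoTime();
    // ... open the HFile/region with these params and time a batch of seeks ...
    result.seeks = 1_000_000;                        // placeholder count
    result.elapsedNanos = System.nanoTime() - start;
    return result;
  }

  public static void main(String[] args) {
    // Gather all parameter combinations (these could come from a standard
    // test-config file kept in source control, as suggested above).
    List<SeekBenchmarkParams> combos = new ArrayList<>();
    for (int blockSize : new int[] {4 * 1024, 64 * 1024}) {
      for (boolean encoded : new boolean[] {false, true}) {
        combos.add(new SeekBenchmarkParams(blockSize, encoded));
      }
    }

    // Run each combination, log its result as soon as it finishes, and keep it.
    List<SeekBenchmarkResult> all = new ArrayList<>();
    for (SeekBenchmarkParams params : combos) {
      SeekBenchmarkResult result = runSingleSeekBenchmark(params);
      System.out.println(result);
      all.add(result);
    }

    // Log everything again at the end, ready to paste into a spreadsheet.
    System.out.println("blockSizeBytes\tblockEncoding\tseeks\telapsedNanos");
    for (SeekBenchmarkResult result : all) {
      System.out.println(result);
    }
  }
}

The same shape extends naturally to a checked-in config file of standard
parameter combinations.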
On Thu, Jun 21, 2012 at 2:47 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> Concur. That's ambitious!
>
> On Thu, Jun 21, 2012 at 1:57 PM, Ryan Ausanka-Crues
> <[EMAIL PROTECTED]> wrote:
> > Thanks Matt. These are great!
> > ---
> > Ryan Ausanka-Crues
> > CEO
> > Palomino Labs, Inc.
> > [EMAIL PROTECTED]
> > (m) 805.242.2486
> >
> > On Jun 21, 2012, at 12:36 PM, Matt Corgan wrote:
> >
> >> These are geared more towards development than regression testing, but
> here
> >> are a few ideas that I would find useful:
> >>
> >> * Ability to run the performance tests (or at least a subset of them)
> on a
> >> development machine would help people avoid committing regressions and
> >> would speed development in general
> >> * Ability to test a single region without heavier weight servers and
> >> clusters
> >> * Letting the test run with multiple combinations of input parameters
> >> (block size, compression, blooms, encoding, flush size, etc, etc).
> >> Possibly many combinations that could take a while to run
> >> * Output results to a CSV file that's importable to a spreadsheet for
> >> sorting/filtering/charting (see the sketch after this list).
> >> * Email the CSV file to the user notifying them the tests have finished.
> >> * Getting fancier: ability to specify a list of branches or tags from
> git
> >> or subversion as inputs, which would allow the developer to tag many
> >> different performance changes and later figure out which combination is
> the
> >> best (all before submitting a patch)
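
A minimal sketch of the CSV-output idea above; the Result fields and column
names are placeholders, not an existing HBase class:

// Sketch only: write benchmark rows with a header line so a spreadsheet can
// sort/filter/chart them.
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class PerfResultCsvWriter {

  // One row of benchmark output.
  static class Result {
    String compression;
    int blockSize;
    boolean bloomEnabled;
    double opsPerSec;
  }

  static void writeCsv(Path file, List<Result> results) throws IOException {
    try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file))) {
      out.println("compression,blockSize,bloomEnabled,opsPerSec");
      for (Result r : results) {
        out.printf("%s,%d,%b,%.2f%n",
            r.compression, r.blockSize, r.bloomEnabled, r.opsPerSec);
      }
    }
  }
}

Emailing the file or pointing a charting tool at it could then be layered on
top of the same output.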
> >>
> >>
> >> On Thu, Jun 21, 2012 at 12:13 PM, Elliott Clark <[EMAIL PROTECTED]
> >wrote:
> >>
> >>> I actually think that more measurements are needed than just per
> >>> release.  The best I could hope for would be a cluster of four or more
> >>> nodes (one master and three slaves) that runs multiple different perf
> >>> tests for every check-in on trunk.
> >>>
> >>>
> >>>  - All Reads (Scans)
> >>>  - Large Writes (Should test compactions/flushes)
> >>>  - Read Dominated with 10% writes (these mixes are sketched at the end
> >>>    of this message)
> >>>
> >>> Then every check-in can be evaluated and large regressions can be
> >>> treated as bugs.  And with that we can see the difference between the
> >>> different versions as well.  http://arewefastyet.com/ is kind of the
> >>> model that I would love to see.  And I'm more than willing to help
> >>> wherever needed.
> >>>
> >>> However, in reality, every night will probably be more feasible.  And
> >>> four nodes is probably not going to happen either.
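
For illustration, the three mixes Elliott lists above could be expressed as
operation proportions that a load driver samples from; the enum below is only
a sketch, not an existing HBase or YCSB API:

// Sketch only: the three workload mixes as write proportions.
import java.util.Random;

public enum PerfWorkload {
  ALL_READS(0.0),        // all reads (scans)
  LARGE_WRITES(1.0),     // heavy writes, to exercise flushes/compactions
  READ_DOMINATED(0.1);   // 90% reads, 10% writes

  final double writeFraction;

  PerfWorkload(double writeFraction) {
    this.writeFraction = writeFraction;
  }

  // Decide whether the next operation issued by the driver is a write.
  boolean nextOpIsWrite(Random rng) {
    return rng.nextDouble() < writeFraction;
  }
}

A nightly (or per-check-in) job could loop over these mixes against the same
cluster and record throughput and latency per commit or tag.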
> >>>
> >>> On Thu, Jun 21, 2012 at 11:38 AM, Andrew Purtell <[EMAIL PROTECTED]
> >>>> wrote:
> >>>
> >>>> On Wed, Jun 20, 2012 at 10:37 PM, Ryan Ausanka-Crues
> >>>> <[EMAIL PROTECTED]> wrote:
> >>>>> I think it makes sense to start by defining the goals for the
> >>>>> performance testing project and then deciding what we'd like to
> >>>>> accomplish.  As such, I'll start by soliciting ideas from everyone on
> >>>>> what they would like to see from the project.  We can then collate
> >>>>> those thoughts and prioritize the different features.  Does that
> >>>>> sound like a reasonable approach?
> >>>>
> >>>> In terms of defining a goal, the fundamental need I see for us as a