On 16 January 2012 17:36, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> On Sun, Jan 15, 2012 at 11:39 PM, Henry Robinson <[EMAIL PROTECTED]> wrote:
> > Hi -
> > The unit tests are taking longer and longer to run, particularly locally. I
> > was poking about looking for some easy wins, and I noticed that a lot of
> > the time is spent waiting for servers to come up, which is heavily
> > dependent on the tick time. Lo and behold, dropping the tick time on (for
> > example) QuorumPeerMainTest from 4s to 100ms made the test suite quicker by
> > 30s.
> > On builds.apache.org it's not a great idea to reduce the tick time too much
> > because it generally runs on more contended hardware so timeouts get hit,
> > but what if we just increase the session expiration time commensurately? We
> > could set a 500ms tick time with a 30s (or more) max session expiration
> > time. Latencies due to waiting for servers to start should be lower, but
> > the tests should remain as stable.
> > Any thoughts? Any other ways we can tighten up the test suite runtime?
> I'd be concerned that we were testing with a different setting than
> most users set. Would we be more or less likely to find issues by
> setting this lower?
That's a good point, but I don't know that we can really say the tests as
they stand are at all representative of what real users are doing. The unit
tests have ensembles co-located on the same machine, with very synthetic
workloads - I don't think they mimic production environments, nor should they.
Many of the tests start a cluster and then wait for some condition to be
true, or a timeout to occur (then aborting). I'm suggesting keeping most of
the timeouts to be similar lengths, but to poll more frequently so that we
don't waste time waiting to wake up to check the condition, if that makes sense.
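To make the polling idea concrete, here is a minimal sketch of a wait helper that keeps the overall timeout long but checks the condition at a short interval. The class name PollingWait and the method waitFor are hypothetical, made up for illustration; this is not the existing ZooKeeper test code, just one way the pattern could look.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper (assumed name); not part of the ZooKeeper test code. */
final class PollingWait {
    private PollingWait() {}

    /**
     * Checks the condition every pollMs milliseconds until it holds or
     * timeoutMs has elapsed. Returns true if the condition held in time,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline, and always sleep at least 1ms.
            long sleepMs = Math.min(pollMs, Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could then call something like PollingWait.waitFor(() -> serverIsUp(), 30000, 100), where serverIsUp() stands in for whatever check the test already performs. The timeout stays generous for contended build machines, while a fast local run returns within about one poll interval of the condition becoming true.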
I like your idea of splitting tests into categories. I think a lot of the
current tests should exist in the test-commit category but currently take a
bit long to run. The 'hammer' tests are great examples of tests that should
be in the full suite, since they're not really testing for a specific
property, but the QuorumPeerMain tests are mostly testing a very specific
I filed ZOOKEEPER-1363 to deal with splitting the tests up by category
(ZOOKEEPER-725, the only other place I saw this mentioned, is a bit more
> re "other ways":
> In the past I've found that test time reductions could be had by
> looking at the longest running tests for flaws. Often a test will set
> a session time of 30 seconds and wait for expiration, or sleep for some
> long/unnecessary period of time. I'll typically refactor the test to
> improve the runtime. In past releases I've made significant
> improvements using this method (perhaps mined out?)
> Another option is to restart the server(s) less frequently. This can
> be done by starting the service once for all tests in a class, rather
> than for each test method. (non-optimal though)
> Others would probably point out that what we call "unit tests" are
> pretty much system tests and should be moved out. That seems unlikely
> at this time however.
> Given that tests typically increase in scope (and time) and not
> decrease we might want to consider moving to the approach that Pig and
> some other projects have. They have test targets that run a subset of
> the test suite. For example in Pig "test" takes 6-8 hrs, however they
> have a "test-commit" which only takes 20min or so. We could do similar
> for ZK. This is easy to do using "exclude files". (see pig build.xml)
> IMO long term we should categorize our tests, asking ppl to run
> "test-commit" (short subset) prior to committing, whereas "test" (full
> suite) would run as part of the patch testing, nightly testing,
> release testing, etc...