We had to do similar work internally at LinkedIn, and most of the bugs we
found were in session expiry/disconnect handling. We used a
combination of iptables, SIGSTOP, and having another client connect with the
same session id/password and then close that connection. This is non-trivial and
requires some effort to wire up the different pieces.
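
To illustrate the SIGSTOP part, here is a minimal sketch (not our actual
harness) of freezing a client process from a test driver and resuming it
later. It assumes a POSIX system and that the client runs as a child of the
test process. `pause_child` and `spawn_stand_in_client` are hypothetical
names, and the stand-in process just sleeps instead of talking to ZooKeeper.

```python
import os
import signal
import subprocess
import sys
import time


def pause_child(proc, seconds):
    """Freeze a child process for `seconds`, then resume it.

    Returns True if the kernel reported the child as stopped, False if the
    child had already exited. POSIX only (SIGSTOP/SIGCONT).
    """
    os.kill(proc.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid return for a stopped child, not only an exited one.
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        return False
    # In a real test, pick a pause longer than the session timeout.
    time.sleep(seconds)
    os.kill(proc.pid, signal.SIGCONT)
    return True


def spawn_stand_in_client():
    # Hypothetical stand-in for a real ZooKeeper client process.
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
```

While the process is stopped it sends no heartbeats, so from the server's
point of view it looks much like a long pause on the client.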
However, I would like to add that even though our test cases worked, we
had weird issues during GCs, and sometimes during long GCs. GC on both
server and client is problematic. For example, clients would get a session
expiry and then a SyncConnected event, but before SyncConnected was processed
there would be another session expiry. These scenarios are much harder to
test for and reproduce.
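
One way to at least replay that interleaving deterministically is to separate
event delivery from event processing in the test. The sketch below is only an
illustration under that assumption: `SessionTracker` is a hypothetical helper,
not part of any ZooKeeper client API, and the event names are plain strings.
Each event is tagged with the number of expiries seen at delivery time, so a
SyncConnected that was overtaken by a later expiry can be recognized as stale.

```python
from collections import deque

EXPIRED = "Expired"
SYNC_CONNECTED = "SyncConnected"


class SessionTracker:
    """Hypothetical helper; not part of any ZooKeeper client API."""

    def __init__(self):
        self.expiry_count = 0   # bumped at delivery time for every Expired
        self.pending = deque()  # (event, expiry_count when it was delivered)
        self.log = []

    def deliver(self, event):
        # Stands in for the client's event thread: record only, do no work.
        if event == EXPIRED:
            self.expiry_count += 1
        self.pending.append((event, self.expiry_count))

    def process_all(self):
        # Stands in for the application thread, which may run much later
        # (for example after a long GC pause).
        while self.pending:
            event, seen = self.pending.popleft()
            if event == EXPIRED:
                self.log.append("rebuild session")
            elif seen != self.expiry_count:
                # Another expiry arrived after this SyncConnected was queued.
                self.log.append("skip stale SyncConnected")
            else:
                self.log.append("register ephemeral state")
        return self.log
```

Delivering Expired, SyncConnected, Expired and only then calling
`process_all()` replays the case above without needing a real GC pause.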
Thanks for taking this up.
On Thu, Jun 27, 2013 at 10:38 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> +1 this is a very big deal
> On Thu, Jun 27, 2013 at 6:39 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
> > Many recent issues that I saw internally are due to incorrect handling of,
> > or insufficient testing of, ZooKeeper failure scenarios in the custom
> > wrapper API or in the applications.
> > I am thinking that we might be able to expose a few more API calls that
> > allow users to write unit tests that cover various failure scenarios (similar
> > to TestableZooKeeper in the ZooKeeper tests). This should also minimize the
> > effort of setting up the test framework. Ideally we would have a mock client
> > that doesn't need a running server, but I think it is
> > too much effort to write and maintain for all the languages. Our internal
> > test facility is a dedicated ensemble used by all unit tests.
> > This ensures application logic correctness, but it is hard to test various
> > failure scenarios.
> > So my current thought is to expose the following functionality.
> > 1. A zookeeper_close() that doesn't actually send a close request to the
> > server: This can be used to simulate a client crash without actually
> > crashing the test program.
> > 2. Allow the client to force-trigger a CONNECTION_LOSS or SESSION_EXPIRE
> > event: This will allow users to test their watchers and callbacks
> > (and possible race conditions).
> > Let me know if you have additional suggestions.
> > --
> > Thawan Kooburat