The behavior we saw on one of our ZooKeeper clients is as follows. The
session expires on the client, so it assumes its ephemeral nodes have been
deleted, establishes a new session with ZooKeeper, and tries to re-create
the ephemeral nodes. However, the create fails with a NodeExists error
code. That error is legitimate after a session disconnect, since zkclient
automatically retries the operation: the first attempt may have been
applied on the server even though the client lost the connection, so the
retried create raises NodeExists. Also, by design Kafka never has multiple
clients create the same ephemeral node, so the Kafka server treats
NodeExists as benign. However, a few seconds later ZooKeeper deletes that
ephemeral node. So from the client's perspective, even though it holds a
new, valid session, its ephemeral node is gone.
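The "legitimate" NodeExists case above falls out of the retry-on-connection-loss pattern that zkclient uses. A minimal sketch of that pattern (the names here are illustrative, not zkclient's actual API): the first create may be applied server-side even though the client only sees a connection loss, so the retry can legitimately fail with NodeExists.

```java
import java.util.function.Supplier;

// Illustrative stand-ins for the client library's exceptions.
class ConnectionLossException extends RuntimeException {}
class NodeExistsException extends RuntimeException {}

public class RetryOnConnectionLoss {
    // Retry the operation until it either succeeds or fails with a
    // non-connection error. The previous attempt may already have taken
    // effect on the server, which is what makes NodeExists expected here.
    static <T> T retry(Supplier<T> op) {
        while (true) {
            try {
                return op.get();
            } catch (ConnectionLossException e) {
                // reconnect and retry
            }
        }
    }

    public static void main(String[] args) {
        // Simulate: the first create is applied but the reply is lost,
        // so the retried create sees NodeExists.
        final boolean[] created = { false };
        final boolean[] firstCall = { true };
        try {
            retry(() -> {
                if (firstCall[0]) {
                    firstCall[0] = false;
                    created[0] = true;                    // applied server-side
                    throw new ConnectionLossException();  // ...but reply lost
                }
                if (created[0]) throw new NodeExistsException();
                created[0] = true;
                return null;
            });
        } catch (NodeExistsException e) {
            System.out.println("NodeExists on retry, as expected");
        }
    }
}
```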
After poking at the transaction and log4j logs, I saw that the NodeExists
arose because the ZooKeeper leader had retained the ephemeral node from the
previous, expired session. It turns out the leader notified the client of
the session expiration before actually deleting the ephemeral node. It is
also worth noting that the previous session expired because of a long fsync
operation on the ZooKeeper leader; when the fsync returned, the leader had
a whole backlog of sessions to expire.
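One way a client could defend against this race is to check, on NodeExists, whether the surviving node belongs to its current session (ZooKeeper exposes this via Stat.getEphemeralOwner()) and, if it is owned by the old session, delete it and retry the create. Below is a minimal sketch of that logic against a hypothetical in-memory stand-in, since demonstrating the real calls would require a live ensemble:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the few ZooKeeper calls the sketch needs. Real
// code would use org.apache.zookeeper.ZooKeeper; this fake only models
// ephemeral ownership so the retry logic can be shown end to end.
class FakeZk {
    static class Node { final long ownerSession; Node(long s) { ownerSession = s; } }
    private final Map<String, Node> nodes = new HashMap<>();
    private final long sessionId;

    FakeZk(long sessionId) { this.sessionId = sessionId; }
    long getSessionId() { return sessionId; }

    void createEphemeral(String path) throws Exception {
        if (nodes.containsKey(path)) throw new Exception("NodeExists");
        nodes.put(path, new Node(sessionId));
    }
    // Owner session id of an existing node, mimicking Stat.getEphemeralOwner().
    Long ephemeralOwner(String path) {
        Node n = nodes.get(path);
        return n == null ? null : n.ownerSession;
    }
    void delete(String path) { nodes.remove(path); }

    // Test helper: simulate a stale node left over from an expired session.
    void injectStaleNode(String path, long oldSession) {
        nodes.put(path, new Node(oldSession));
    }
}

public class CheckedEphemeralCreate {
    // Create an ephemeral node, tolerating a leftover node from a previous
    // (expired) session: if the existing node is owned by another session,
    // delete it and retry the create.
    static void checkedCreate(FakeZk zk, String path) throws Exception {
        while (true) {
            try {
                zk.createEphemeral(path);
                return;
            } catch (Exception e) {
                Long owner = zk.ephemeralOwner(path);
                if (owner != null && owner == zk.getSessionId()) {
                    return; // we already own it (e.g. a retried create succeeded)
                }
                if (owner != null) {
                    zk.delete(path); // stale node from the expired session
                }
                // loop and retry the create
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FakeZk zk = new FakeZk(2L);               // new session, id 2
        zk.injectStaleNode("/brokers/ids/0", 1L); // leftover from expired session 1
        checkedCreate(zk, "/brokers/ids/0");
        System.out.println("owner=" + zk.ephemeralOwner("/brokers/ids/0")); // prints "owner=2"
    }
}
```

Note that this only helps once the leader has actually finished expiring the old session; if the stale node is deleted out from under the client afterwards, as described above, the client still loses its node.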
In this case, it seems that ZooKeeper should not notify the client that the
session has expired until the session's ephemeral nodes are actually gone.
Or maybe I'm not clear on what guarantees ZooKeeper makes across sessions
from the same client.