Zookeeper >> mail # user >> Unexpected behavior with Session Timeouts in Java Client

Re: Unexpected behavior with Session Timeouts in Java Client
I think the perspective to have is that ZooKeeper tries to deal with
facts, and when it doesn't have the facts, it tells you so. When a
client loses its connection to ZooKeeper, the client library does a
callback to let your application know that it no longer knows the
state of the system. When it reconnects, it tells you that it now
knows the system state and informs you of any changes.
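To make the transitions concrete, here is a minimal sketch of that state handling. In the real Java client these notifications arrive through org.apache.zookeeper.Watcher#process(WatchedEvent) as KeeperState values (SyncConnected, Disconnected, Expired); this sketch models the same three states with plain JDK types so the logic is visible on its own. ConnState and ConnectionStateHandler are illustrative names, not part of the ZooKeeper API.

```java
import java.util.function.Consumer;

// Illustrative stand-in for org.apache.zookeeper.Watcher.Event.KeeperState.
enum ConnState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

class ConnectionStateHandler {
    private ConnState current = ConnState.DISCONNECTED;
    private final Consumer<String> log;

    ConnectionStateHandler(Consumer<String> log) { this.log = log; }

    // In real code this body would live inside Watcher#process().
    void process(ConnState next) {
        switch (next) {
            case DISCONNECTED:
                // The client no longer knows the state of the system.
                log.accept("disconnected: state unknown, stop trusting cached data");
                break;
            case SYNC_CONNECTED:
                // Reconnected within the session timeout: watches fire for
                // anything that changed, so the client catches up on facts.
                log.accept("reconnected: session intact, changes will be delivered");
                break;
            case EXPIRED:
                // Only delivered after reconnecting: the server, not the
                // client, declares the session dead.
                log.accept("expired: session gone, must create a new client");
                break;
        }
        current = next;
    }

    ConnState state() { return current; }
}
```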

We have to use timeouts for failure detection, but we don't use
timeouts for figuring out facts. That is why the client waits until it
reconnects to report session expiration, rather than declaring it
locally when the timeout elapses. If you want to use timeouts for
session expiration, you can do it yourself by starting a timer on the
Disconnected event and then doing an explicit close() when the timer fires.
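A minimal sketch of that do-it-yourself approach, using only the JDK scheduler: start a deadline when the disconnect event arrives and cancel it on reconnect. SessionTimeoutMonitor and its method names are illustrative, not part of the ZooKeeper API; in real code the Runnable would call ZooKeeper#close(), and onDisconnected/onReconnected would be invoked from your Watcher.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class SessionTimeoutMonitor {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long sessionTimeoutMs;
    private final Runnable onTimeout;     // e.g. () -> zk.close()
    private ScheduledFuture<?> pending;

    SessionTimeoutMonitor(long sessionTimeoutMs, Runnable onTimeout) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.onTimeout = onTimeout;
    }

    // Call from the watcher on KeeperState.Disconnected: treat the session
    // as dead if we don't reconnect before the timeout elapses.
    synchronized void onDisconnected() {
        if (pending == null) {
            pending = timer.schedule(onTimeout, sessionTimeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    // Call from the watcher on KeeperState.SyncConnected: we reconnected
    // in time, so cancel the local deadline.
    synchronized void onReconnected() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    void shutdown() { timer.shutdownNow(); }
}
```

Note that this deliberately errs on the side of giving up early: ZooKeeper itself would have kept the session alive if the client reconnected in time, so closing on a local timer trades availability for a faster, locally observable failure.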

One thing to keep in mind is that this delayed session-expiration event
is only relevant to the disconnected client. The session itself will be
killed on the server side, and a new leader elected, in the meantime.

For your situation, I wouldn't kill the external connection until the
session-expired event arrives, but I would stop consuming data from the
connection while disconnected. I imagine you have some sort of
acknowledgement or consume mechanism for flow control and packet loss
that you can use.
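A sketch of that suggestion, under the assumptions stated in the thread (events are idempotent, the source redelivers anything left unacknowledged): keep the expensive external connection open while disconnected from ZooKeeper, but stop consuming, i.e. stop acknowledging, so nothing is lost if the session turns out to be expired. GatedConsumer and tryConsume are hypothetical names; the ack step is whatever flow-control mechanism the real source offers.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class GatedConsumer {
    private final AtomicBoolean zkConnected = new AtomicBoolean(true);

    // Wire these to the Disconnected/SyncConnected watcher callbacks.
    void onZkDisconnected() { zkConnected.set(false); }
    void onZkReconnected()  { zkConnected.set(true); }

    // Returns true if the event was consumed (and would be acked).
    boolean tryConsume(String event) {
        if (!zkConnected.get()) {
            // Leave the event unacknowledged; the source will redeliver,
            // and idempotent processing makes the duplicate harmless.
            return false;
        }
        process(event);   // apply idempotently, per the thread
        return true;      // ack to the source here in real code
    }

    void process(String event) { /* idempotent application of the event */ }
}
```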


On Thu, Apr 21, 2011 at 3:41 PM, Scott Fines <[EMAIL PROTECTED]> wrote:
> Perhaps I am not being clear in my description.
> I'm building a system, that receives data events from an external source.
> This external source is not necessarily under my control, and starting up a
> connection to that source is highly expensive, as it entails a high-latency,
> low-bandwidth transfer of data. It's more stable than packet radio, but not a
> whole lot faster. This system retains the ability to recreate events, and
> can do so upon request, but the cost to recreate is extremely high.
> On the other end, the system pushes these events on to distributed
> processing and (eventually) a long-term storage solution, like Cassandra or
> Hadoop. Each event that is received can be idempotently applied, so the
> system can safely process duplicate messages if they come in.
> If the world were perfect and we had Infiniband connections to all of our
> external sources, then there would be no reason for a leader-election
> protocol in this scenario. I would just boot the system on every node, and
> have them do their thing, and why worry? Idempotency is a beautiful thing.
> Sadly, the world is not perfect, and trying to deal with an already slow
> external connection by asking it to send the same data 10 or 15 times is not
> a great idea, performance-wise. In addition to slowing everything down on
> the receiving end, it also has an adverse effect on the source's
> performance; the source, it must be noted, has other things to do besides
> just feeding my system data.
> So my solution is to limit the number of external connections to 1, and use
> ZooKeeper leader-elections to manage which machine is running at which time.
> This way, we keep the number of external connections as low as we can,
> we can guarantee that messages are received and processed idempotently, and
> in the normal situation where there is no trouble at all, life is fine.
> What I am trying to deal with right now is how to manage the corner cases of
> when communication with ZooKeeper breaks down.
> To answer your question about the ZooKeeper cluster installation: no, it is
> not located in multiple data centers. It is, however, co-located with other
> processes. For about 90-95% (we have an actual measurement, but I can't
> remember it off the top of my head) of the time, the resource utilization is
> low enough and ZooKeeper is lightweight enough that it makes sense to
> co-locate. Occasionally, however, we do see a spike in an individual
> machine's utilization. Even more occasionally, that spike can result in
> clients being disconnected from that ZooKeeper node. Since almost all the
> remainder of the cluster is reachable and appropriately utilized, clients
> typically reconnect to another node, and all is well. Of course, with this