graham sanderson 2012-07-13, 02:08
Hi, so I happened to be demoing a prototype built with Kafka in a large borrowed room, which turned out to have insufficient, flaky wireless. I was using the ZooKeeper-based config and getting lots of timeouts etc. Since this was the first time I had used Kafka and I hadn't done any off-path testing, my first course of action was to find a wired connection. I did, and all the timeouts disappeared. The demo was great. Note that even with the flaky wireless, messages generally still seemed to be getting delivered, though as far as I could tell not always (or perhaps just with high latency; I was more focused on having a working demo than on debugging).
I'm using 0.7 at the moment, though I'm not sure whether that matters.
My question, then, given a simple scenario using Kafka/ZooKeeper (prior to all the exciting fault-tolerance work going on right now):
1) Let's say I have a ZooKeeper server, a Kafka server, a producer, and a consumer running on a perfect network, and I successfully send a message from the producer to the consumer.
2) All JVMs stay up, but I lose network connectivity between some or all of them for some time.
3) The network becomes perfect again.
4) I wait for some time for everyone to reconnect/re-negotiate to the best of their ability.
Following that, should I expect a new message from the producer to reach the consumer, or can the system get into a broken state? I'm fairly sure I saw such a message go undelivered, but I can't say for certain. I can certainly investigate further by trying to reproduce it and wading through the many logged errors, but if someone already knows the answer, that'd be awesome!