Kafka, mail # user - producer behavior when network is down


Re: producer behavior when network is down
Felix GV 2013-08-12, 21:49
Async production is meant to work this way. You get no delivery guarantee
and no exception, because the producer sends the message independently of
the code that called the async production function.

It is meant to be faster than sync production, but it is obviously intended
for non-critical messages.
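
To illustrate why the caller never sees an exception, here is a minimal
sketch in plain Java. It is not Kafka's code: Transport, SyncSender and
AsyncSender are hypothetical names invented for this example, and the
"network is down" failure is simulated. It only shows the general pattern of
handing messages to a background thread through a queue.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncVsSyncSketch {

    /** Hypothetical stand-in for the connection to the broker. */
    interface Transport {
        void send(String message) throws IOException;
    }

    /** Sync: the caller's own thread does the send, so a failure reaches the caller. */
    static final class SyncSender {
        private final Transport transport;

        SyncSender(Transport transport) { this.transport = transport; }

        void send(String message) throws IOException { transport.send(message); }
    }

    /** Async: the caller only enqueues; a background thread does the actual send. */
    static final class AsyncSender {
        // Unique marker object used to tell the worker to stop (compared by identity).
        private static final String POISON = new String("POISON");

        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final AtomicInteger failed = new AtomicInteger();
        private final Thread worker;

        AsyncSender(Transport transport) {
            worker = new Thread(() -> drain(transport));
            worker.setDaemon(true);
            worker.start();
        }

        private void drain(Transport transport) {
            try {
                while (true) {
                    String message = queue.take();
                    if (message == POISON) return;
                    try {
                        transport.send(message);
                    } catch (IOException e) {
                        // The failure happens here, on the worker thread.
                        // The code that called send() has already moved on.
                        failed.incrementAndGet();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        /** Returns as soon as the message is queued; never throws on network failure. */
        void send(String message) { queue.add(message); }

        /** Stops the worker after it has processed everything queued so far. */
        void close() {
            queue.add(POISON);
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        int failedCount() { return failed.get(); }
    }

    public static void main(String[] args) {
        // Simulated outage: every send attempt fails.
        Transport down = message -> { throw new IOException("network is down"); };

        try {
            new SyncSender(down).send("m1");
        } catch (IOException e) {
            System.out.println("sync: caller saw: " + e.getMessage());
        }

        AsyncSender async = new AsyncSender(down);
        async.send("m2");
        System.out.println("async: send() returned normally");
        async.close();
        System.out.println("async: failed sends = " + async.failedCount());
    }
}
```

In this sketch the only way for the application to learn about the failure
is to ask the sender afterwards (failedCount() here). Whether and how a real
producer exposes such information depends on the client library and version.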

--
Felix
On Fri, Jul 26, 2013 at 12:27 PM, Viktor Kolodrevskiy <
[EMAIL PROTECTED]> wrote:

> Hey guys,
>
> We decided to use Kafka in our new project, and I have spent some time
> researching how the Kafka producer behaves during network connectivity
> problems.
>
> I had 3 virtual machines (Ubuntu 13.04, running on VirtualBox) on one
> network:
>
> 1. Kafka server (0.7.2) + ZooKeeper.
> 2. Producer app with default settings.
> 3. Consumer app.
>
> Results of the following tests with default sync producer settings:
>
> 1. Condition: Take the network down on machine (1) for 20 mins.
> Result: Producer keeps working for ~16 mins. Consumer does not receive
> anything.
> After ~16 mins the producer app fails (with java.io.IOException:
> Connection timed out). Consumer app does not fail.
> Messages that were generated during those 16 mins are lost!
>
> 2. Condition: Put network down on machine (1) for 5 mins and after 5
> mins start network on (1) again.
> Result: Producer app is working, no exceptions or notification that
> network was down.
> Consumer does not receive messages for 5 mins. But when network on (1)
> is up it receives all messages.
> There are no messages lost.
>
> 3. Condition: Take the network down on machine (2) for 20 mins.
> Result: Producer keeps working for ~16 mins. Consumer does not receive
> anything.
> After ~16 mins the producer app fails (with java.io.IOException:
> Connection timed out). Consumer app does not fail.
> Messages that were generated during those 16 mins are lost! (Same result
> as in test #1.)
> The Kafka and ZooKeeper logs show that the producer is disconnected.
>
> 4. Condition: Put network down on machine (2) for 5 mins and after 5
> mins start network on (2) again.
> Result: Producer app is working, no exceptions or notification that
> network was down.
> Consumer does not receive messages for 5 mins. But when network on (2)
> is up it receives all messages. (Same result as in test #2.)
> The Kafka and ZooKeeper logs show that the producer is disconnected.
>
> 5. Condition: Kill Kafka server (0.7.2) + ZooKeeper (kill the
> applications, do not shut down the network).
> Result: Producer fails in a few seconds with
> "kafka.common.NoBrokersForPartitionException: Partition = null"
> Consumer is still working even after 25 minutes.
>
> One more interesting thing: changing the connect.timeout.ms parameter
> value for the producer
> did not change the 16 mins that I observed.
>
> I played with the settings and found that the only way to reduce the time
> it takes the producer to notice that the network is down is to change one
> of two parameters: reconnect.interval or reconnect.time.interval.ms
>
> So let's say we set reconnect.time.interval.ms=1000.
> This means that the producer will reconnect to Kafka every 1 second.
> In this case the producer finds out that the network is down within 1
> second.
> The producer stops sending messages and throws
> "java.net.ConnectException: Connection timed out". This is the only way
> that I have found so far.
> In this case we do not lose too many messages, but performance may suffer.
> Or we can set reconnect.interval=1 so that a reconnect happens after each
> message sent
> and we do not lose messages at all.
>
> Then I tested the async producer (producer.type=async).
> The results are dramatic for me, because the producer does not throw any
> exception.
> It keeps sending messages and does not fail.
> I left it running overnight and it did not fail even though the network
> between the Kafka server and the producer app was down.
> Playing with the async producer config parameters did not help either.
>
> My questions are:
>
> 1. Where may these 16 mins come from?
> 2. Are there any best practices to handle network down issues?
> 3. Why does the async producer never throw exceptions when the network is
> down?
> 4. What is the way to check from sync/async producer that messages