Bob Jervis 2013-03-20, 18:06
Re: Socket timeouts fetching metadata
Neha Narkhede 2013-03-21, 05:20
Socket timeouts while reading the producer response could indicate a
bottleneck in the server's request handling. It could be in the network
layer, in I/O performance, or a configuration issue. It would help if you
created a JIRA and attached the part of your producer log that includes the
timeout errors. It would also be helpful to see the request log and server
log of the Kafka brokers for the time period of the timeouts.
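If the brokers do look like the bottleneck, the knobs involved are the
request-handling thread pools in server.properties and the request log in
log4j. Roughly, assuming an 0.8-era broker (the names below come from the
stock config files there; double-check them against your version):

    # server.properties -- request handling pools (values illustrative)
    num.network.threads=8   # threads that read/write socket requests
    num.io.threads=8        # threads that execute requests, incl. disk I/O

    # config/log4j.properties -- turn on the per-request log
    log4j.logger.kafka.network.RequestChannel$=TRACE, requestAppender
    log4j.additivity.kafka.network.RequestChannel$=false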
On Wednesday, March 20, 2013, Bob Jervis wrote:
> We are seeing some odd socket timeouts from one of our producers. This
> producer fans out data from one topic into dozens or hundreds of potential
> output topics, and we batch the sends to write 1,000 messages at a time.
> The timeouts are happening in the socket read, so I assume that the
> socket.timeout.ms value applies, which we leave at its default of 30
> seconds. The odd thing is that these exceptions are thrown in clusters of
> 3-5 at a time, with a few seconds or less between each. We are running
> with 64 network threads in our brokers, which seems like plenty given that
> each broker has only 8 cores. From the clustering of the timeouts, it
> looks as though we may be issuing multiple metadata requests in parallel.
> Is that true?
> We haven't touched the io threads (still set at 2), but I'm wondering if
> these exceptions are just artifacts of congestion in the communication
> between the brokers and our clients. Are we using too many distinct topics
> (~95), and should we cut down on them to smooth the message exchange
> between broker and client? We expect the number of topics in production to
> be much higher than this.
> It does appear that the producer in this case is able to continue sending,
> but these exceptions in the logs make our testers unhappy.
> I won't include the very lengthy log messages in toto, but the stack traces
> look like:
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
> at kafka.utils.Utils$.read(Utils.scala:372)
> at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
> at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
> at kafka.producer.SyncProducer.send(SyncProducer.scala:105)
> at kafka.utils.Utils$.swallow(Utils.scala:164)
> at kafka.utils.Logging$class.swallowError(Logging.scala:105)
> at kafka.utils.Utils$.swallowError(Utils.scala:43)
> at kafka.producer.Producer.send(Producer.scala:76)
> at kafka.javaapi.producer.Producer.send(Producer.scala:41)
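For context, the frames above are the 0.8-era sync producer path. A minimal
sketch of a batched send through the kafka.javaapi.producer API they show
(broker addresses, topic names, and property values are illustrative, not
taken from the post):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class FanOutSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Illustrative settings -- not taken from the original post.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "sync"); // synchronous path, as in the trace

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

            // Batch 1,000 messages per send(), as the post describes,
            // fanning the batch out across many output topics.
            List<KeyedMessage<String, String>> batch =
                new ArrayList<KeyedMessage<String, String>>();
            for (int i = 0; i < 1000; i++) {
                batch.add(new KeyedMessage<String, String>(
                    "output-topic-" + (i % 95), "message-" + i));
            }

            // send() blocks while reading the broker response; that read is
            // where a socket timeout like the one above would surface.
            producer.send(batch);
            producer.close();
        }
    }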