Re: Socket timeouts in 0.8
Bob Jervis 2013-03-22, 23:25
We've made some progress in our testing. While I do not have a good
explanation for all of the improvement, we were able to move a
substantial number of messages (> 800K) through the system today without
any exceptions.
The big changes between last night's mess and today were: 1. I moved the
Kafka log dir (the segment files) to a separate drive from the system
drive, and 2. I reduced the number of network and io threads back down to
We also found a (probably) unrelated bug where we were getting the broker 0
and broker 1 host name mappings swapped (something about Zookeeper
returning children in any old order), so we weren't asking for topic
offsets from the correct broker. The code worked fine when there was only
one broker, but in a multi-broker cluster, we got bogus results.
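For anyone hitting the same thing, here is a minimal sketch of this failure mode. It is not the code from this thread: the host names and both helper functions are hypothetical, and the ZooKeeper call is replaced by a plain list so the sketch runs on its own. It assumes broker ids are the child names under /brokers/ids and relies on the fact that ZooKeeper's getChildren gives no ordering guarantee.

```python
# Hypothetical illustration: broker hosts are made up, and the ZooKeeper
# lookup is replaced by plain Python data so the sketch runs standalone.

def hosts_by_position(children, hosts_for_id):
    """Buggy: treats the position in the child list as the broker id."""
    return {index: hosts_for_id[child] for index, child in enumerate(children)}


def hosts_by_id(children, hosts_for_id):
    """Safer: keys the map by the broker id parsed from the child name."""
    return {int(child): hosts_for_id[child] for child in children}


# What each /brokers/ids/<id> node would resolve to (assumed values).
registered = {"0": "kafka-a:9092", "1": "kafka-b:9092"}

# getChildren may return the children in either order.
in_order = ["0", "1"]
swapped = ["1", "0"]

# Position-based mapping swaps the hosts when the order flips.
print(hosts_by_position(swapped, registered))
# Id-based mapping gives the same answer regardless of order.
print(hosts_by_id(swapped, registered) == hosts_by_id(in_order, registered))
```

With a single broker the child list has one entry, so position and id happen to agree, which would explain why such a bug stays hidden until a second broker is added.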
Thanks for all the help,
On Fri, Mar 22, 2013 at 11:27 AM, Bob Jervis <[EMAIL PROTECTED]> wrote:
> I'm also seeing in the midst of the chaos (our app is generating 15GB of
> logs), the following event on one of our brokers:
> 2013-03-22 17:43:39,257 FATAL kafka.server.KafkaApis: [KafkaApi-1] Halting
> due to unrecoverable I/O error while handling produce request:
> kafka.common.KafkaStorageException: I/O exception in append to log
> at kafka.log.Log.append(Log.scala:218)
> at scala.collection.immutable.HashMap.map(HashMap.scala:35)
> at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:242)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.nio.channels.ClosedChannelException
> at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
> at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:184)
> at kafka.log.FileMessageSet.append(FileMessageSet.scala:191)
> at kafka.log.LogSegment.append(LogSegment.scala:64)
> at kafka.log.Log.append(Log.scala:210)
> ... 14 more
> On Fri, Mar 22, 2013 at 11:00 AM, Bob Jervis <[EMAIL PROTECTED]> wrote:
>> I am getting the logs and I am trying to make sense of them. I see a
>> 'Received Request' log entry that appears to be what is coming in from our
>> app. I don't see any 'Completed Request' entries that correspond to those.
>> The only completed entries I see for the logs in question are from the
>> It is as if our app is asking the wrong broker and getting no answer, but
>> for some reason reporting it as a socket timeout.
>> Broker 0 is getting and completing TopicMetadata requests in about 600
>> milliseconds each.
>> Broker 1 is not reporting ANY TopicMetadataRequests in the TRACE logs.
>> Our app logs don't make any sense when I compare them to the broker logs.
>> And how can we be getting timeouts in less than 1000 milliseconds?
>> Our app is reporting this:
>> 2013-03-22 17:42:23,047 WARN kafka.producer.async.DefaultEventHandler:
>> failed to send to broker 1 with data Map([v1-english-5,0] ->