Kafka, mail # user - log file corruption when reading older messages


Re: log file corruption when reading older messages
Philip O'Toole 2013-06-11, 03:18
We often replay data that is days old and have never seen any issues like
this. We are running 0.7.2.

Philip

On Mon, Jun 10, 2013 at 11:17 AM, Todd Bilsborrow
<[EMAIL PROTECTED]> wrote:
> We've been running Kafka 0.7.0 in production for several months and have been quite happy. Our use case to date has been to pull from the head of our topics, so we're normally consuming within seconds of message production, using the high-level consumer, which is working great as far as I can tell.
>
> Recently I've started pulling older data (usually a few hours old) using the low-level consumer, and I'm running into what appears to be corruption in the data files. The consumer pauses for several seconds and then throws "java.io.EOFException: Received -1 when reading from channel, socket has likely been closed". The server log shows "ERROR Closing socket for /xx.xx.xx.xx because of error (kafka.network.Processor) java.io.IOException: Input/output error". DumpLogSegments reads up to the problematic offset, then stops and reports that the tail of the log is at offset: <bad offset>, even though there is more data in the file (the next segment file's starting offset is much higher).
>
> I learned here on the mailing list last month that I can skip the rest of the corrupted segment, but I'd rather not do that because I lose messages. This has happened 5-6 times in the past month; I've seen it on different brokers, topics, partitions, and segments.
>
> So finally, my questions are:
>
> - is anyone else pulling older data without issues, or is everyone pretty much always consuming as fast as possible?
> - is there a known bug that would be fixed by upgrading to a newer Kafka version? I don't know if it's the same problem, but I see that JIRA issues 309 and 310 are marked as fixed; I just don't know in which version.
> - is there any way to examine a corrupt file to see what went wrong? Or any way to diagnose why it's happening?
>
> Thanks!
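For the third question (how to examine a corrupt file), one option is a small standalone checker. The Java sketch below is hypothetical: SegmentScanner and its methods are not part of Kafka, and the on-disk layout it assumes (each entry is a 4-byte big-endian size N followed by an N-byte message made of a magic byte, an attributes byte only when magic is 1, a 4-byte CRC32 of the payload, and the payload) is taken from 0.7's Message format and is worth checking against the kafka.message.Message class in your own Kafka source. It reports the byte position where well-formed entries stop, then brute-force searches for the next position that parses with a matching CRC, which gives a rough idea of how much data skipping the rest of the segment would discard.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical diagnostic helper, not part of Kafka. Assumed 0.7-era entry layout:
// [4-byte size N][magic (1 byte)][attributes (1 byte, only if magic == 1)][CRC32 of payload (4 bytes)][payload]
// Positions are absolute indexes into a buffer that starts at the beginning of the segment file.
final class SegmentScanner {

    private SegmentScanner() {}

    /** True if a complete entry whose stored CRC matches its payload starts at pos. */
    static boolean validEntryAt(ByteBuffer buf, int pos) {
        int limit = buf.limit();
        if (pos < 0 || limit - pos < 4) return false;          // no room for the size field
        int size = buf.getInt(pos);                             // ByteBuffer default order is big-endian
        if (size < 1 || size > limit - pos - 4) return false;  // empty, or runs past end of buffer
        int msgStart = pos + 4;
        int header;                                             // bytes before the payload
        byte magic = buf.get(msgStart);
        if (magic == 0) header = 5;                             // magic + CRC
        else if (magic == 1) header = 6;                        // magic + attributes + CRC
        else return false;                                      // unknown magic byte
        if (size < header) return false;
        long storedCrc = buf.getInt(msgStart + header - 4) & 0xFFFFFFFFL;
        ByteBuffer payload = buf.duplicate();                   // independent position/limit
        payload.limit(msgStart + size);
        payload.position(msgStart + header);
        CRC32 crc = new CRC32();
        crc.update(payload);                                    // CRC over the payload bytes only
        return crc.getValue() == storedCrc;
    }

    /** Byte position where consecutive valid entries stop; equals limit() if everything is valid. */
    static int firstInvalidPosition(ByteBuffer buf) {
        int pos = 0;
        while (pos < buf.limit() && validEntryAt(buf, pos)) {
            pos += 4 + buf.getInt(pos);                         // skip size field plus message
        }
        return pos;
    }

    /** Heuristic: first position after 'from' where a valid entry appears to start, or -1. */
    static int nextValidPosition(ByteBuffer buf, int from) {
        for (int p = from + 1; p < buf.limit(); p++) {
            if (validEntryAt(buf, p)) return p;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentScanner <segment-file>");
            return;
        }
        // Reads the whole segment into memory, so the file must be smaller than 2 GB.
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(Paths.get(args[0])));
        int bad = firstInvalidPosition(buf);
        if (bad == buf.limit()) {
            System.out.println("all " + bad + " bytes parse as valid entries");
            return;
        }
        System.out.println("valid entries stop at byte " + bad + " of " + buf.limit());
        int next = nextValidPosition(buf, bad);
        System.out.println(next < 0 ? "no later valid entry found"
                                    : "next plausible entry starts at byte " + next);
    }
}
```

Run it as java SegmentScanner followed by the segment file path. The forward search is only a heuristic: random bytes can occasionally pass both the size and CRC checks, and the scan is slow across large damaged regions. If offsets are byte positions and the segment file is named after its starting offset, as in 0.7, adding that number to a reported position should give an offset comparable to the one DumpLogSegments prints; treat that as an assumption to verify against your own files.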