Matthew Rathbone 2013-04-13, 17:31
Jay Kreps 2013-04-13, 18:28
Jay Kreps 2013-04-13, 18:32
Re: We have an invalid message in a segment, how to deal with it?
Matthew Rathbone 2013-04-13, 18:51
OK, so yeah, it doesn't sound like that is what happened on this partition.
FWIW all other 8 partitions were fine.
What I ended up doing was deleting partition N + all partitions > N (which
was about 1 hour of data; given we had just had an 8-hour outage, this seemed
I can take a copy of the logs and try to search through them to find out
what happened. For future reference, is there a way to force the Kafka
server to perform recovery on all the segments?
If there had been a main() I could have run on the file to perform recovery
manually that would have been superb, not sure how easy that would be to
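For what it's worth, the kind of standalone recovery pass asked about above could look roughly like the sketch below. It is only an illustration, not Kafka's own code. It assumes the 0.7 on-disk layout is a 4-byte big-endian size followed by a 1-byte magic, a 1-byte attributes field (magic 1 only), a 4-byte CRC32 of the payload, and then the payload. The function names are hypothetical, and it truncates at the first message that fails the check, which may or may not match exactly what the broker does on startup. Run it only against a copy of the segment with the broker stopped.

```python
import struct
import zlib


def valid_prefix_length(data):
    """Return how many leading bytes of `data` consist of intact messages.

    Assumed layout per message (not confirmed by this thread):
    4-byte size, 1-byte magic, [1-byte attributes if magic == 1],
    4-byte CRC32 of the payload, payload.
    """
    pos = 0
    while pos + 4 <= len(data):
        (size,) = struct.unpack_from(">i", data, pos)
        end = pos + 4 + size
        if size < 5 or end > len(data):
            break  # nonsensical size or message runs past end of file
        magic = data[pos + 4]
        if magic not in (0, 1):
            break  # unknown format version
        header = 5 if magic == 0 else 6  # magic + crc (+ attributes)
        if size < header:
            break
        (crc,) = struct.unpack_from(">I", data, pos + header)
        payload = data[pos + 4 + header:end]
        if zlib.crc32(payload) & 0xFFFFFFFF != crc:
            break  # checksum mismatch: first invalid message
        pos = end
    return pos


def truncate_to_valid(path):
    """Cut the file at `path` back to its last intact message."""
    with open(path, "rb") as f:
        data = f.read()  # reads the whole segment into memory
    keep = valid_prefix_length(data)
    if keep < len(data):
        with open(path, "r+b") as f:
            f.truncate(keep)
    return keep
```

Calling truncate_to_valid on a copy of the bad segment would drop the invalid message and everything after it in that file. Any later segments would then no longer line up with the shortened file, so the contiguity check Jay describes below still applies.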
On Sat, Apr 13, 2013 at 1:32 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> I think the error you are seeing is due to the gap in the log. In 0.7
> we validate that the log is contiguous and by deleting a segment you
> are missing a chunk which would lead to problems in fetches for
> offsets in that range. You can always safely delete a prefix of the
> log (i.e. that segment and everything before it). You could also
> rename the files to be contiguous if you had a lot of patience (i.e.
> the nth file needs to have a name corresponding to the n-1st file
> + length(n-1st file)), if that makes sense...
> I guess the forensics question is how we ended up rolling the log with
> an invalid message, the broker should kill itself and then fix the log
> on recovery when that occurs.
> On Sat, Apr 13, 2013 at 11:27 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > You should be able to just bounce the broker. Our default policy is
> > that if we run out of space we shut down the broker automatically as
> > in that case there is no guarantee on what has been written to disk.
> > On startup if a clean shutdown hasn't been performed the broker should
> > run a recovery procedure on the log that includes checksumming all
> > messages. Invalid messages will be removed. Sounds like this isn't
> > what happened?
> > -Jay
> > On Sat, Apr 13, 2013 at 10:30 AM, Matthew Rathbone
> > <[EMAIL PROTECTED]> wrote:
> >> Hey guys,
> >> Due to a disk filling up, one of the segments has an invalid message in it.
> >> I have verified this using DumpLogSegments.
> >> How do I deal with this now? The invalid message is causing our Hadoop
> >> Consumer to fail.
> >> Is there a way to remove the invalid message from the segment? Removing the
> >> whole segment causes the broker to fail on startup with an error:
> >> java.lang.IllegalStateException: The following segments don't validate:
> >> <bad-file>, <bad-file+1>
> >> I'm happy losing that file if needs be, but I need to get this broker
> >> up asap.
> >> --
> >> Matthew Rathbone
> >> Foursquare | Software Engineer | Server Engineering Team
> >> [EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |
> >> 4sq<http://foursquare.com/rathboma>
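To make Jay's renaming rule concrete (each segment's name is the previous segment's name plus the previous segment's length), here is a minimal sketch that works out what the contiguous names would be. It assumes 0.7 segment files are named as a 20-digit zero-padded byte offset with a ".kafka" suffix. That naming and the function names are assumptions for illustration, not something confirmed in this thread. It only builds a rename plan and does not touch any files.

```python
import os

SUFFIX = ".kafka"  # assumed segment file extension in 0.7


def contiguous_names(segments):
    """Given [(filename, size_in_bytes), ...] in log order, return
    [(old_name, new_name), ...] so that every file's name equals the
    previous file's name plus the previous file's length."""
    plan = []
    if not segments:
        return plan
    # The first remaining segment keeps its offset; later ones follow from it.
    expected = int(segments[0][0][:-len(SUFFIX)])
    for name, size in segments:
        plan.append((name, "%020d%s" % (expected, SUFFIX)))
        expected += size
    return plan


def plan_for_directory(log_dir):
    """Build the rename plan for one partition directory."""
    # Zero-padded names sort lexicographically in offset order.
    names = sorted(f for f in os.listdir(log_dir) if f.endswith(SUFFIX))
    sizes = [os.path.getsize(os.path.join(log_dir, n)) for n in names]
    return contiguous_names(list(zip(names, sizes)))
```

One consequence worth keeping in mind: since the names are byte offsets, renaming the segments after a gap shifts the offsets of every message in them, so any consumer offsets saved for that range would no longer point at the same messages.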