I think the error you are seeing is due to the gap in the log. In 0.7
we validate that the log is contiguous and by deleting a segment you
are missing a chunk which would lead to problems in fetches for
offsets in that range. You can always safely delete a prefix of the
log (i.e. that segment and everything before it). You could also
rename the files to be contiguous if you had a lot of patience (i.e.
the name of the nth file needs to equal the name of the (n-1)th file
plus the length of the (n-1)th file, if that makes sense).
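
To make the renaming rule concrete, here is a rough Python sketch that
only computes a rename plan and never touches the files. It assumes
segment files are named as the zero-padded starting byte offset plus a
".kafka" suffix; check that against your own log directory before
trusting it. The names contiguous_names and plan_for_directory are made
up for this example and are not part of Kafka.

```python
import os

SUFFIX = ".kafka"  # assumed segment file suffix; verify in your log dir
WIDTH = 20         # assumed zero-padded width of the offset in the name


def contiguous_names(segments):
    """Given [(current_name, size_in_bytes), ...] sorted by start offset,
    return [(current_name, expected_name), ...] such that each name equals
    the previous name plus the previous file's length."""
    if not segments:
        return []
    # The first remaining segment keeps its name; everything after it is
    # derived from it.
    next_offset = int(segments[0][0][:-len(SUFFIX)])
    plan = []
    for name, size in segments:
        plan.append((name, f"{next_offset:0{WIDTH}d}{SUFFIX}"))
        next_offset += size
    return plan


def plan_for_directory(log_dir):
    """Build the plan from the files actually present in log_dir."""
    # Zero-padded names sort lexicographically in offset order.
    names = sorted(f for f in os.listdir(log_dir) if f.endswith(SUFFIX))
    sizes = [os.path.getsize(os.path.join(log_dir, n)) for n in names]
    return contiguous_names(list(zip(names, sizes)))


if __name__ == "__main__":
    import sys
    for old, new in plan_for_directory(sys.argv[1]):
        marker = "" if old == new else "   <-- rename needed"
        print(f"{old} -> {new}{marker}")
```

Run it against a copy of the partition directory with the broker
stopped, and do any actual renames by hand once the plan looks right.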
I guess the forensics question is how we ended up rolling the log with
an invalid message. The broker should kill itself and then fix the log
on recovery when that occurs.
On Sat, Apr 13, 2013 at 11:27 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> You should be able to just bounce the broker. Our default policy is
> that if we run out of space we shut down the broker automatically as
> in that case there is no guarantee on what has been written to disk.
> On startup if a clean shutdown hasn't been performed the broker should
> run a recovery procedure on the log that includes checksumming all
> messages. Invalid messages will be removed. Sounds like this isn't
> what happened?
> On Sat, Apr 13, 2013 at 10:30 AM, Matthew Rathbone
> <[EMAIL PROTECTED]> wrote:
>> Hey guys,
>> Due to a disk filling up, one of the segments has an invalid message in it.
>> I have verified this using DumpLogSegments.
>> How do I deal with this now? The invalid message is causing our Hadoop
>> Consumer to fail.
>> Is there a way to remove the invalid message from the segment? Removing the
>> whole segment causes the broker to fail on startup with an error:
>> java.lang.IllegalStateException: The following segments don't validate:
>> <bad-file>, <bad-file+1>
>> I'm happy losing that file if needs be, but I need to get this broker back
>> up asap.
>> Matthew Rathbone
>> Foursquare | Software Engineer | Server Engineering Team
>> [EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |