Kafka >> mail # dev >> [jira] [Updated] (KAFKA-1112) broker can not start itself after kafka is killed with -9

     [ https://issues.apache.org/jira/browse/KAFKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-1112:
-----------------------------

    Attachment: KAFKA-1112-v1.patch

The check was supposed to work like this: if the last offset in the segment file is recoveryPoint - 1, skip recovery, because the whole file has already been flushed. This was implemented by using the last entry in the index to locate the final message.
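The following minimal Python sketch roughly illustrates the intended short circuit. It assumes a hypothetical in-memory model of a segment, with messages stored back to back as `(offset, size)` pairs and a sparse index of `(offset, position)` entries. It also assumes recoveryPoint is the first offset not known to be flushed. The function names are illustrative only and do not come from Kafka's code.

```python
from typing import List, Tuple

# Hypothetical segment model: messages are (offset, size_in_bytes) stored back to back.
Message = Tuple[int, int]
# Hypothetical sparse index entry: (offset, file position of that message).
IndexEntry = Tuple[int, int]


def last_offset_via_index(messages: List[Message], index: List[IndexEntry]) -> int:
    """Jump to the last index entry's position and scan forward to the final message."""
    start_pos = index[-1][1] if index else 0
    pos, last = 0, -1
    for offset, size in messages:
        if pos >= start_pos:  # only messages at or after the indexed position are "read"
            last = offset
        pos += size
    return last  # -1 if nothing was found at or after start_pos


def can_skip_recovery(messages: List[Message], index: List[IndexEntry], recovery_point: int) -> bool:
    # Assumption: recovery_point is the first offset NOT known to be flushed,
    # so a fully flushed segment ends exactly at recovery_point - 1.
    return last_offset_via_index(messages, index) == recovery_point - 1


segment = [(0, 100), (1, 100), (2, 100), (3, 100)]
index = [(0, 0), (2, 200)]
print(can_skip_recovery(segment, index, 4))  # True: offsets 0..3 were all flushed
print(can_skip_recovery(segment, index, 2))  # False: offsets 2 and 3 were not flushed
```

The check is only as trustworthy as the index entry it starts from. That weakness is the subject of the rest of this comment.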

Overall I feel this is a bit of a hack, but we wanted to separate out the "fsync is async" feature from a full incremental recovery implementation that only recovers unflushed data.

The immediate problem was that we broke the short circuit by adding code to handle a corner case: what if the log is truncated after a flush, so that the end of the log is < recovery point? That code was simply broken, and we were short-circuiting out of the check in virtually all cases, including when the index was corrupt.
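The hypothetical sketch below shows one way a comparison that is relaxed to cover truncation can short-circuit too eagerly. It illustrates the failure mode under stated assumptions and is not the actual code from the patch. It reuses the toy segment model and simulates a corrupt index as a garbage position past the end of the file. When that corrupt entry makes the forward scan find nothing, a `<=` comparison skips recovery, but the strict `==` check does not.

```python
from typing import List, Tuple


def scan_last_offset(messages: List[Tuple[int, int]], start_pos: int) -> int:
    """Offset of the last (offset, size) message starting at or after start_pos, or -1."""
    pos, last = 0, -1
    for offset, size in messages:
        if pos >= start_pos:
            last = offset
        pos += size
    return last


def skip_strict(last_offset: int, recovery_point: int) -> bool:
    return last_offset == recovery_point - 1


def skip_relaxed(last_offset: int, recovery_point: int) -> bool:
    # Hypothetical attempt to also accept a log truncated below the recovery point.
    return last_offset <= recovery_point - 1


segment = [(0, 100), (1, 100), (2, 100), (3, 100)]  # 400 bytes, offsets 0..3
recovery_point = 2                                  # offsets 2 and 3 are unflushed

healthy = scan_last_offset(segment, 200)     # last index entry points at offset 2
corrupt = scan_last_offset(segment, 10_000)  # garbage position past end of file

print(skip_strict(healthy, recovery_point), skip_relaxed(healthy, recovery_point))  # False False
print(skip_strict(corrupt, recovery_point), skip_relaxed(corrupt, recovery_point))  # False True
```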

This issue wasn't caught because there was a bug in the log corruption unit test that gave a false pass on all index corruptions. :-(

The fix is the following:
1. Fix the logical bug
2. Add LogSegment.needsRecovery(), a more paranoid version of the previous check that attempts to be safe regardless of any index or log corruption that may have occurred. Having this method here is a little hacky, but probably okay until we have a full incremental recovery implementation.
3. Fix the unit test that covers this.
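The following Python sketch shows what a "paranoid" check in the spirit of LogSegment.needsRecovery() might look like. It is not the method from the patch. The segment model and the function `needs_recovery` are hypothetical. The idea is to distrust the index: any inconsistency forces recovery, and recovery is skipped only when the segment provably ends at recoveryPoint - 1. In this toy model, "reading the file" is simulated by a position-to-offset map built from the message list.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, int]      # (offset, size in bytes), hypothetical segment model
IndexEntry = Tuple[int, int]   # (offset, file position), sparse index


def needs_recovery(messages: List[Message], index: List[IndexEntry], recovery_point: int) -> bool:
    """Hypothetical paranoid check: skip recovery only when nothing looks suspicious."""
    # Map each message's starting position to its offset (stands in for reading the file).
    offset_at: Dict[int, int] = {}
    pos = 0
    for offset, size in messages:
        offset_at[pos] = offset
        pos += size
    file_size = pos

    # 1. Index entries must be strictly increasing in both offset and position.
    for (o1, p1), (o2, p2) in zip(index, index[1:]):
        if o2 <= o1 or p2 <= p1:
            return True
    # 2. Every entry must point inside the file, at a message carrying that offset.
    for offset, position in index:
        if position >= file_size or offset_at.get(position) != offset:
            return True
    # 3. Scan forward from the last indexed position to find the real final offset.
    start = index[-1][1] if index else 0
    last = -1
    for position, offset in offset_at.items():  # insertion order = ascending position
        if position >= start:
            last = offset
    # Anything other than "ends exactly at recovery_point - 1" triggers recovery,
    # including a log that ends before the recovery point (e.g. after truncation).
    return last != recovery_point - 1


segment = [(0, 100), (1, 100), (2, 100), (3, 100)]
print(needs_recovery(segment, [(0, 0), (2, 200)], 4))       # False: fully flushed, index sane
print(needs_recovery(segment, [(0, 0), (2, 10_000)], 4))    # True: entry points past end of file
```

The design choice is to err toward running recovery. A false "needs recovery" only costs startup time, but a false "skip" leaves corruption in place.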
> broker can not start itself after kafka is killed with -9
> ---------------------------------------------------------
>
>                 Key: KAFKA-1112
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1112
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.8, 0.8.1
>            Reporter: Kane Kim
>            Assignee: Jay Kreps
>            Priority: Critical
>         Attachments: KAFKA-1112-v1.patch, KAFKA-1112.out
>
>
> When I kill kafka with -9, the broker cannot start because of corrupted index logs. I think kafka should try to delete/rebuild the indexes itself without manual intervention.
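The reporter's suggestion amounts to discarding a corrupt index and regenerating it from the log. The minimal Python sketch below shows that under the same hypothetical segment model. It rescans the messages and writes a sparse `(offset, position)` entry whenever at least `interval_bytes` bytes have been written since the previous entry. Both `rebuild_index` and `interval_bytes` are illustrative names, not Kafka's API.

```python
from typing import List, Tuple


def rebuild_index(messages: List[Tuple[int, int]], interval_bytes: int) -> List[Tuple[int, int]]:
    """Rebuild a sparse (offset, position) index by rescanning (offset, size) messages."""
    index: List[Tuple[int, int]] = []
    pos = 0
    bytes_since_last_entry = 0
    for offset, size in messages:
        # Always index the first message, then one entry per interval_bytes of data.
        if not index or bytes_since_last_entry >= interval_bytes:
            index.append((offset, pos))
            bytes_since_last_entry = 0
        pos += size
        bytes_since_last_entry += size
    return index


print(rebuild_index([(0, 100), (1, 100), (2, 100), (3, 100)], 200))  # [(0, 0), (2, 200)]
```

Because the rebuilt index is derived only from the log itself, it cannot inherit corruption from the old index file. The trade-off is a full scan of the segment.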

--
This message was sent by Atlassian JIRA
(v6.1#6144)
