Kafka >> mail # dev >> [jira] [Updated] (KAFKA-1112) broker can not start itself after kafka is killed with -9


[jira] [Updated] (KAFKA-1112) broker can not start itself after kafka is killed with -9

     [ https://issues.apache.org/jira/browse/KAFKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-1112:
-----------------------------

    Attachment: KAFKA-1112-v1.patch

The way the check was supposed to work was this: if the last offset in the file is recoveryPoint-1, then skip the recovery (because the whole file is flushed). This was implemented by using the last entry in the index to find the final message.
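
To make the intended check concrete, here is a minimal sketch in Python. It is not the Kafka code: the in-memory `log` list, the sparse `index` list, and the function names are hypothetical stand-ins for the on-disk segment and index files.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory stand-ins (not Kafka's real structures):
Message = Tuple[int, bytes]    # (offset, payload)
IndexEntry = Tuple[int, int]   # (offset, position of that message in the log list)


def last_offset_via_index(log: List[Message], index: List[IndexEntry]) -> Optional[int]:
    """Find the final message's offset by jumping to the last index entry
    and scanning forward from there to the end of the log."""
    start = index[-1][1] if index else 0
    last = None
    for offset, _payload in log[start:]:
        last = offset
    return last


def can_skip_recovery(log: List[Message], index: List[IndexEntry], recovery_point: int) -> bool:
    """Skip recovery only if the last offset in the file is recovery_point - 1,
    i.e. everything in the file was flushed."""
    return last_offset_via_index(log, index) == recovery_point - 1


log = [(0, b"a"), (1, b"b"), (2, b"c"), (3, b"d")]
index = [(0, 0), (2, 2)]  # sparse: only some offsets are indexed
print(can_skip_recovery(log, index, 4))  # whole file flushed
print(can_skip_recovery(log, index, 3))  # offset 3 was written after the last flush
```

Note that this sketch trusts the index completely, which is exactly why a corrupt index is a problem for this style of check.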

Overall I feel this is a bit of a hack, but we wanted to separate out the "fsync is async" feature from a full incremental recovery implementation that only recovers unflushed data.

The immediate problem was that we broke the short circuit by adding code to try to handle a corner case: what if the log is truncated after a flush and hence the end of the log is < recovery point. This was just totally broken and we were short circuiting out of the check in virtually all cases, including a corrupt index.

This issue wasn't caught because there was a bug in the log corruption unit test that gave a false pass on all index corruptions. :-(

The fix is the following:
1. Fix the logical bug
2. Add LogSegment.needsRecovery() which is a more paranoid version of what we were doing before that attempts to be safe regardless of any index or log corruption that may have occurred. Having this method here is a little hacky but probably okay until we get a full incremental recovery impl.
3. Fix the unit test that covers this.
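
As a rough illustration of what "more paranoid" could mean in item 2, the sketch below treats any inconsistency or error as a reason to recover. It is an assumption-laden Python model, not the contents of the patch: `needs_recovery`, the `log` list of (offset, payload) tuples, and the `index` list of (offset, position) tuples are all hypothetical.

```python
def needs_recovery(log, index, recovery_point):
    """Return False only when the index and log both look sane and the last
    offset is recovery_point - 1; any doubt at all means recover."""
    try:
        if not index:
            return True  # nothing to trust (hypothetical policy choice)
        offset, pos = index[-1]
        if pos < 0 or pos >= len(log):
            return True  # index points outside the log
        if log[pos][0] != offset:
            return True  # index and log disagree about this message
        last = offset
        for next_offset, _payload in log[pos + 1:]:
            if next_offset <= last:
                return True  # offsets must strictly increase
            last = next_offset
        return last != recovery_point - 1
    except Exception:
        # Malformed entries of any kind are treated as corruption.
        return True


good_log = [(0, b"a"), (1, b"b"), (2, b"c"), (3, b"d")]
print(needs_recovery(good_log, [(0, 0), (2, 2)], 4))  # clean and fully flushed
print(needs_recovery(good_log, [(0, 0), (2, 9)], 4))  # index points past the end
```

The design choice here is that a false "needs recovery" only costs startup time, while a false "skip" can leave the broker running on a corrupt segment, so every uncertain branch falls through to recovery.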
> broker can not start itself after kafka is killed with -9
> ---------------------------------------------------------
>
>                 Key: KAFKA-1112
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1112
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.8, 0.8.1
>            Reporter: Kane Kim
>            Assignee: Jay Kreps
>            Priority: Critical
>         Attachments: KAFKA-1112-v1.patch, KAFKA-1112.out
>
>
> When I kill kafka with -9, broker cannot start itself because of corrupted index logs. I think kafka should try to delete/rebuild indexes itself without manual intervention.

--
This message was sent by Atlassian JIRA
(v6.1#6144)

 