[jira] [Commented] (KAFKA-1106) HighwaterMarkCheckpoint failure putting broker into a bad state

    [ https://issues.apache.org/jira/browse/KAFKA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808741#comment-13808741 ]

David Lao commented on KAFKA-1106:
----------------------------------

No, there is no chance that manual intervention was involved. However, the broker node in question appears to have gone through a fail-fast-style exit and recovery a few hours earlier, and it was working fine until it hit this bug. Could a corrupted file have led to this? If so, is failing fast the right way to handle the situation?
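
To make the fail-fast question concrete, here is a minimal sketch of what "exit on a corrupted checkpoint instead of carrying on" could look like. It is written in Python purely for illustration and is not Kafka code. The file layout (a version line, an entry count, then "topic partition offset" lines) and the names `read_checkpoint`, `load_or_fail_fast`, and `CorruptCheckpointError` are assumptions made for this example.

```python
import sys


class CorruptCheckpointError(Exception):
    """Raised when a checkpoint file does not match the assumed layout."""


def read_checkpoint(path):
    """Parse a checkpoint file into {(topic, partition): offset}.

    Assumed layout (hypothetical, for illustration only):
      line 1: format version (must be 0)
      line 2: number of entries
      then one "topic partition offset" line per entry
    """
    with open(path, "r", encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]

    try:
        version = int(lines[0])
        expected = int(lines[1])
    except (IndexError, ValueError) as e:
        raise CorruptCheckpointError(f"bad header in {path}") from e
    if version != 0:
        raise CorruptCheckpointError(f"unknown version {version} in {path}")

    entries = {}
    for line in lines[2:]:
        parts = line.split()
        if len(parts) != 3:
            raise CorruptCheckpointError(f"malformed entry {line!r} in {path}")
        topic, partition, offset = parts
        try:
            entries[(topic, int(partition))] = int(offset)
        except ValueError as e:
            raise CorruptCheckpointError(f"non-numeric field in {line!r}") from e

    # A count mismatch (truncated file, duplicate keys) is treated as corruption.
    if len(entries) != expected:
        raise CorruptCheckpointError(
            f"expected {expected} entries, found {len(entries)} in {path}"
        )
    return entries


def load_or_fail_fast(path, exit_fn=sys.exit):
    """Return the parsed checkpoint, or terminate the process on any read error."""
    try:
        return read_checkpoint(path)
    except (OSError, CorruptCheckpointError) as e:
        print(f"fatal: cannot read checkpoint: {e}", file=sys.stderr)
        exit_fn(1)
```

The design choice being illustrated is that a read failure stops the process immediately, so a supervisor can restart it, rather than leaving it running in a state where it can no longer make progress. Whether a restart alone would recover from a file that stays corrupted on disk is a separate question.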

> HighwaterMarkCheckpoint failure putting broker into a bad state
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1106
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: David Lao
>         Attachments: KAFKA-1106-patch, kafka.log
>
>
> I'm encountering a case where a broker gets stuck because HighwaterMarkCheckpoint fails to recover after reading what appear to be corrupted ISR entries. Once the broker is in this state, leader election can never succeed, which stalls the entire cluster.
> Please see the detailed stack trace in the attached log. Perhaps failing fast when HighwaterMarkCheckpoint fails to read would force the broker to restart and recover.

--
This message was sent by Atlassian JIRA
(v6.1#6144)