

[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

     [ https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1108:
---------------------------------

    Fix Version/s: 0.8.1

> when controlled shutdown attempt fails, the reason is not always logged
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-1108
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1108
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>             Fix For: 0.8.1
>
>
> KafkaServer.controlledShutdown() initiates a controlled shutdown and, if the attempt fails, retries it.
> Looking at the code, there are two ways an attempt can fail. One is an error response from the controller, which is reported by this logging code:
> {code}
> info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
> info("Error code from controller: %d".format(shutdownResponse.errorCode))
> {code}
> The other is an IOException, which is handled by this code without logging anything:
> {code}
>             catch {
>               case ioe: java.io.IOException =>
>                 channel.disconnect()
>                 channel = null
>                 // ignore and try again
>             }
> {code}
> And then finally, in either case:
> {code}
>           if (!shutdownSuceeded) {
>             Thread.sleep(config.controlledShutdownRetryBackoffMs)
>             warn("Retrying controlled shutdown after the previous attempt failed...")
>           }
> {code}
> It would be nice if the nature of the IOException were logged as well (I'd be happy with ioe.getMessage() instead of a full stack trace, as Kafka in general tends to be too willing to dump IOException stack traces!).
> I suspect that, in my case, the actual IOException is a socket timeout, as the time between the initial "Starting controlled shutdown...." and the first "Retrying..." message is usually about 35 seconds (the socket timeout + the controlled shutdown retry backoff).  So the real issue in this case seems to be that controlled shutdown is taking too long.  It would seem sensible instead to have the controller report back to the server (before the socket timeout) that more time is needed, etc.
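
The logging request above can be illustrated with a minimal, self-contained sketch. It is not the KafkaServer code and not the eventual fix for this issue: `ControlledShutdownSketch`, `retryWithLogging`, the `logged` buffer, and the stand-in `warn` are hypothetical names introduced only for illustration. The idea shown is simply to remember why an attempt failed (an error response or an IOException message) and include that reason in the warning, without a stack trace.

```scala
import java.io.IOException

// Hypothetical stand-ins for illustration; not the actual KafkaServer code.
object ControlledShutdownSketch {
  // Collects log lines so the behaviour is easy to inspect; real code would use its logger.
  val logged = scala.collection.mutable.ArrayBuffer.empty[String]
  def warn(msg: String): Unit = { logged += msg; println("WARN " + msg) }

  /** Runs `attempt` up to `maxRetries` times; returns true once an attempt succeeds. */
  def retryWithLogging(maxRetries: Int, backoffMs: Long)(attempt: () => Boolean): Boolean = {
    var remaining = maxRetries
    var succeeded = false
    while (!succeeded && remaining > 0) {
      remaining -= 1
      // Default reason: the attempt returned false, i.e. a non-success response.
      var failure = "controller returned an error response"
      try {
        succeeded = attempt()
      } catch {
        case ioe: IOException =>
          // Keep only the message, not the full stack trace.
          failure = "IOException: " + ioe.getMessage
      }
      if (!succeeded) {
        warn("Controlled shutdown attempt failed (%s)".format(failure))
        // Back off only if another attempt will follow.
        if (remaining > 0) {
          Thread.sleep(backoffMs)
        }
      }
    }
    succeeded
  }
}
```

With this shape, an attempt that throws `new IOException("Read timed out")` would produce a warning containing "IOException: Read timed out", so a socket timeout can be told apart from an error response from the controller.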

--
This message was sent by Atlassian JIRA
(v6.1#6144)

 