Kafka, mail # user - Trouble recovering after a crashed broker


Re: Trouble recovering after a crashed broker
Jun Rao 2014-01-06, 15:55
How many replicas do you have on that topic? What's the output of list
topic?

Thanks,

Jun
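
(For reference, the topic state Jun asks about — partitions, leaders, replicas, and ISR — can be printed in Kafka 0.8 with the list-topic tool; the ZooKeeper address below is a placeholder for your own quorum:)

```shell
# Kafka 0.8 list-topic tool: prints partition, leader, replicas, and ISR
# for each topic. zk1:2181 is a placeholder for your ZooKeeper connect string.
bin/kafka-list-topic.sh --zookeeper zk1:2181
```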
On Mon, Jan 6, 2014 at 1:45 AM, Vincent Rischmann <[EMAIL PROTECTED]> wrote:

> Hi,
>
> yes, I'm seeing the errors on the crashed broker.
>
> My controller.log file only contains the following:
>
> [2014-01-03 09:41:01,794] INFO [ControllerEpochListener on 1]: Initialized
> controller epoch to 11 and zk version 10
> (kafka.controller.ControllerEpochListener)
> [2014-01-03 09:41:01,812] INFO [Controller 1]: Controller starting up
> (kafka.controller.KafkaController)
> [2014-01-03 09:41:02,082] INFO [Controller 1]: Controller startup complete
> (kafka.controller.KafkaController)
>
> Since Friday, nothing has changed, and the broker has generated multiple
> gigabytes of traces in server.log. One of the last exceptions looks like
> this:
>
> Request for offset 787449 but we only have log segments in the range 0 to
> 163110.
>
> The range has increased since Friday (it was "0 to 19372"). Does this mean
> the broker is actually catching up?
>
>
> Thanks for your help.
>
>
>
>
> 2014/1/3 Jun Rao <[EMAIL PROTECTED]>
>
> > If a broker crashes and restarts, it will catch up on the missing data
> > from the leader replicas. Normally, while this broker is catching up, it
> > won't be serving any client requests. Are you seeing those errors on the
> > crashed broker? Also, you are not supposed to see
> > OffsetOutOfRangeException with just one broker failure and 3 replicas.
> > Do you see the following in the controller log?
> >
> > "No broker in ISR is alive for ... There's potential data loss."
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann
> > <[EMAIL PROTECTED]> wrote:
> >
> > > Hi all,
> > >
> > > We have a cluster of three 0.8 brokers, and this morning one of the
> > > brokers crashed.
> > > It is a test broker, and we stored the logs in /tmp/kafka-logs. All
> > > topics in use are replicated on the three brokers.
> > >
> > > You can guess the problem: when the broker rebooted, it wiped all the
> > > data in the logs.
> > >
> > > The producers and consumers are fine, but the broker with the wiped
> > > data keeps generating a lot of exceptions, and I don't really know
> > > what to do to recover.
> > >
> > > Example exception:
> > >
> > > [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing
> > > fetch request for partition [topic,0] offset 814798 from consumer with
> > > correlation id 0 (kafka.server.KafkaApis)
> > > kafka.common.OffsetOutOfRangeException: Request for offset 814798 but
> > > we only have log segments in the range 0 to 19372.
> > >
> > > There are a lot of them, something like 10+ per second. I (maybe
> > > wrongly) assumed that the broker would catch up; if that's the case,
> > > how can I see the progress?
> > >
> > > In general, what is the recommended way to bring back a broker with
> > > wiped data in a cluster?
> > >
> > > Thanks.
> > >
> >
>
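
For reference, the OffsetOutOfRangeException in the thread comes down to a simple range check on the broker's locally available log segments. A minimal sketch of that check (not Kafka's actual code; the names are hypothetical):

```python
# Minimal sketch (not Kafka's actual code) of the range check behind the
# OffsetOutOfRangeException above. A broker that restarts with wiped logs
# only has a truncated offset range, so old consumer offsets fall outside
# it until replication brings the log end offset back up.

class OffsetOutOfRangeError(Exception):
    pass

def serve_fetch(requested_offset, log_start, log_end):
    """Reject fetch requests outside the locally available segment range."""
    if not (log_start <= requested_offset <= log_end):
        raise OffsetOutOfRangeError(
            f"Request for offset {requested_offset} but we only have "
            f"log segments in the range {log_start} to {log_end}.")
    return requested_offset  # stand-in for returning the fetched messages

# After the wipe, the broker only has offsets 0..19372, so a consumer
# fetching at 814798 fails exactly like the log line in the thread.
```

As the range in the error message grows over time (as Vincent observed, from "0 to 19372" to "0 to 163110"), the broker's replica fetcher is copying data back from the leaders.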