Kafka >> mail # user >> Trouble recovering after a crashed broker


Re: Trouble recovering after a crashed broker
How many replicas do you have on that topic? What's the output of the list
topic tool?

Thanks,

Jun
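For reference, the topic metadata Jun is asking about can be dumped with the list topic tool that ships with 0.8; a sketch, assuming a local ZooKeeper at localhost:2181 and a topic literally named "topic" (later releases rename this to kafka-topics.sh --describe, with similar flags):

```shell
# Show partition state for one topic (Kafka 0.8). Each output line lists the
# leader, the assigned replicas, and the in-sync replica set (isr).
bin/kafka-list-topic.sh --zookeeper localhost:2181 --topic topic

# Show only partitions whose ISR is smaller than the replica set, i.e. where
# some replica is dead or still catching up:
bin/kafka-list-topic.sh --zookeeper localhost:2181 --under-replicated-partitions
```

Both commands need a running cluster, so they are shown here only as an ops sketch.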
On Mon, Jan 6, 2014 at 1:45 AM, Vincent Rischmann <[EMAIL PROTECTED]> wrote:

> Hi,
>
> yes, I'm seeing the errors on the crashed broker.
>
> My controller.log file only contains the following:
>
> [2014-01-03 09:41:01,794] INFO [ControllerEpochListener on 1]: Initialized
> controller epoch to 11 and zk version 10
> (kafka.controller.ControllerEpochListener)
> [2014-01-03 09:41:01,812] INFO [Controller 1]: Controller starting up
> (kafka.controller.KafkaController)
> [2014-01-03 09:41:02,082] INFO [Controller 1]: Controller startup complete
> (kafka.controller.KafkaController)
>
> Since Friday, nothing has changed, and the broker has generated multiple
> gigabytes of traces in server.log; one of the latest exceptions looks like
> this:
>
> Request for offset 787449 but we only have log segments in the range 0 to
> 163110.
>
> The range has increased since Friday (it was "0 to 19372"); does this mean
> the broker is actually catching up?
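A growing range does suggest the follower is re-fetching; a coarser but definitive signal is ISR membership, since a broker rejoins the ISR for a partition only once it has fully caught up on it. Below is a small self-contained sketch that flags assigned replicas missing from the ISR, using a sample line in the 0.8-era list-topic output format (the exact field labels are an assumption; adjust them to what your tool actually prints):

```shell
# Hypothetical sample of 0.8-style list-topic output, for illustration only:
cat > /tmp/topic-state.txt <<'EOF'
topic: topic partition: 0 leader: 2 replicas: 1,2,3 isr: 2,3
EOF

# Print broker ids that are assigned replicas but absent from the ISR,
# i.e. replicas that are still catching up (or down):
awk '{
  for (i = 1; i <= NF; i++) {
    if ($i == "replicas:") r = $(i + 1)
    if ($i == "isr:")      s = $(i + 1)
  }
  n = split(r, ra, ","); m = split(s, sa, ",")
  split("", inisr)                       # reset the lookup table per line
  for (j = 1; j <= m; j++) inisr[sa[j]] = 1
  for (j = 1; j <= n; j++)
    if (!(ra[j] in inisr)) print $2, "partition", $4, "lagging replica:", ra[j]
}' /tmp/topic-state.txt
# -> topic partition 0 lagging replica: 1
```

When the lagging broker's id stops appearing in this output for all partitions, it has caught up everywhere.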
>
>
> Thanks for your help.
>
>
>
>
> 2014/1/3 Jun Rao <[EMAIL PROTECTED]>
>
> > If a broker crashes and restarts, it will catch up on the missing data
> > from the leader replicas. Normally, while this broker is catching up, it
> > won't be serving any client requests. Are you seeing those errors on the
> > crashed broker? Also, you are not supposed to see OffsetOutOfRangeException
> > with just one broker failure and 3 replicas. Do you see the following in
> > the controller log?
> >
> > "No broker in ISR is alive for ... There's potential data loss."
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann <[EMAIL PROTECTED]> wrote:
> >
> > > Hi all,
> > >
> > > We have a cluster of three 0.8 brokers, and this morning one of the
> > > brokers crashed. It is a test broker, and we stored the logs in
> > > /tmp/kafka-logs. All topics in use are replicated on the three brokers.
> > >
> > > You can guess the problem: when the broker rebooted, it wiped all the
> > > data in the logs.
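The wipe itself is a data-location problem rather than a Kafka one: on most distros /tmp is cleared at boot (tmpfs or a tmp cleaner), so every reboot looks like a disk failure to the broker. A minimal server.properties fragment pointing the data directory at persistent storage (the path below is a placeholder, not from the thread):

```properties
# server.properties -- keep Kafka data out of /tmp, which is wiped on reboot
log.dirs=/var/lib/kafka-logs
```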
> > >
> > > The producers and consumers are fine, but the broker with the wiped
> > > data keeps generating a lot of exceptions, and I don't really know
> > > what to do to recover.
> > >
> > > Example exception:
> > >
> > > [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch
> > > request for partition [topic,0] offset 814798 from consumer with
> > > correlation id 0 (kafka.server.KafkaApis)
> > > kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we
> > > only have log segments in the range 0 to 19372.
> > >
> > > There are a lot of them, something like 10+ per second. I (maybe
> > > wrongly) assumed that the broker would catch up; if that's the case,
> > > how can I see the progress?
> > >
> > > In general, what is the recommended way to bring back a broker with
> > > wiped data in a cluster?
> > >
> > > Thanks.
> > >
> >
>
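For readers landing on this thread: with replicated topics, the usual recovery is simply a clean restart with the same broker.id and an empty, persistent log directory; the follower then re-fetches everything from the partition leaders. A hedged sketch of the steps (script names and paths are 0.8-era defaults and may differ in your installation):

```shell
# 1. Stop the broker that lost its data.
bin/kafka-server-stop.sh

# 2. Ensure log.dirs in config/server.properties points at persistent
#    storage, and that the directory is empty so the broker starts fresh.

# 3. Restart with the SAME broker.id; replication refills the log dirs
#    from the leaders.
bin/kafka-server-start.sh config/server.properties

# 4. Watch until this broker id has rejoined the ISR for every partition,
#    i.e. this prints nothing for the topics it hosts:
bin/kafka-list-topic.sh --zookeeper localhost:2181 --under-replicated-partitions
```

These commands require a live cluster, so this is an ops sketch rather than a runnable example.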
