Kafka, mail # user - Replicas for partition are dead


Re: Replicas for partition are dead
Jun Rao 2013-03-22, 15:07
The easiest way is to wipe out both ZK and kafka data and start from
scratch.

Thanks,

Jun
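
For readers following along, here is a minimal sketch of what "wipe out both ZK and kafka data" could look like on a single test machine. It assumes both services are stopped first. The directory paths are hypothetical stand-ins for whatever `dataDir` (ZooKeeper config) and `log.dir` (Kafka `server.properties`) point to in your setup, and `wipe_data_dirs` is an illustrative helper, not part of Kafka. Running it against real paths permanently deletes all stored messages and cluster metadata.

```python
import shutil
import tempfile
from pathlib import Path


def wipe_data_dirs(dirs):
    """Remove each existing directory tree in `dirs`, then recreate it empty.

    Returns the directories that existed and were cleared.
    """
    cleared = []
    for d in map(Path, dirs):
        if d.is_dir():
            shutil.rmtree(d)  # drops ZK snapshots or Kafka .log/.index files
            cleared.append(d)
        d.mkdir(parents=True, exist_ok=True)  # leave an empty dir for the restart
    return cleared


if __name__ == "__main__":
    # Demo on throwaway directories only; substitute real paths at your own risk.
    base = Path(tempfile.mkdtemp())
    zk_dir = base / "zookeeper"      # hypothetical: ZooKeeper dataDir
    kafka_dir = base / "kafka-logs"  # hypothetical: Kafka log.dir
    kafka_dir.mkdir()
    (kafka_dir / "00000000000000000000.log").write_bytes(b"old segment")
    print(wipe_data_dirs([zk_dir, kafka_dir]))
```

After the directories are cleared, start ZooKeeper first and then the broker, so the broker registers against a clean ZK tree.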

On Fri, Mar 22, 2013 at 6:51 AM, Jason Huang <[EMAIL PROTECTED]> wrote:

> Thanks Jun.
>
> I have built the new kafka version and started the services. You
> mentioned that ZK data structure has been changed - does that mean we
> can't reload the previous messages from current log files? I actually
> tried to copy the log files (.log and .index) to the new kafka
> instance but got the same "topic doesn't exist" error after running
> the new kafka services.
>
> Any comments on how I might be able to recover previous messages?
>
> thanks
>
> Jason
>
> On Wed, Mar 20, 2013 at 12:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > The latest version of 0.8 can be found in the 0.8 branch, not trunk.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Mar 20, 2013 at 7:47 AM, Jason Huang <[EMAIL PROTECTED]> wrote:
> >
> >> The 0.8 version I use was built from trunk last Dec. Since then, this
> >> error happened 3 times. Each time we had to remove all the ZK and
> >> Kafka log data and restart the services.
> >>
> >> I will try newer versions with more recent patches and keep
> >> monitoring it.
> >>
> >> thanks!
> >>
> >> Jason
> >>
> >> On Wed, Mar 20, 2013 at 10:39 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >> > Ok, so you are using the same broker id. What the error is saying is
> >> > that broker 1 doesn't seem to be up.
> >> >
> >> > Not sure what revision of 0.8 you are using. Could you try the latest
> >> > revision in 0.8 and see if the problem still exists? You may have to
> >> > wipe out all ZK and Kafka data first since some ZK data structures
> >> > were renamed a few weeks ago.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Wed, Mar 20, 2013 at 6:57 AM, Jason Huang <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> I restarted the zookeeper server first, then broker. It's the same
> >> >> instance of kafka 0.8 and I am using the same config file. In
> >> >> server.properties I have: brokerid=1
> >> >>
> >> >> Is that sufficient to ensure the broker gets restarted with the same
> >> >> broker id as before?
> >> >>
> >> >> thanks,
> >> >>
> >> >> Jason
> >> >>
> >> >> On Wed, Mar 20, 2013 at 12:30 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >> >> > Did the broker get restarted with the same broker id?
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Jun
> >> >> >
> >> >> > On Tue, Mar 19, 2013 at 1:34 PM, Jason Huang <[EMAIL PROTECTED]> wrote:
> >> >> >
> >> >> >> Hello,
> >> >> >>
> >> >> >> My kafka (0.8) server went down today for an unknown reason, and
> >> >> >> when I restarted both zookeeper and the kafka server I got the
> >> >> >> following error in the kafka server log:
> >> >> >>
> >> >> >> [2013-03-19 13:39:16,131] INFO [Partition state machine on Controller 1]: Invoking state change to OnlinePartition for partitions (kafka.controller.PartitionStateMachine)
> >> >> >> [2013-03-19 13:39:16,262] INFO [Partition state machine on Controller 1]: Electing leader for partition [topic_a937ac27-1883-4ca0-95bc-c9a740d08947, 0] (kafka.controller.PartitionStateMachine)
> >> >> >> [2013-03-19 13:39:16,451] ERROR [Partition state machine on Controller 1]: State change for partition [topic_a937ac27-1883-4ca0-95bc-c9a740d08947, 0] from OfflinePartition to OnlinePartition failed (kafka.controller.PartitionStateMachine)
> >> >> >> kafka.common.PartitionOfflineException: All replicas for partition [topic_a937ac27-1883-4ca0-95bc-c9a740d08947, 0] are dead. Marking this partition offline
> >> >> >>         at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:300)
> >> >> >> .....
> >> >> >> Caused by: kafka.common.PartitionOfflineException: No replica for partition ([topic_a937ac27-1883-4ca0-95bc-c9a740d08947, 0]) is alive.