Kafka, mail # user - Data loss in case of request.required.acks set to -1


Re: Data loss in case of request.required.acks set to -1
Jason Rosenberg 2013-12-23, 20:27
Is it possible to expose programmatically the number of brokers in the ISR
for each partition?  We could make this a gating check before shutting down
a broker gracefully, to make sure things are in good shape... I guess
controlled shutdown assures this anyway, in a sense...

Jason
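A check like the one Jason asks for is possible with the AdminClient API
that later Kafka releases (0.11+) added; it did not exist when this thread
was written. A minimal sketch, assuming a broker at localhost:9092 and the
topic name used later in this thread:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class IsrCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Fetch metadata for the topic and report each partition's ISR size.
                TopicDescription desc = admin
                        .describeTopics(Collections.singleton("test-trunk111"))
                        .values().get("test-trunk111").get();
                for (TopicPartitionInfo p : desc.partitions()) {
                    System.out.printf("partition %d: leader=%s isr=%d%n",
                            p.partition(), p.leader(), p.isr().size());
                }
            }
        }
    }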
On Mon, Dec 23, 2013 at 2:22 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:

> Hanish,
>
> Originally, when you create the two partitions, their leadership should be
> evenly distributed over the two brokers, i.e. each broker gets one
> partition. But in your case broker 0 is the leader for both partitions 0
> and 1, while from the replica list broker 1 should originally have been the
> leader for partition 0, since the leader of a partition is the first broker
> in its replica list.
>
> This means broker 1 was bounced or halted (e.g. by a long GC pause)
> earlier, and hence the leadership of partition 0 migrated to broker 0;
> broker 1 is still catching up after the bounce, since it is not in the ISR
> of either partition. In this state, when you kill broker 0, broker 1, which
> is not in the ISR, is elected as the new leader for both partitions, and
> hence causes data loss.
>
> If you are doing experiments with rolling bounces on a topic with
> replication factor N, one thing to do is to wait for the ISR to contain at
> least 2 brokers before bouncing the next broker; otherwise no data loss can
> be guaranteed even if the number of replicas is larger than 2 (see the
> gating sketch after this message).
>
> If you want to read more, I would recommend this blog post about Kafka's
> guarantees:
>
> http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen
>
> Guozhang
>
>
>
>
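The gating check Guozhang describes above could look like the following,
again assuming the later AdminClient API; awaitIsr is a hypothetical helper
name, not part of Kafka:

    import java.util.Collections;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class BounceGate {
        // Hypothetical helper: block until every partition of the topic
        // has at least minIsr replicas in its ISR, then return.
        static void awaitIsr(AdminClient admin, String topic, int minIsr)
                throws Exception {
            while (true) {
                TopicDescription desc = admin
                        .describeTopics(Collections.singleton(topic))
                        .values().get(topic).get();
                boolean ok = true;
                for (TopicPartitionInfo p : desc.partitions()) {
                    if (p.isr().size() < minIsr) {
                        ok = false;
                        break;
                    }
                }
                if (ok) return;        // safe to bounce the next broker
                Thread.sleep(5_000L);  // ISR still catching up; re-check shortly
            }
        }
    }

Calling awaitIsr(admin, "test-trunk111", 2) between bounces implements the
"wait for at least 2 brokers in ISR" rule from the message above.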
> On Sun, Dec 22, 2013 at 10:38 PM, Hanish Bansal <[EMAIL PROTECTED]> wrote:
>
> > Hi Guozhang,
> >
> > When both nodes are alive, the topic ISR status is:
> >
> > topic: test-trunk111    partition: 0    leader: 0    replicas: 1,0    isr: 0
> > topic: test-trunk111    partition: 1    leader: 0    replicas: 0,1    isr: 0
> >
> > Since the leader node is broker 0, I kill the leader while the data is
> > being produced. After the leader goes down, the topic ISR status is:
> >
> > topic: test-trunk111    partition: 0    leader: 1    replicas: 1,0    isr: 1
> > topic: test-trunk111    partition: 1    leader: 1    replicas: 0,1    isr: 1
> >
> > When I consume the data after everything has been produced, there is
> > some data loss.
> >
> > *Also, in the controller logs there are entries like:*
> >
> > [2013-12-23 10:25:07,648] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [test-trunk111,1]. Pick the leader from the alive assigned replicas: 1 (kafka.controller.OfflinePartitionLeaderSelector)
> > [2013-12-23 10:25:07,648] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [test-trunk111,1]. Elect leader 1 from live brokers 1. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
> > [2013-12-23 10:25:07,649] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":1,"isr":[1]} for offline partition [test-trunk111,1] (kafka.controller.OfflinePartitionLeaderSelector)
> >
> > Is there any solution for this behaviour?
> >
> >
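The loss Hanish observes comes from unclean leader election, which brokers
of this era always allow. Later Kafka releases (0.8.2+) added
unclean.leader.election.enable and min.insync.replicas to close exactly
this hole. A sketch of the producer side, assuming such a cluster; acks is
the new-producer successor of request.required.acks, and the broker-side
settings are noted in comments:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SafeProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // "all" is equivalent to the old request.required.acks=-1.
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Pair with these broker settings in server.properties:
            //   unclean.leader.election.enable=false  (never elect an out-of-sync replica)
            //   min.insync.replicas=2                 (fail produce requests once ISR < 2)
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("test-trunk111", "key", "value"));
            }
        }
    }

With that combination the scenario above fails fast on the producer side
instead of silently losing acknowledged writes.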
> > On Fri, Dec 20, 2013 at 7:27 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
> >
> > > Hanish,
> > >
> > > One thing you can check: when you kill one of the brokers, is the
> > > other broker in the ISR list of the partitions that the killed broker
> > > was hosting? This can be done using the kafka-topics tool.
> > >
> > > Also, you can check the controller log for any entry like "No broker
> > > in ISR is alive for %s. Elect leader %d from live brokers %s. There's
> > > potential data loss."
> > >
> > > Guozhang
> > >
> > >
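With the tooling of this era, the check Guozhang mentions is, for example,
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-trunk111
(host and topic are placeholders); before killing a broker, the isr column
should show the other broker for every partition the broker to be killed is
leading.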
> > > On Fri, Dec 20, 2013 at 9:11 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Could you reproduce this easily? If so, could you file a jira and
> > > > describe the steps?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >