Re: Data loss in case of request.required.acks set to -1
Hanish,

In this case I believe it is a bug in the kill -9 scenario. Could you file a
JIRA and describe the process to reproduce it?

Guozhang
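
For reference, here is a minimal sketch of the producer setup under discussion,
assuming the Kafka 0.8 Java producer API; the class name, broker addresses, and
message count are illustrative placeholders, while the topic name and
request.required.acks=-1 are taken from the thread:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AckAllProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Bootstrap metadata from both brokers of the 2-node cluster.
            props.put("metadata.broker.list", "broker0:9092,broker1:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // acks = -1: the leader waits for the in-sync replicas to
            // acknowledge each message before acking the producer.
            props.put("request.required.acks", "-1");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            for (int i = 0; i < 10000; i++) {
                producer.send(new KeyedMessage<String, String>(
                    "test-trunk111", Integer.toString(i), "message-" + i));
            }
            producer.close();
        }
    }
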
On Mon, Dec 23, 2013 at 7:42 PM, Hanish Bansal <
[EMAIL PROTECTED]> wrote:

> Sorry, the last message was sent by mistake.
>
> Hi Guozhang,
>
> Please find my comments below:
>
>
> On Tue, Dec 24, 2013 at 12:52 AM, Guozhang Wang <[EMAIL PROTECTED]>
> wrote:
>
> > Hanish,
> >
> > Originally, when you create the two partitions, their leadership should be
> > evenly distributed across the two brokers, i.e. each broker gets one
> > partition. But in your case broker 1 is the leader for both partitions 1
> > and 0, while according to the replica list broker 0 should originally have
> > been the leader for partition 1, since the leader of a partition should be
> > the first broker in its replica list.
> >
>
>
> *When I create the topic, the leadership is indeed evenly distributed across
> the two brokers, as you said. And one important thing: when the leadership is
> evenly distributed (let's say broker-0 is the leader of partition 1 and
> broker-1 is the leader of partition 0), there is NO DATA LOSS. My scenario
> occurs only if I restart one of the nodes after the topic has been created,
> because there is only one live broker for some time and that broker becomes
> the leader for both partitions.*
>
> > This means broker 0 was bounced or halted (e.g. by a GC pause, etc.)
> > before, and hence the leadership of partition 1 migrated to broker 1;
> > broker 0 is also still catching up after the bounce, since it is not in
> > the ISR for any partition yet. In this case, when you bounce broker 1,
> > broker 0, which is not in the ISR, will be selected as the new leader for
> > both partitions and hence cause data loss.
> >
> > If you are doing rolling-bounce experiments on a topic with replication
> > factor N, one thing to do is to wait for the ISR to contain at least 2
> > brokers before bouncing the next one; otherwise freedom from data loss is
> > not guaranteed even if the number of replicas is larger than 2.
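
As a rough sketch of that check, assuming the Kafka 0.8 topic-metadata API (the
host, port, and helper name are illustrative, not part of any suggested
tooling), a rolling-bounce script could poll something like this between
restarts:

    import java.util.Collections;
    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class IsrCheck {
        // Returns true only when every partition of the topic has at least
        // minIsr brokers in its ISR, i.e. it is safe to bounce the next broker.
        public static boolean isrLargeEnough(String host, int port,
                                             String topic, int minIsr) {
            SimpleConsumer consumer =
                new SimpleConsumer(host, port, 10000, 64 * 1024, "isr-check");
            try {
                TopicMetadataResponse resp = consumer.send(
                    new TopicMetadataRequest(Collections.singletonList(topic)));
                for (TopicMetadata tm : resp.topicsMetadata()) {
                    for (PartitionMetadata pm : tm.partitionsMetadata()) {
                        if (pm.isr().size() < minIsr) {
                            return false;
                        }
                    }
                }
                return true;
            } finally {
                consumer.close();
            }
        }
    }
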
> >
> >
>
>
>
> *Yes, I have tried that: after broker-0 was restarted I waited for some time
> so that it came back into the ISR list. The ISR status at that point was:*
>
> *topic: test-trunk111    partition: 0    leader: 1    replicas: 1,0    isr: 0,1*
> *topic: test-trunk111    partition: 1    leader: 1    replicas: 0,1    isr: 0,1*
>
> *Then I started producing data, killed broker 1 and observed the behavior.
> There is still data loss, and in this case both brokers are in the ISR list.
> I did notice slightly different behavior, though: there is less data loss
> than in the other case where only one broker is in the ISR list. In the first
> case (only one broker in the ISR list) I experienced 50-60 % data loss,
> whereas in this case (both brokers in the ISR list) I experienced only 2-3 %
> data loss.*
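
The thread does not show the test harness used to measure these percentages;
purely as a sketch, a loss figure like the one above could be computed with the
Kafka 0.8 high-level consumer roughly as follows (the ZooKeeper address, group
id, and produced-message count are assumed placeholders):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.ConsumerTimeoutException;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class LossCounter {
        public static void main(String[] args) {
            int produced = 10000;  // how many messages the producer sent

            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "loss-counter");
            props.put("auto.offset.reset", "smallest");
            // Give up once no message has arrived for 10 seconds.
            props.put("consumer.timeout.ms", "10000");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(
                    Collections.singletonMap("test-trunk111", 1));
            ConsumerIterator<byte[], byte[]> it =
                streams.get("test-trunk111").get(0).iterator();

            int consumed = 0;
            try {
                while (it.hasNext()) {
                    it.next();
                    consumed++;
                }
            } catch (ConsumerTimeoutException e) {
                // no more messages within the timeout
            }
            connector.shutdown();

            System.out.printf("consumed %d of %d (%.1f%% loss)%n",
                consumed, produced,
                100.0 * (produced - consumed) / produced);
        }
    }
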
>
>
> > If you want to read more, I would recommend this blog post about Kafka's
> > guarantees:
> >
> > http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen
> >
> > Guozhang
> >
> >
> >
> >
> > On Sun, Dec 22, 2013 at 10:38 PM, Hanish Bansal <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Hi Guozhang,
> > >
> > > When both nodes are alive, the topic ISR status is:
> > >
> > > topic: test-trunk111    partition: 0    leader: 0    replicas: 1,0    isr: 0
> > > topic: test-trunk111    partition: 1    leader: 0    replicas: 0,1    isr: 0
> > >
> > > Now, since broker-0 is the leader node, I kill it while producing data.
> > > After the leader goes down, the topic ISR status is:
> > >
> > > topic: test-trunk111    partition: 0    leader: 1    replicas: 1,0    isr: 1
> > > topic: test-trunk111    partition: 1    leader: 1    replicas: 0,1    isr: 1
> > >
> > > Now, after all the data was produced, I consumed it and found that some
> > > data was lost.
> > >
> > > *Also, in the controller logs there is an entry like:*
> > >
> > > [2013-12-23 10:25:07,648] DEBUG [OfflinePartitionLeaderSelector]: No
 