Kafka, mail # user - message loss


Scott Clasen 2013-08-22, 19:50
Neha Narkhede 2013-08-22, 20:59
Re: message loss
Scott Clasen 2013-08-22, 21:11
+1 for that knob on a per-topic basis. Choosing consistency over availability would open Kafka to more use cases, no?
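A minimal sketch of the choice the knob would control, assuming a heavily simplified controller: prefer a live ISR member, otherwise either fall back to an unclean election or leave the partition offline. `PartitionState`, `elect_leader` and the `allow_unclean` flag are hypothetical illustrations, not Kafka 0.8 code or configuration.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional, Sequence


@dataclass
class PartitionState:
    """Hypothetical, simplified view of one partition as seen by the controller."""
    replicas: Sequence[int]  # assigned replica broker ids, in preference order
    isr: AbstractSet[int]    # in-sync replicas (hold every committed write)


def elect_leader(state: PartitionState,
                 live_brokers: AbstractSet[int],
                 allow_unclean: bool) -> Optional[int]:
    """Return the new leader's broker id, or None if the partition goes offline.

    `allow_unclean` is a hypothetical per-topic knob, not a Kafka 0.8 setting.
    """
    # Clean election: a live ISR member already has every committed write.
    for broker in state.replicas:
        if broker in state.isr and broker in live_brokers:
            return broker
    if allow_unclean:
        # Availability first: pick any live replica, even one that is not
        # caught up. Committed writes it is missing are lost.
        for broker in state.replicas:
            if broker in live_brokers:
                return broker
    # Consistency first: no leader, so the partition is offline and writes are rejected.
    return None


# The ISR has shrunk to {1}, then n1 is cut off from ZooKeeper.
state = PartitionState(replicas=[1, 2, 3, 4, 5], isr={1})
live = {2, 3, 4, 5}
print(elect_leader(state, live, allow_unclean=True))   # 2 (unclean election)
print(elect_leader(state, live, allow_unclean=False))  # None (partition offline)
```

With the flag off, the cost of a partition shows up as rejected writes rather than silently lost ones.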


On Aug 22, 2013, at 1:59 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Scott,
>
> Kafka replication aims to guarantee that committed writes are not lost. In
> other words, as long as leadership can be transferred to a broker that was
> in the ISR, no data will be lost. For increased availability, if there are
> no other brokers in the ISR, we fall back to electing a broker that is not
> caught up with the current leader as the new leader. IMO, this is the real
> problem that the post is complaining about.
>
> Let me explain his test in more detail:
>
> 1. The first part of the test partitions the leader (n1) from other brokers
> (n2-n5). The leader shrinks the ISR to just itself and ends up taking n
> writes. This is not a problem all by itself. Once the partition is
> resolved, n2-n5 would catch up from the leader and no writes will be lost,
> since n1 would continue to serve as the leader.
> 2. The problem starts in the second part of the test where it partitions
> the leader (n1) from zookeeper. This causes the unclean leader election
> (mentioned above), which causes Kafka to lose data.
>
> We thought about this while designing replication, but never ended up
> including the feature that would allow some applications to pick
> consistency over availability. Basically, we could let applications pick
> some topics for which the controller will never attempt unclean leader
> election. The result is that Kafka would reject writes and mark the
> partition offline, instead of moving leadership to a broker that is not in
> ISR, and losing the writes.
>
> I think if we included this knob, the tests that aphyr (Jepsen) ran would
> make more sense.
>
> Thanks,
> Neha
>
>
> On Thu, Aug 22, 2013 at 12:50 PM, Scott Clasen <[EMAIL PROTECTED]> wrote:
>
>> So it looks like there is a Jepsen post coming on Kafka 0.8 replication,
>> based on this, which is circulating on Twitter: https://www.refheap.com/17932/raw
>>
>> Understanding that Kafka isn't particularly designed to be partition
>> tolerant, the result is not completely surprising.
>>
>> But my question is, is there something that can be done about the lost
>> messages?
>>
>> From my understanding, when broker n1 comes back online, what currently
>> happens is that the messages that were only on n1 are truncated/tossed
>> while n1 is rejoining the ISR. Please correct me if this is not accurate.
>>
>> Would it instead be possible to do something else with them, like sending
>> them to an internal lost-messages topic or a log file where some manual
>> intervention could be done on them? Or could a configuration property like
>> replay.truncated.messages=true be set, so that the broker would send the
>> lost messages back onto the topic after rejoining the ISR?
>>
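A minimal sketch of the alternative Scott asks about, assuming a simplified log model: when the old leader rejoins as a follower, it drops every entry past the point where its log still agrees with the new leader, and a hook receives those entries instead of letting them vanish. `truncate_on_rejoin`, `common_prefix_len` and `on_truncated` are hypothetical names; how a real broker chooses the truncation point is not modeled here, and the hook only stands in for the proposed lost-messages topic, log file, or `replay.truncated.messages` behavior.

```python
from typing import Callable, List, Optional


def truncate_on_rejoin(local_log: List[str],
                       common_prefix_len: int,
                       on_truncated: Optional[Callable[[List[str]], None]] = None) -> List[str]:
    """Keep the first `common_prefix_len` entries and return them.

    Entries after that point exist only on this replica. If `on_truncated`
    (a hypothetical hook) is given, it receives them instead of their being
    silently discarded.
    """
    kept = local_log[:common_prefix_len]
    dropped = local_log[common_prefix_len:]
    if dropped and on_truncated is not None:
        # Hand the diverged messages to the hook (e.g. a "lost messages" sink).
        on_truncated(list(dropped))
    return kept


# n1 accepted m2 and m3 while it was the only ISR member; the uncleanly
# elected leader only has m0 and m1.
lost: List[str] = []
n1_log = truncate_on_rejoin(["m0", "m1", "m2", "m3"], 2, lost.extend)
print(n1_log)  # ['m0', 'm1']
print(lost)    # ['m2', 'm3']
```

One design consequence: replaying the captured messages onto the topic would append them after whatever the new leader has accepted since the election, so consumers would see them out of their original order.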

 
+
Neha Narkhede 2013-08-22, 21:17