|
|
-
A question about possible race condition
Guy Peleg 2012-10-31, 11:12
Hi,
As I learn and plan to use Kafka, I'm concirned about possible race condition when brokers/consumers are added or removed.
Say I have a topic that is devide into two partitions, where consumers are deviding the mssages between those two partitions by ,say, modulo event-id, where events with the same event ids should be processed by the order of their arrival, that will work since as I said, I will devide the incoming events by their event-id % number_of_partitions
Now, when a new paratition is added, there might be situations where events with event-id 'x', will still be in the first broker, while new ones, with event-id 'x', are added to the new paratition which may result in those events being processed in parallel, what am i missing?
Thanks,
Guy
-
Re: A question about possible race condition
Jun Rao 2012-10-31, 14:57
Guy,
This is really an issue with changing # of partitions. If # of partitions changes for a topic, in the transition phase, messages used to be delivered to the same partition could be delivered to different partitions and their consumption ordering is non-deterministic (since ordered consumption is only guaranteed within a partition).
In 0.7, # of partitions increases as new brokers are added. In 0.8, # of partitions is set at topic creation time and will stay the same when new brokers are added.
Thanks,
Jun
On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <[EMAIL PROTECTED]> wrote:
> Hi, > > As I learn and plan to use Kafka, I'm concirned about possible race > condition when brokers/consumers are added or removed. > > Say I have a topic that is devide into two partitions, where consumers are > deviding the mssages between those two partitions by ,say, modulo event-id, > where events with the same event ids should be processed by the order of > their arrival, that will work since as I said, I will devide the incoming > events by their event-id % number_of_partitions > > Now, when a new paratition is added, there might be situations where events > with event-id 'x', will still be in the first broker, while new ones, with > event-id 'x', are added to the new paratition > which may result in those events being processed in parallel, what am i > missing? > > Thanks, > > Guy >
-
Re: A question about possible race condition
Guy Peleg 2012-10-31, 17:52
Thanks Jun,
That's a very important point, I guess we will have to go with 0.8
Guy
On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Guy, > > This is really an issue with changing # of partitions. If # of partitions > changes for a topic, in the transition phase, messages used to be delivered > to the same partition could be delivered to different partitions and their > consumption ordering is non-deterministic (since ordered consumption is > only guaranteed within a partition). > > In 0.7, # of partitions increases as new brokers are added. In 0.8, # of > partitions is set at topic creation time and will stay the same when new > brokers are added. > > Thanks, > > Jun > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > As I learn and plan to use Kafka, I'm concirned about possible race > > condition when brokers/consumers are added or removed. > > > > Say I have a topic that is devide into two partitions, where consumers > are > > deviding the mssages between those two partitions by ,say, modulo > event-id, > > where events with the same event ids should be processed by the order of > > their arrival, that will work since as I said, I will devide the incoming > > events by their event-id % number_of_partitions > > > > Now, when a new paratition is added, there might be situations where > events > > with event-id 'x', will still be in the first broker, while new ones, > with > > event-id 'x', are added to the new paratition > > which may result in those events being processed in parallel, what am i > > missing? > > > > Thanks, > > > > Guy > > >
-
Re: A question about possible race condition
Guy Peleg 2012-11-01, 07:10
One more possible race might happen when the partition number is fixed but consumer(s) are added/removed For example: If I have a consumer reading data from two partitions (partition one and partition two), and a new consumer is added, the result will be that each consumer will consume from one partition let's say that the 'old' consumer will continue with partition one while the new consumer will process the data from partition two
but, suppose that partition two held events that belong to event id 'x', and that partition is now consumed by the new consumer, Since consumers might reside on different machines and they are possibly multithreaded processes, there might be a situation that other event ids 'x' are already 'in the internal queues' and are being processed by the first consumer (events that were read/entered the first consumer before the new consumer appeared but are being processed or wait to processed within the 'old' consumer) and that means that there is a possibility that those events are being processed simultaneously by the two consumers (since the new consumer will start reading events that might be of id 'x' and that might be then processed in parallel with event ids 'x' in the old consumer)
If that is a possible scenario then when a new consumer is starting there should be some kind of 'consumers sync'
On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Guy, > > This is really an issue with changing # of partitions. If # of partitions > changes for a topic, in the transition phase, messages used to be delivered > to the same partition could be delivered to different partitions and their > consumption ordering is non-deterministic (since ordered consumption is > only guaranteed within a partition). > > In 0.7, # of partitions increases as new brokers are added. In 0.8, # of > partitions is set at topic creation time and will stay the same when new > brokers are added. > > Thanks, > > Jun > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > As I learn and plan to use Kafka, I'm concirned about possible race > > condition when brokers/consumers are added or removed. > > > > Say I have a topic that is devide into two partitions, where consumers > are > > deviding the mssages between those two partitions by ,say, modulo > event-id, > > where events with the same event ids should be processed by the order of > > their arrival, that will work since as I said, I will devide the incoming > > events by their event-id % number_of_partitions > > > > Now, when a new paratition is added, there might be situations where > events > > with event-id 'x', will still be in the first broker, while new ones, > with > > event-id 'x', are added to the new paratition > > which may result in those events being processed in parallel, what am i > > missing? > > > > Thanks, > > > > Guy > > >
-
Re: A question about possible race condition
Jun Rao 2012-11-01, 14:56
Guy,
Yes, this is possible. One solution that we have been thinking about is that if a rebalance happens, each consumer can somehow get a callback that indicates the set of partitions being consumed may have changed. Will this address your concern?
Thanks,
Jun
On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <[EMAIL PROTECTED]> wrote:
> One more possible race might happen when the partition number is fixed but > consumer(s) are added/removed > For example: If I have a consumer reading data from two partitions > (partition one and partition two), and a new consumer is added, the result > will be that each consumer will consume from one partition > let's say that the 'old' consumer will continue with partition one while > the new consumer will process the data from partition two > > but, suppose that partition two held events that belong to event id 'x', > and that partition is now consumed by the new consumer, > Since consumers might reside on different machines and they are possibly > multithreaded processes, there might be a situation that other event ids > 'x' are already 'in the internal queues' and are being processed > by the first consumer (events that were read/entered the first consumer > before the new consumer appeared but are being processed or wait to > processed within the 'old' consumer) and that means that there is a > possibility that those events are being processed simultaneously by the two > consumers (since the new consumer will start reading events that might be > of id 'x' and that might be then processed in parallel with event ids 'x' > in the old consumer) > > If that is a possible scenario then when a new consumer is starting there > should be some kind of 'consumers sync' > > > > > > On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <[EMAIL PROTECTED]> wrote: > > > Guy, > > > > This is really an issue with changing # of partitions. If # of partitions > > changes for a topic, in the transition phase, messages used to be > delivered > > to the same partition could be delivered to different partitions and > their > > consumption ordering is non-deterministic (since ordered consumption is > > only guaranteed within a partition). > > > > In 0.7, # of partitions increases as new brokers are added. In 0.8, # of > > partitions is set at topic creation time and will stay the same when new > > brokers are added. > > > > Thanks, > > > > Jun > > > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <[EMAIL PROTECTED]> wrote: > > > > > Hi, > > > > > > As I learn and plan to use Kafka, I'm concirned about possible race > > > condition when brokers/consumers are added or removed. > > > > > > Say I have a topic that is devide into two partitions, where consumers > > are > > > deviding the mssages between those two partitions by ,say, modulo > > event-id, > > > where events with the same event ids should be processed by the order > of > > > their arrival, that will work since as I said, I will devide the > incoming > > > events by their event-id % number_of_partitions > > > > > > Now, when a new paratition is added, there might be situations where > > events > > > with event-id 'x', will still be in the first broker, while new ones, > > with > > > event-id 'x', are added to the new paratition > > > which may result in those events being processed in parallel, what am i > > > missing? > > > > > > Thanks, > > > > > > Guy > > > > > >
-
Re: A question about possible race condition
Guy Peleg 2012-11-02, 10:38
Jun,
I'm not sure that's enough.
A callback may not be enough since we can't be sure that there are no events from that partition being processed while the new consumer starts processing events from that partition.
I think that a consumer should be handed a partition only after we're sure there is no other consumer that is reading or *processing *events from that partition.
The only way to achieve that is, I think, by some kind of acknowledgment from the consumer side that it is ready to give up the partition (e.g. after gracefully stopped working internally with those events)
I know that means that there is a need to consider the extreme cases here, but still I think we can't do without the consumer's 'acknoledment' without being subject to race scenarios.
Thanks,
Guy On Thu, Nov 1, 2012 at 4:56 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Guy, > > Yes, this is possible. One solution that we have been thinking about is > that if a rebalance happens, each consumer can somehow get a callback that > indicates the set of partitions being consumed may have changed. Will this > address your concern? > > Thanks, > > Jun > > On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <[EMAIL PROTECTED]> wrote: > > > One more possible race might happen when the partition number is fixed > but > > consumer(s) are added/removed > > For example: If I have a consumer reading data from two partitions > > (partition one and partition two), and a new consumer is added, the > result > > will be that each consumer will consume from one partition > > let's say that the 'old' consumer will continue with partition one while > > the new consumer will process the data from partition two > > > > but, suppose that partition two held events that belong to event id 'x', > > and that partition is now consumed by the new consumer, > > Since consumers might reside on different machines and they are possibly > > multithreaded processes, there might be a situation that other event ids > > 'x' are already 'in the internal queues' and are being processed > > by the first consumer (events that were read/entered the first consumer > > before the new consumer appeared but are being processed or wait to > > processed within the 'old' consumer) and that means that there is a > > possibility that those events are being processed simultaneously by the > two > > consumers (since the new consumer will start reading events that might be > > of id 'x' and that might be then processed in parallel with event ids 'x' > > in the old consumer) > > > > If that is a possible scenario then when a new consumer is starting there > > should be some kind of 'consumers sync' > > > > > > > > > > > > On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <[EMAIL PROTECTED]> wrote: > > > > > Guy, > > > > > > This is really an issue with changing # of partitions. If # of > partitions > > > changes for a topic, in the transition phase, messages used to be > > delivered > > > to the same partition could be delivered to different partitions and > > their > > > consumption ordering is non-deterministic (since ordered consumption is > > > only guaranteed within a partition). > > > > > > In 0.7, # of partitions increases as new brokers are added. In 0.8, # > of > > > partitions is set at topic creation time and will stay the same when > new > > > brokers are added. > > > > > > Thanks, > > > > > > Jun > > > > > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <[EMAIL PROTECTED]> > wrote: > > > > > > > Hi, > > > > > > > > As I learn and plan to use Kafka, I'm concirned about possible race > > > > condition when brokers/consumers are added or removed. > > > > > > > > Say I have a topic that is devide into two partitions, where > consumers > > > are > > > > deviding the mssages between those two partitions by ,say, modulo > > > event-id, > > > > where events with the same event ids should be processed by the order > > of > > > > their arrival, that will work since as I said, I will devide the > > incoming > > > > events by their event-id % number_of_partitions
-
Re: A question about possible race condition
Guy Peleg 2012-11-02, 14:17
Jun,
On that note, waiting to consumer's acknowledgment can be with configurable timeout ranging from blocking to not wait at all
Thanks,
Guy
On Fri, Nov 2, 2012 at 12:38 PM, Guy Peleg <[EMAIL PROTECTED]> wrote:
> Jun, > > I'm not sure that's enough. > > A callback may not be enough since we can't be sure that there are no > events from that partition being processed while the new consumer starts > processing events from that partition. > > I think that a consumer should be handed a partition only after we're sure > there is no other consumer that is reading or *processing *events from > that partition. > > The only way to achieve that is, I think, by some kind of acknowledgment > from the consumer side that it is ready to give up the partition (e.g. > after gracefully stopped working internally with those events) > > I know that means that there is a need to consider the extreme cases here, > but still I think we can't do without the consumer's 'acknoledment' without > being subject to race scenarios. > > Thanks, > > Guy > > > > > On Thu, Nov 1, 2012 at 4:56 PM, Jun Rao <[EMAIL PROTECTED]> wrote: > >> Guy, >> >> Yes, this is possible. One solution that we have been thinking about is >> that if a rebalance happens, each consumer can somehow get a callback that >> indicates the set of partitions being consumed may have changed. Will this >> address your concern? >> >> Thanks, >> >> Jun >> >> On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <[EMAIL PROTECTED]> wrote: >> >> > One more possible race might happen when the partition number is fixed >> but >> > consumer(s) are added/removed >> > For example: If I have a consumer reading data from two partitions >> > (partition one and partition two), and a new consumer is added, the >> result >> > will be that each consumer will consume from one partition >> > let's say that the 'old' consumer will continue with partition one while >> > the new consumer will process the data from partition two >> > >> > but, suppose that partition two held events that belong to event id 'x', >> > and that partition is now consumed by the new consumer, >> > Since consumers might reside on different machines and they are possibly >> > multithreaded processes, there might be a situation that other event ids >> > 'x' are already 'in the internal queues' and are being processed >> > by the first consumer (events that were read/entered the first consumer >> > before the new consumer appeared but are being processed or wait to >> > processed within the 'old' consumer) and that means that there is a >> > possibility that those events are being processed simultaneously by the >> two >> > consumers (since the new consumer will start reading events that might >> be >> > of id 'x' and that might be then processed in parallel with event ids >> 'x' >> > in the old consumer) >> > >> > If that is a possible scenario then when a new consumer is starting >> there >> > should be some kind of 'consumers sync' >> > >> > >> > >> > >> > >> > On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <[EMAIL PROTECTED]> wrote: >> > >> > > Guy, >> > > >> > > This is really an issue with changing # of partitions. If # of >> partitions >> > > changes for a topic, in the transition phase, messages used to be >> > delivered >> > > to the same partition could be delivered to different partitions and >> > their >> > > consumption ordering is non-deterministic (since ordered consumption >> is >> > > only guaranteed within a partition). >> > > >> > > In 0.7, # of partitions increases as new brokers are added. In 0.8, # >> of >> > > partitions is set at topic creation time and will stay the same when >> new >> > > brokers are added. >> > > >> > > Thanks, >> > > >> > > Jun >> > > >> > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <[EMAIL PROTECTED]> >> wrote: >> > > >> > > > Hi, >> > > > >> > > > As I learn and plan to use Kafka, I'm concirned about possible race >> > > > condition when brokers/consumers are added or removed. >> > > > >> > > > Say I have a topic that is devide into two partitions, where
-
Re: A question about possible race condition
Jun Rao 2012-11-02, 14:54
Then, you may be interested in our consume client redesign: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-DesignThanks, Jun On Fri, Nov 2, 2012 at 7:17 AM, Guy Peleg <[EMAIL PROTECTED]> wrote: > Jun, > > On that note, waiting to consumer's acknowledgment can be with configurable > timeout ranging from blocking to not wait at all > > Thanks, > > Guy > > > > On Fri, Nov 2, 2012 at 12:38 PM, Guy Peleg <[EMAIL PROTECTED]> wrote: > > > Jun, > > > > I'm not sure that's enough. > > > > A callback may not be enough since we can't be sure that there are no > > events from that partition being processed while the new consumer starts > > processing events from that partition. > > > > I think that a consumer should be handed a partition only after we're > sure > > there is no other consumer that is reading or *processing *events from > > that partition. > > > > The only way to achieve that is, I think, by some kind of acknowledgment > > from the consumer side that it is ready to give up the partition (e.g. > > after gracefully stopped working internally with those events) > > > > I know that means that there is a need to consider the extreme cases > here, > > but still I think we can't do without the consumer's 'acknoledment' > without > > being subject to race scenarios. > > > > Thanks, > > > > Guy > > > > > > > > > > On Thu, Nov 1, 2012 at 4:56 PM, Jun Rao <[EMAIL PROTECTED]> wrote: > > > >> Guy, > >> > >> Yes, this is possible. One solution that we have been thinking about is > >> that if a rebalance happens, each consumer can somehow get a callback > that > >> indicates the set of partitions being consumed may have changed. Will > this > >> address your concern? > >> > >> Thanks, > >> > >> Jun > >> > >> On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <[EMAIL PROTECTED]> wrote: > >> > >> > One more possible race might happen when the partition number is fixed > >> but > >> > consumer(s) are added/removed > >> > For example: If I have a consumer reading data from two partitions > >> > (partition one and partition two), and a new consumer is added, the > >> result > >> > will be that each consumer will consume from one partition > >> > let's say that the 'old' consumer will continue with partition one > while > >> > the new consumer will process the data from partition two > >> > > >> > but, suppose that partition two held events that belong to event id > 'x', > >> > and that partition is now consumed by the new consumer, > >> > Since consumers might reside on different machines and they are > possibly > >> > multithreaded processes, there might be a situation that other event > ids > >> > 'x' are already 'in the internal queues' and are being processed > >> > by the first consumer (events that were read/entered the first > consumer > >> > before the new consumer appeared but are being processed or wait to > >> > processed within the 'old' consumer) and that means that there is a > >> > possibility that those events are being processed simultaneously by > the > >> two > >> > consumers (since the new consumer will start reading events that might > >> be > >> > of id 'x' and that might be then processed in parallel with event ids > >> 'x' > >> > in the old consumer) > >> > > >> > If that is a possible scenario then when a new consumer is starting > >> there > >> > should be some kind of 'consumers sync' > >> > > >> > > >> > > >> > > >> > > >> > On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <[EMAIL PROTECTED]> wrote: > >> > > >> > > Guy, > >> > > > >> > > This is really an issue with changing # of partitions. If # of > >> partitions > >> > > changes for a topic, in the transition phase, messages used to be > >> > delivered > >> > > to the same partition could be delivered to different partitions and > >> > their > >> > > consumption ordering is non-deterministic (since ordered consumption > >> is > >> > > only guaranteed within a partition). > >> > > > >> > > In 0.7, # of partitions increases as new brokers are added. In 0.8,
|
|