-
Duplicate messages after new consumer introduction
navneet sharma 2012-05-16, 09:41
Hi,
I tried the following scenario:
1) I had 1 producer and 3 consumers subscribed to a topic, "cartTopic", all in the same group.
2) While everything was running, I introduced another consumer for the same topic in the same group, so there were 4 consumers overall.
3) Of course, this triggered rebalancing.
The final result is that a few messages are duplicated. In my example run, the producer sent 800,000 records, but the consumers received 801,448 records. I am using log4j to generate the output file.
Is there any reason for the duplication?
Thanks, Navneet Sharma
-
Re: Duplicate messages after new consumer introduction
Jun Rao 2012-05-16, 14:54
In trunk, the # of dups introduced during rebalance is significantly reduced. We used to replay the last chunk of fetched messages during rebalance. In trunk, there is at most 1 duplicated message per partition during rebalance (assuming messages are not compressed).
Jun
-
Re: Duplicate messages after new consumer introduction
Jay Kreps 2012-05-16, 16:30
Technically this is the guarantee we provide--at least once delivery. It is very expensive to completely eliminate this possibility in the general case as you need to co-ordinate any state changes the consumer makes with committing the offset that marks the position. But we have improved the common cases for normal rebalancing so if you are using trunk the only time this would happen is when there is a hard crash of a process.
-Jay
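One way to live with at-least-once delivery is to make the consumer's own state change idempotent, so that a redelivered message has no effect. The sketch below is not Kafka API; `IdempotentSink` and `handle` are hypothetical names, and it assumes that offsets only increase within a partition and that the last applied offset is stored together with the application state.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Hypothetical sink that keeps state and the last applied offset together."""
    applied: dict = field(default_factory=dict)   # partition -> last applied offset
    records: list = field(default_factory=list)   # stand-in for application state

    def handle(self, partition, offset, payload):
        """Apply a message once; return False when it is a redelivery."""
        if offset <= self.applied.get(partition, -1):
            return False  # already applied, so a replay after rebalance is ignored
        self.records.append(payload)
        self.applied[partition] = offset
        return True


sink = IdempotentSink()
sink.handle(0, 0, "a")
sink.handle(0, 1, "b")
sink.handle(0, 1, "b")  # redelivered after a rebalance; skipped
```

In a real consumer the `applied` map and the state would have to be written atomically (for example in one database transaction), which is the coordination cost described above.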
-
Re: Duplicate messages after new consumer introduction
navneet sharma 2012-05-17, 06:45
I downloaded the tar from the download link on the quickstart page more than a month ago.
Is trunk maintaining different code than the tar?
Can the number of partitions cause this problem? I am using 2 partitions on each of the two brokers.
Thanks, Navneet Sharma
-
Re: Duplicate messages after new consumer introduction
Jun Rao 2012-05-17, 14:26
Trunk is newer than the 0.7.0 jar. During rebalance, dups are introduced per partition. So, the more the # of partitions, the more dups.
Jun
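The per-partition relationship can be put into numbers using the figures from this thread. This is only back-of-the-envelope arithmetic: `max_duplicates` is a hypothetical helper, and the single-rebalance default is an assumption, since the thread does not say how many rebalances occurred.

```python
def max_duplicates(partitions, replayed_per_partition, rebalances=1):
    """Upper bound on duplicates if each partition replays a fixed number of messages."""
    return partitions * replayed_per_partition * rebalances


partitions = 2 * 2                   # 2 partitions on each of 2 brokers
observed = 801_448 - 800_000         # duplicates seen in the reported run
trunk_bound = max_duplicates(partitions, replayed_per_partition=1)

print(partitions, observed, trunk_bound)
```

With 4 partitions, the reported run shows 1,448 duplicates, while the trunk behaviour of at most 1 duplicate per partition would bound a single rebalance at 4 (for uncompressed messages).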
-
Re: Duplicate messages after new consumer introduction
Jay Kreps 2012-05-17, 15:06
Yes, the code is slightly ahead of that release. Svn log will give the change list. The change we made improved the rebalancing protocol so that in non-failure cases there is no duplication. The duplication isn't really a problem per se--essentially all messaging systems give either "at most once" or "at least once" semantics, and we are the latter. In the event of a hard kill of a consumer process you will still see some duplicate messages, because the process that takes over the partitions from the now-killed consumer will start from the last commit point.
-Jay
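The hard-kill case can be illustrated with a toy simulation of a single partition. This does not use Kafka itself; `simulate_hard_kill` and its parameters are hypothetical, and it assumes the consumer commits its position after every `commit_every` messages.

```python
def simulate_hard_kill(total, commit_every, crash_after):
    """Return the offsets delivered, in order, across a crash and a takeover."""
    delivered = []
    committed = 0  # next offset to read according to the last commit

    # First consumer: processes messages, commits periodically, then is killed.
    for offset in range(crash_after):
        delivered.append(offset)
        if (offset + 1) % commit_every == 0:
            committed = offset + 1

    # Successor: resumes from the last commit point, not from the crash position.
    for offset in range(committed, total):
        delivered.append(offset)
    return delivered


delivered = simulate_hard_kill(total=10, commit_every=4, crash_after=6)
duplicates = len(delivered) - len(set(delivered))
```

In this run the first consumer handled offsets 0 to 5 but had only committed through offset 3, so the successor re-reads offsets 4 and 5: every message arrives at least once, and two arrive twice.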