Kafka, mail # user - Re: Duplicate Messages on the Consumer


Philip OToole 2013-07-18, 19:29
Re: Duplicate Messages on the Consumer
Jun Rao 2013-07-22, 05:35
In 0.7.x, if the messages are compressed, there could be duplicated
messages during consumer rebalance. This is because we can only checkpoint
consumer offset at the compressed unit boundary. You may want to see if you
have unnecessary rebalances (see
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog%3F).
In 0.8, there won't be duplicated messages even when compression is enabled.
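
To make the boundary issue above concrete, here is a minimal Python sketch. It is a toy simulation, not Kafka code: the function name, the batch layout, and the message names are all assumptions made for illustration. It models a consumer that can only checkpoint after finishing a whole compressed batch, so a rebalance in the middle of a batch makes the next owner re-read that batch from its start.

```python
from typing import List


def deliver_with_rebalance(batches: List[List[str]], rebalance_after: int) -> List[str]:
    """Return every message handed to the application, in order.

    Hypothetical model: consumer A reads until it has delivered
    `rebalance_after` messages, then loses the partition. Consumer B
    resumes from the last checkpoint, which can only point at the
    start of a compressed batch.
    """
    delivered: List[str] = []
    checkpoint = 0  # index of the next batch to read
    seen = 0

    # Consumer A: checkpoint only after a batch is fully consumed.
    for i, batch in enumerate(batches):
        for msg in batch:
            if seen == rebalance_after:
                break  # rebalance happens mid-batch
            delivered.append(msg)
            seen += 1
        else:
            checkpoint = i + 1  # whole batch consumed, safe to checkpoint
            continue
        break

    # Consumer B: restart from the last batch boundary.
    for batch in batches[checkpoint:]:
        delivered.extend(batch)
    return delivered


batches = [["m0", "m1", "m2"], ["m3", "m4", "m5"]]
print(deliver_with_rebalance(batches, rebalance_after=4))
```

With a rebalance after four messages, "m3" is delivered twice, because the second batch is replayed from its beginning. A rebalance that lands exactly on a batch boundary (for example after three messages) produces no duplicates in this model.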

Thanks,

Jun
On Fri, Jul 19, 2013 at 1:16 PM, Sybrandy, Casey <
[EMAIL PROTECTED]> wrote:

> Hello,
>
> No, we couldn't check the broker logs because the data is obfuscated, so
> we can't just look at the files and tell.  It looks like our dev system may
> be experiencing the same issue, so I did turn off the obfuscation and we'll
> monitor it.  However, our production system, where we were seeing the
> errors more often, appears to have had ZooKeeper misconfigured, so we're
> thinking that may be the issue.
>
> Casey
>
> -----Original Message-----
> From: Philip O'Toole [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 18, 2013 3:29 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Duplicate Messages on the Consumer
>
> Have you actually examined the Kafka files on disk, to make sure those
> dupes are really there? Or is this a case of reading the same message more
> than once?
>
> Philip
>
> On Thu, Jul 18, 2013 at 8:55 AM, Sybrandy, Casey <
> [EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > We recently started seeing duplicate messages appearing at our
> > consumers.  Thankfully, the database is set up so that we don't store the
> > dupes, but it is annoying.  It's not every message, only about 1% of them.
> > We are running 0.7.0 for the broker with Zookeeper 3.3.4 from Cloudera and
> > 0.7.0 for the producer and consumer.  We tried upgrading the consumer to
> > 0.7.2 to see if that worked, but we're still seeing the dupes.  Do we have
> > to upgrade the broker as well to resolve this?  Is there something we can
> > check to see what's going on?  We're not seeing anything unusual in
> > the logs.  I suspected that there may be significant rebalancing, but that
> > does not appear to be the case at all.
> >
> > Casey Sybrandy
> >
>
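
The original post mentions that the database is set up so duplicates are not stored. One common way to get that behaviour is an idempotent write keyed on a unique message ID. The sketch below is only an illustration under that assumption: the thread does not say which database or schema was used, and the table name, column names, and `store_messages` helper are hypothetical. It uses SQLite from the Python standard library.

```python
import sqlite3
from typing import Iterable, Tuple


def store_messages(conn: sqlite3.Connection, messages: Iterable[Tuple[str, str]]) -> int:
    """Insert (msg_id, payload) pairs, silently skipping IDs already stored.

    Returns the number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (msg_id TEXT PRIMARY KEY, payload TEXT)"
    )
    for msg_id, payload in messages:
        # The primary key makes a redelivered message a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO events (msg_id, payload) VALUES (?, ?)",
            (msg_id, payload),
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


conn = sqlite3.connect(":memory:")
# "a" arrives twice, as it might after a consumer rebalance.
print(store_messages(conn, [("a", "first"), ("b", "second"), ("a", "first")]))
```

This only works if each message carries a stable unique identifier; without one, the consumer has no reliable way to tell a redelivery from a genuinely new message.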