Kafka >> mail # user >> custom kafka consumer - strangeness


Re: custom kafka consumer - strangeness
If you look at the example simple consumer:
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

You'll see:

    if (currentOffset < readOffset) {
        System.out.println("Found an old offset: " + currentOffset
            + " Expecting: " + readOffset);
        continue;
    }

and a comment in the 'Reading the Data' part:

Also note that we are explicitly checking that the offset being read is not
less than the offset that we requested. This is needed since if Kafka is
compressing the messages, the fetch request will return an entire
compressed block even if the requested offset isn't the beginning of the
compressed block. Thus a message we saw previously may be returned again.

This is probably what is happening to you.
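
The consequence for a consumer loop is that you filter, rather than trust, the offsets in a fetch response. Here is a minimal, self-contained sketch of that filtering; the `fetch` helper and the 5-message block size are invented to simulate the compressed-block behavior, and are not part of the Kafka API:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetFilterDemo {

    // Hypothetical stand-in for a broker fetch: returns the whole 5-message
    // "block" containing the requested offset, so the response can start
    // before the offset that was asked for (mimicking how Kafka returns an
    // entire compressed block).
    static List<Long> fetch(long requestedOffset) {
        long blockStart = (requestedOffset / 5) * 5;
        List<Long> offsets = new ArrayList<>();
        for (long o = blockStart; o < blockStart + 5; o++) {
            offsets.add(o);
        }
        return offsets;
    }

    public static void main(String[] args) {
        long readOffset = 12; // next offset we actually want
        List<Long> delivered = new ArrayList<>();
        for (long currentOffset : fetch(readOffset)) {
            if (currentOffset < readOffset) {
                continue; // old offset: already consumed earlier in the block
            }
            delivered.add(currentOffset);
            readOffset = currentOffset + 1; // advance past what we processed
        }
        System.out.println(delivered); // prints [12, 13, 14]: 10 and 11 skipped
    }
}
```

Without the `currentOffset < readOffset` check, offsets 10 and 11 would be delivered a second time, which is exactly the duplicate-message symptom described below.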

Chris
On Thu, Jan 9, 2014 at 4:00 PM, Gerrit Jansen van Vuuren <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm writing a custom consumer for kafka 0.8.
> Everything works except for the following:
>
> a. connect, send fetch, read all results
> b. send fetch
> c. send fetch
> d. send fetch
> e. via the console publisher, publish 2 messages
> f. send fetch :corr-id 1
> g. read 2 messages published :offsets [10 11] :corr-id 1
> h. send fetch :corr-id 2
> i. read 2 messages published :offsets [10 11] :corr-id 2
> j. send fetch ...
>
> The problem is that I get the messages sent twice, as responses to two
> separate fetch requests. The correlation ids are distinct, so it cannot be
> that I read the same response twice. The offsets of the 2 messages are the
> same, so they are duplicates, and it's not the producer sending the
> messages twice.
>
> Note: the same connection is kept open the whole time, and I send, block,
> receive, then send again. After the first 2 messages are read, the offsets
> are incremented, and the next fetch asks Kafka for messages from the new
> offsets.
>
> Any ideas why Kafka would send the messages again on the second fetch
> request?
>
> Regards,
>  Gerrit
>
