Kafka user mailing list
Re: error recovery in multiple thread reading from Kafka with HighLevel api
Hello Chen,

1. Manually committing offsets does carry a risk of duplicates. Consider the
following pattern:

message = consumer.next();
process(message);
consumer.commit();

A rebalance can happen between lines 2 and 3, after the message has been
processed but before its offset has been committed. If another consumer picks
up this partition after the rebalance, it may consume this message again.
With auto.commit turned on, offsets are always committed before the consumers
release ownership of partitions during rebalances.
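To make the race concrete, here is a minimal in-memory Java simulation of the
pattern above. It is a sketch, not the Kafka API: the partition is a plain
array, and the hypothetical `committedOffset` field stands in for the offset
stored in ZooKeeper. Consumer A processes m1 but loses the partition before
committing, so consumer B resumes from the last committed offset and processes
m1 a second time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a single partition; this is NOT the Kafka API.
class DuplicateOnRebalance {

    // Messages in one partition, addressed by offset.
    static final String[] PARTITION = {"m0", "m1", "m2"};

    // Stands in for the offset stored in ZooKeeper: next offset to read.
    static int committedOffset = 0;

    // Every message processed by any consumer, in processing order.
    static final List<String> processed = new ArrayList<>();

    // One iteration of the pattern: next(), process(), commit().
    // If rebalanceBeforeCommit is true, ownership is lost before the commit.
    static void consumeOne(int offset, boolean rebalanceBeforeCommit) {
        String message = PARTITION[offset]; // message = consumer.next();
        processed.add(message);             // process(message);
        if (rebalanceBeforeCommit) {
            return;                         // rebalance: commit never runs
        }
        committedOffset = offset + 1;       // consumer.commit();
    }

    static List<String> run() {
        committedOffset = 0;
        processed.clear();

        // Consumer A: m0 is processed and committed; m1 is processed,
        // but the rebalance hits between process() and commit().
        consumeOne(0, false);
        consumeOne(1, true);

        // Consumer B now owns the partition and resumes from the last
        // committed offset (1), so it processes m1 a second time.
        for (int offset = committedOffset; offset < PARTITION.length; offset++) {
            consumeOne(offset, false);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [m0, m1, m1, m2]
    }
}
```

The duplicate m1 in the output is exactly the at-least-once behavior of manual
commits when a rebalance lands between processing and committing.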

In the 0.9 consumer design, we have fixed this issue by introducing the
onPartitionDeassigned callback; you can take a look at its current API here:

http://people.apache.org/~nehanarkhede/kafka-0.9-producer-javadoc/doc/org/apache/kafka/clients/consumer/KafkaConsumer.html
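The idea behind such a callback can be sketched with the same kind of
in-memory model. `DeassignmentListener` and `SimpleConsumer` below are
hypothetical names for illustration only; they are not the Kafka API, and the
real callback's signature may differ. The consumer commits its current
position inside the callback, before ownership of the partition moves, so the
next owner starts right after the last processed message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; the listener and consumer types are NOT the Kafka API.
class RebalanceCallbackSketch {

    // Hypothetical callback, invoked before a consumer releases its partitions.
    interface DeassignmentListener {
        void onPartitionsDeassigned(List<Integer> partitions);
    }

    static final String[] PARTITION_0 = {"m0", "m1", "m2"};

    // Simulated offset store: partition id -> next offset to read.
    static final Map<Integer, Integer> committed = new HashMap<>();

    // Every message processed by any consumer, in processing order.
    static final List<String> processed = new ArrayList<>();

    static class SimpleConsumer {
        private int position = committed.getOrDefault(0, 0); // resume from last commit
        private DeassignmentListener listener = partitions -> { };

        void setListener(DeassignmentListener listener) { this.listener = listener; }

        boolean hasNext() { return position < PARTITION_0.length; }

        void processNext() {
            processed.add(PARTITION_0[position]);
            position++; // processed locally, not yet committed
        }

        void commit() { committed.put(0, position); }

        // A rebalance takes partition 0 away; the callback fires first.
        void loseOwnership() {
            listener.onPartitionsDeassigned(Collections.singletonList(0));
        }
    }

    static List<String> run() {
        committed.clear();
        processed.clear();

        SimpleConsumer a = new SimpleConsumer();
        a.setListener(partitions -> a.commit()); // commit before giving partitions up
        a.processNext();   // m0
        a.processNext();   // m1, processed but not committed yet
        a.loseOwnership(); // callback commits offset 2

        SimpleConsumer b = new SimpleConsumer(); // resumes at offset 2
        while (b.hasNext()) {
            b.processNext();
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [m0, m1, m2]
    }
}
```

Compared with the previous scenario, m1 is processed only once here, because
the commit happens before the partition is released rather than racing with
the rebalance.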

2. Committing offsets too often does add overhead, since every commit is a
write to ZooKeeper, and ZK does not scale well for writes. We are also fixing
that issue by moving offset management from ZK to the Kafka servers. This is
already checked in to trunk and will be included in the 0.8.2 release.
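A common way to cut the number of offset writes is to commit in batches
rather than after every message. The sketch below assumes a simple
count-based policy; `OffsetStore` is a hypothetical stand-in for ZooKeeper,
not a real API. It also reports how many processed messages are still
uncommitted, i.e. how many a new owner could consume again if the partition
moved at that moment.

```java
// Hypothetical sketch of count-based batched commits; OffsetStore stands in
// for ZooKeeper and is NOT a real Kafka or ZooKeeper API.
class BatchedCommitSketch {

    static class OffsetStore {
        int writes = 0;     // each commit is one write to the store
        long committed = 0; // next offset a new owner would read from

        void commit(long offset) {
            committed = offset;
            writes++;
        }
    }

    static class BatchingCommitter {
        private final OffsetStore store = new OffsetStore();
        private final int commitEvery;
        private long position = 0; // next offset to process
        private int sinceLastCommit = 0;

        BatchingCommitter(int commitEvery) {
            this.commitEvery = commitEvery;
        }

        // Call after each message has been fully processed.
        void onProcessed() {
            position++;
            sinceLastCommit++;
            if (sinceLastCommit >= commitEvery) {
                store.commit(position);
                sinceLastCommit = 0;
            }
        }

        int writes() { return store.writes; }

        // Processed messages a new owner would consume again after a rebalance now.
        long uncommitted() { return position - store.committed; }
    }

    static BatchingCommitter process(int messages, int commitEvery) {
        BatchingCommitter committer = new BatchingCommitter(commitEvery);
        for (int i = 0; i < messages; i++) {
            committer.onProcessed();
        }
        return committer;
    }

    public static void main(String[] args) {
        for (int every : new int[] {1, 100}) {
            BatchingCommitter c = process(1050, every);
            System.out.println("commit every " + every + ": " + c.writes()
                    + " writes, " + c.uncommitted() + " uncommitted");
        }
    }
}
```

With 1050 messages, committing after each one costs 1050 writes and leaves
nothing uncommitted, while committing every 100 messages costs only 10 writes
but leaves the last 50 messages exposed to re-delivery. The batch size trades
offset-store load against the size of the duplicate window.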

Guozhang
On Thu, Aug 7, 2014 at 5:36 PM, Chen Wang <[EMAIL PROTECTED]> wrote: