1. Manually committing offsets does carry a risk of duplicates. Consider
this sequence:

message = consumer.next();
// process the message
// commit the message's offset

A rebalance can happen between lines 2 and 3, after the message has been
processed but before its offset is committed. If another consumer picks up
this partition after the rebalance, it may consume this message again. With
auto.commit turned on, offsets are always committed before the consumers
release ownership of partitions during a rebalance.
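To make the race concrete, here is a toy single-partition simulation. It is a sketch, not the Kafka client API: the list `LOG`, the `committed` counter and the class `ManualCommitDemo` are all hypothetical stand-ins for the partition, the group's stored offset and two consumers. Consumer A loses the partition right after processing a message and before committing it, and consumer B resumes from the last committed offset:

```java
import java.util.ArrayList;
import java.util.List;

// Toy single-partition simulation, NOT the Kafka API: LOG plays the partition,
// and `committed` plays the offset stored for the consumer group.
class ManualCommitDemo {
    static final List<String> LOG = List.of("m0", "m1", "m2");

    // Consumer A commits manually after each message, but loses the partition
    // to a rebalance right after processing offset `rebalanceAt` (between
    // steps 2 and 3). Consumer B then resumes from the committed offset.
    static List<String> run(int rebalanceAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0; // next offset to consume, as last committed

        for (int offset = committed; offset < LOG.size(); offset++) {
            String message = LOG.get(offset); // 1. message = consumer.next()
            processed.add("A:" + message);    // 2. process the message
            if (offset == rebalanceAt) {
                break;                        // rebalance: A releases the partition
            }
            committed = offset + 1;           // 3. commit the offset
        }

        // Consumer B takes over and starts from the last committed offset.
        for (int offset = committed; offset < LOG.size(); offset++) {
            processed.add("B:" + LOG.get(offset));
            committed = offset + 1;
        }
        return processed;
    }

    public static void main(String[] args) {
        // m1 appears twice: once from A, once from B.
        System.out.println(run(1)); // [A:m0, A:m1, B:m1, B:m2]
    }
}
```

Calling `run(-1)` (no rebalance) processes each message exactly once, which isolates the rebalance as the cause of the duplicate.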
In the 0.9 consumer design, we have fixed this issue by introducing the
onPartitionDeassigned callback. You can take a look at its current API here:
http://people.apache.org/~nehanarkhede/kafka-0.9-producer-javadoc/doc/org/apache/kafka/clients/consumer/KafkaConsumer.html
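The idea behind such a callback can be sketched with the same kind of toy simulation: the consumer commits whatever it has already processed inside a hook that runs before it gives up the partition. The `DeassignListener` interface and `CallbackCommitDemo` class below are hypothetical and only loosely modeled on the onPartitionDeassigned idea; they are not the actual 0.9 `KafkaConsumer` signatures, which are documented at the link above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback shape, loosely modeled on the onPartitionDeassigned
// idea; this is NOT the actual 0.9 KafkaConsumer signature.
interface DeassignListener {
    void onPartitionDeassigned();
}

// Toy single-partition simulation, NOT the Kafka API.
class CallbackCommitDemo {
    static final List<String> LOG = List.of("m0", "m1", "m2");

    private int committed = 0; // stand-in for the group's stored offset
    private int position = 0;  // next offset after the last processed message

    List<String> run(int rebalanceAt) {
        List<String> processed = new ArrayList<>();
        // Before giving up the partition, commit everything processed so far.
        DeassignListener listener = () -> committed = position;

        for (int offset = committed; offset < LOG.size(); offset++) {
            processed.add("A:" + LOG.get(offset)); // process the message
            position = offset + 1;
            if (offset == rebalanceAt) {
                listener.onPartitionDeassigned(); // runs before A releases ownership
                break;
            }
            committed = position;                  // regular manual commit
        }

        // Consumer B resumes from the committed offset: no message is repeated.
        for (int offset = committed; offset < LOG.size(); offset++) {
            processed.add("B:" + LOG.get(offset));
            committed = offset + 1;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(new CallbackCommitDemo().run(1)); // [A:m0, A:m1, B:m2]
    }
}
```

With the same rebalance point as a plain manual-commit loop, m1 is now consumed only once, because the hook commits it before ownership moves to consumer B.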
2. Committing offsets too often does add overhead, since every commit is a
write to ZooKeeper, and ZK does not scale well for writes. We are also fixing
that issue by moving offset management from ZK to the Kafka brokers. This is
already checked in to trunk and will be included in the 0.8.2 release.
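One way to see what "too often" costs is to count offset-store writes. The sketch below assumes a single-partition consumer where each commit is one write. It compares committing after every message with committing once every N messages, a common application-side workaround that is not something proposed in this thread. `BatchedCommitDemo` and `commitWrites` are hypothetical names, and no Kafka API is involved:

```java
// Counts offset-store writes (ZooKeeper writes in the 0.8 setup) for a
// single-partition consumer that commits once every `commitEvery` messages,
// plus one final commit for any leftover messages. Assumes one write per commit.
class BatchedCommitDemo {
    static int commitWrites(int messages, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int writes = 0;
        int uncommitted = 0;
        for (int i = 0; i < messages; i++) {
            uncommitted++;                 // process message i
            if (uncommitted == commitEvery) {
                writes++;                  // one commit = one write
                uncommitted = 0;
            }
        }
        if (uncommitted > 0) {
            writes++;                      // flush the tail
        }
        return writes;
    }

    public static void main(String[] args) {
        System.out.println(commitWrites(1000, 1));   // 1000
        System.out.println(commitWrites(1000, 100)); // 10
    }
}
```

The trade-off ties back to point 1: with manual commits and no rebalance callback, a larger batch means up to N processed-but-uncommitted messages can be consumed again after a rebalance.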
On Thu, Aug 7, 2014 at 5:36 PM, Chen Wang <[EMAIL PROTECTED]>