Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Re: a few questions from high level consumer documentation.


Copy link to this message
-
Re: a few questions from high level consumer documentation.
I'll try to answer some, the Kafka team will need to answer the others:
On Wed, May 8, 2013 at 12:17 PM, Yu, Libo <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I read this link
> https://cwiki.apache.org/KAFKA/consumer-group-example.html
> and have a few questions (if not too many).
>
> 1 When you say the iterator may block, do you mean hasNext() may block?
>

Yes.
>
> 2 "Remember, you can only use a single process per Consumer Group."
>     Do you mean we can only use a single process on one node of the
> cluster for a consumer group?
>     Or there can be only one process on the whole cluster for a consumer
> group? Please clarify on this.
>
> Bug. I'll change it. When I wrote this I mis-understood the re-balancing
step. I missed this reference but fixed the others. Sorry

> 3 Why save offset to zookeeper? Is it easier to save it to a local file?
>
> 4 When client exits/crashes or leader for a partition is changed,
> duplicate messages may be replayed. "To help avoid this (replayed duplicate
> messages), make sure you provide a clean way for your client to exit
> instead of assuming it can be 'kill -9'd."
>
> a.       For client exit, if the client is receiving data at the time, how
> to do a clean exit? How can client tell consumer to write offset to
> zookeepr before exiting?
>

If you call the shutdown() method on the Consumer it will cleanly stop,
releasing any blocked iterators. In the example it goes to sleep for a few
seconds then cleanly shuts down.
>
>
> b.      For client crash, what can client do to avoid duplicate messages
> when restarted? What I can think of is to read last message from log file
> and ignore the first few received duplicate messages until receiving the
> last read message. But is it possible for client to read log file directly?
>

If you can't tolerate the possibility of duplicates you need to look at the
Simple Consumer example, There you control the offset storage.
>
>
> c.       For the change of the partition leader, is there anything that
> clients can do to avoid duplicates?
>
> Thanks.
>
>
>
> Libo
>
>

 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB