Re: a few questions from high level consumer documentation.

On 5/9/13 8:27 AM, Chris Curtin wrote:
> On Thu, May 9, 2013 at 12:36 AM, Rob Withers <[EMAIL PROTECTED]> wrote:
>
>>
>>> -----Original Message-----
>>> From: Chris Curtin [mailto:[EMAIL PROTECTED]]
>>>> 1. When you say the iterator may block, do you mean hasNext() may block?
>>>>
>>> Yes.
>> Is this due to a potential non-blocking fetch (the broker/ZooKeeper returns
>> an empty block if the offset is current)? Yet this blocks the network call
>> of the consumer iterator; do I have that right? Are there other reasons it
>> could block, like the call failing and a backup call being made?
>>
> I'll let the Kafka team answer this. I don't know the low level details.
The iterator will block if there is no more data to consume. The
iterator is actually reading messages from a BlockingQueue, which is fed
messages by the fetcher threads. The reason for this is to let you
configure blocking with or without a timeout in the ConsumerIterator;
this is reflected in the consumer timeout property (consumer.timeout.ms).
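For illustration, a minimal sketch of that behavior using the 0.8-era Java
high-level consumer API. The ZooKeeper address, group id, and topic name are
placeholders; with consumer.timeout.ms set, hasNext() throws a
ConsumerTimeoutException instead of blocking forever:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class TimeoutConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "example-group");           // placeholder
        // Make hasNext() give up after 5s without data instead of blocking.
        props.put("consumer.timeout.ms", "5000");

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
        ConsumerIterator<byte[], byte[]> it =
            streams.get("my-topic").get(0).iterator();
        try {
            while (it.hasNext()) { // blocks until a message arrives or times out
                System.out.println(new String(it.next().message()));
            }
        } catch (ConsumerTimeoutException e) {
            // Thrown once consumer.timeout.ms elapses with no new messages.
            System.out.println("No messages for 5s, shutting down.");
        } finally {
            connector.shutdown();
        }
    }
}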
>
>
>>>> b. After a client crash, what can the client do to avoid duplicate
>>>> messages when restarted? What I can think of is to read the last message
>>>> from the log file and ignore the first few duplicate messages received,
>>>> until reaching the last message read. But is it possible for the client
>>>> to read the log file directly?
>>> If you can't tolerate the possibility of duplicates you need to look at
>>> the Simple Consumer example; there you control the offset storage.
>> Do you have example code that manages only-once delivery, even when a
>> consumer for a given partition goes away?
>>
> No, but if you look at the Simple Consumer example, at the point where the
> read occurs (and the write to System.out) you know the offset you just read,
> so you need to put it somewhere. With the Simple Consumer, Kafka leaves all
> the offset management to you.
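A sketch of what "put it somewhere" might look like. OffsetStore is a
hypothetical interface you would implement yourself (Kafka provides nothing
like it for the Simple Consumer); if the output and the offset are written in
one transaction, a restart neither skips nor re-processes messages:

// Hypothetical: not part of Kafka. Maps a partition to the next offset to read.
interface OffsetStore {
    long load(String topic, int partition);
    void save(String topic, int partition, long nextOffset);
}

// Process one message from a SimpleConsumer fetch and record where to resume.
void handle(kafka.message.MessageAndOffset msg, String topic, int partition,
            OffsetStore store) {
    java.nio.ByteBuffer payload = msg.message().payload();
    byte[] bytes = new byte[payload.limit()];
    payload.get(bytes);
    System.out.println(new String(bytes));          // the "write to System.out"
    // Ideally saved atomically with the output above.
    store.save(topic, partition, msg.nextOffset());
}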
>
>
>> What happens with rebalancing when a consumer goes away?
>
> Hmm, I can't find the link to the algorithm right now. Jun or Neha, can you?
It's down at the bottom of the 0.7 design page:
http://kafka.apache.org/07/design.html
>
>
>> Is this the behavior of the high-level consumer group?
>
> Yes.
>
>
>> Is there a way to supply one's own simple consumer with only-once
>> delivery, within a consumer group that rebalances?
>>
> No. Simple Consumers don't have rebalancing steps. Basically you take
> control of what is requested from which topics and partitions. So you could
> ask for a specific offset in a topic/partition 100 times in a row and Kafka
> will happily return it to you. Nothing is written to ZooKeeper either; you
> control everything.
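For example, a sketch using the 0.8 Java SimpleConsumer API; the broker host,
topic, and offset are placeholders, and re-running the same fetch returns the
same messages again:

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class ReplayFetch {
    public static void main(String[] args) {
        // (host, port, soTimeout, bufferSize, clientId)
        SimpleConsumer consumer =
            new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "example-client");
        FetchRequest req = new FetchRequestBuilder()
            .clientId("example-client")
            .addFetch("my-topic", 0, 42L, 100000) // topic, partition, offset, fetchSize
            .build();
        // You chose offset 42; Kafka serves it without consulting ZooKeeper.
        FetchResponse resp = consumer.fetch(req);
        System.out.println("error code: " + resp.errorCode("my-topic", 0));
        consumer.close();
    }
}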
>
>
>
>> What happens if a producer goes away?
>>
> Shouldn't matter to the consumers. The Brokers are what the consumers talk
> to, so if nothing is writing, the Broker won't have anything to send.
>
>> thanks much,
>> rob
>>
>>
>>
 