Kafka >> mail # user >> Re: a few questions from high level consumer documentation.


Jun Rao 2013-05-09, 04:16
Chris Curtin 2013-05-08, 16:49
Rob Withers 2013-05-09, 04:37
Chris Curtin 2013-05-09, 12:28
David Arthur 2013-05-09, 14:29
Re: a few questions from high level consumer documentation.
Thanks,
Neha
On May 9, 2013 5:28 AM, "Chris Curtin" <[EMAIL PROTECTED]> wrote:
>
> On Thu, May 9, 2013 at 12:36 AM, Rob Withers <[EMAIL PROTECTED]> wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Chris Curtin [mailto:[EMAIL PROTECTED]]
> >
> > > > 1. When you say the iterator may block, do you mean hasNext() may
> > > > block?
> > > >
> > >
> > > Yes.
> >
> > Is this due to a potential non-blocking fetch (broker/zookeeper returns
> > an empty block if the offset is current)?  Yet this blocks the network
> > call of the consumer iterator, do I have that right?  Are there other
> > reasons it could block?  Like the call fails and a backup call is made?
> >
>
> I'll let the Kafka team answer this. I don't know the low level details.
>
It is because the consumer could be at the tail end and new data could
arrive at the server at a later time. The consumer is blocking by default
to handle a continuous stream of data.
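To illustrate the blocking behavior Neha describes: the high-level consumer's `consumer.timeout.ms` property makes the iterator throw a `ConsumerTimeoutException` after the given number of milliseconds with no data, instead of blocking forever. A minimal config sketch (class name, group id, and ZooKeeper address are illustrative, not from the thread):

```java
import java.util.Properties;

public class ConsumerTimeoutConfig {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // assumed address
        props.put("group.id", "example-group");           // illustrative group
        // By default (-1) the iterator blocks indefinitely waiting for new
        // messages. With a positive value, hasNext() throws
        // ConsumerTimeoutException after this many ms with no data.
        props.put("consumer.timeout.ms", "5000");
        return props;
    }
}
```

With this in place, a consumer loop can catch the timeout exception to do periodic work (or shut down cleanly) rather than being parked in `hasNext()`.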
> >
> > > > b.      After a client crash, what can the client do to avoid
> > > > duplicate messages when restarted? What I can think of is to read
> > > > the last message from the log file and ignore the first few received
> > > > duplicate messages until receiving the last read message. But is it
> > > > possible for the client to read the log file directly?
> > > >
> > >
> > > If you can't tolerate the possibility of duplicates you need to look
> > > at the Simple Consumer example. There you control the offset storage.
> >
> > Do you have example code that manages exactly-once delivery, even when
> > a consumer for a given partition goes away?
> >
>
> No, but if you look at the Simple Consumer example where the read occurs
> (and the write to System.out) at that point you know the offset you just
> read, so you need to put it somewhere. With the Simple Consumer, Kafka
> leaves all the offset management to you.
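The "put it somewhere" step Chris mentions can be sketched as a small offset store: after successfully processing a message, record the next offset to fetch, so a restart resumes where processing left off. This in-memory version is purely illustrative (all names are made up); a real implementation would persist the map atomically with the side effects of processing to actually avoid duplicates:

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetStore {
    // topic-partition -> next offset to fetch
    private final Map<String, Long> nextOffset = new HashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    // Call after successfully processing the message at 'offset'.
    public void markProcessed(String topic, int partition, long offset) {
        nextOffset.put(key(topic, partition), offset + 1);
    }

    // Where to start fetching on restart (0 if this partition is new).
    public long resumeOffset(String topic, int partition) {
        return nextOffset.getOrDefault(key(topic, partition), 0L);
    }
}
```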
>
>
> >
> > What does happen with rebalancing when a consumer goes away?
>
>
> Hmm, I can't find the link to the algorithm right now. Jun or Neha, can
> you?

You can find the algorithm on the design page.
http://kafka.apache.org/07/design.html
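For intuition, the rebalancing described on the design page is a range-style assignment: partitions and consumers are each sorted, and every consumer takes a contiguous slice of the partitions. The sketch below is a simplified illustration of that idea, not the exact implementation (names are made up; the real algorithm also coordinates through ZooKeeper):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RangeAssignor {
    // Which partitions the consumer at 'consumerIndex' (0-based, in sorted
    // consumer order) owns after a rebalance over 'consumerCount' consumers.
    public static List<Integer> partitionsFor(int consumerIndex,
                                              int consumerCount,
                                              List<Integer> partitions) {
        List<Integer> sorted = new ArrayList<>(partitions);
        Collections.sort(sorted);
        int per = sorted.size() / consumerCount;   // base share per consumer
        int extra = sorted.size() % consumerCount; // leftovers go to the first consumers
        int start = consumerIndex * per + Math.min(consumerIndex, extra);
        int count = per + (consumerIndex < extra ? 1 : 0);
        return sorted.subList(start, start + count);
    }
}
```

When a consumer joins or leaves, every consumer recomputes this assignment with the new membership, which is why partition ownership can move around on rebalance.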

>
> > Is this the
> > behavior of the high-level consumer group?
>
>
> Yes.
>
>
> > Is there a way to supply one's own simple consumer with exactly-once
> > semantics, within a consumer group that rebalances?
> >
> No. Simple Consumers don't have rebalancing steps. Basically you take
> control of what is requested from which topics and partitions. So you
> could ask for a specific offset in a topic/partition 100 times in a row
> and Kafka will happily return it to you. Nothing is written to ZooKeeper
> either; you control everything.
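Because the broker will happily re-serve offsets you have already read, a restarted Simple Consumer can see duplicates. One way to suppress them, echoing Rob's earlier idea of ignoring already-seen messages, is to track the highest offset processed per partition and skip anything at or below it. A minimal sketch (class and method names are illustrative):

```java
public class DuplicateFilter {
    // Highest offset already processed for one partition; -1 = none yet.
    private long highestSeen = -1L;

    // Returns true if the message at 'offset' is new and should be
    // processed; false if it is a duplicate from before a restart.
    public boolean accept(long offset) {
        if (offset <= highestSeen) {
            return false;
        }
        highestSeen = offset;
        return true;
    }
}
```

In practice you would persist `highestSeen` alongside your processing results, so the filter survives the crash it is meant to protect against.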
>
>
>
> >
> > What happens if a producer goes away?
> >
>
> Shouldn't matter to the consumers. The Brokers are what the consumers talk
> to, so if nothing is writing, the Broker won't have anything to send.
>
> >
> > thanks much,
> > rob
> >
> >
> >

 
Jun Rao 2013-05-14, 03:51