Kafka >> mail # user >> consuming only half the messages produced


RE: consuming only half the messages produced
Yes, I mean we can only consume half the messages produced.  I followed the
high-level consumer example here:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example.

Let me give a more complete scenario:

- We run 3 zookeepers
- We run 2 brokers
- We do not have a topic defined, but we have enabled topic auto-creation
(with a replication factor of 2? must check this)
- We connect the producer to both brokers (pocmsg5:9092,pocmsg6:9092)
- We stuff the topic into the KeyedMessage key with no Partitioner.  I was
not aware of the use of the key until last night.
- We generate 10 messages
- Topic auto-creation results in the following partitions:
        topic: unittest-test-msg        partition: 0    leader: 0       replicas: 0     isr: 0
        topic: unittest-test-msg        partition: 1    leader: 1       replicas: 1     isr: 1
        topic: unittest-test-msg        partition: 2    leader: 0       replicas: 0     isr: 0
        topic: unittest-test-msg        partition: 3    leader: 1       replicas: 1     isr: 1
- We construct a single Kafka stream by calling createMessageStreams with a
zookeeper (pocmsg5:2181) and one thread:
        public <K,V> Map<String, List<KafkaStream<K,V>>> createMessageStreams(
                        Map<String, Integer> topicCountMap,
                        Decoder<K> keyDecoder,
                        Decoder<V> valueDecoder)
- We consume only half the messages
- It looks as if partitions 0 and 2 are on pocmsg5, while partitions 1 and 3
are on pocmsg6.  
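
To make the keyed-message step above concrete, here is a minimal sketch of
how a hash-based partitioner would treat a constant key. It assumes the
producer picks a partition as (hash of key) modulo (number of partitions),
which is my understanding of the 0.8 default and not something verified in
this thread. The class and method names are hypothetical, and no Kafka
classes are used.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not Kafka code, only the hash-modulo idea.
class KeyPartitionSketch {
    // Assumed rule: non-negative hash of the key, modulo the partition count.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Counts how many of the given keys land in each partition.
    static Map<Integer, Integer> countByPartition(String[] keys, int numPartitions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numPartitions), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Ten messages, all keyed with the topic name, four partitions.
        String[] keys = new String[10];
        java.util.Arrays.fill(keys, "unittest-test-msg");
        // Under the assumed rule, the map has a single entry holding all 10.
        System.out.println(countByPartition(keys, 4));
    }
}
```

Under that assumption, using the topic name as the key sends every message
to one partition, so the key would not spread messages across brokers.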

Is it best to view the situation as 2 partitions, each a leader, with a
replica follower for each?
Which partitions are leaders and which are replicas?
What happened with auto-creation, production, and partitioning?
Which partition(s) is ZooKeeper pointing the high-level consumer to read
from?
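
For the last question, a rough sketch of how partitions might be divided
among the consumer threads of one group. It assumes a range-style split in
which sorted partitions are handed out in contiguous blocks to sorted
threads. That is my reading of the high-level consumer's rebalance, not a
confirmed description, and the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Kafka code, only the range-split idea.
class RangeAssignmentSketch {
    // Partition ids owned by the thread at threadIndex, out of numThreads
    // threads in the same group, for a topic with numPartitions partitions.
    static List<Integer> partitionsFor(int threadIndex, int numThreads, int numPartitions) {
        int perThread = numPartitions / numThreads;
        int extra = numPartitions % numThreads;
        // The first `extra` threads each take one additional partition.
        int start = perThread * threadIndex + Math.min(threadIndex, extra);
        int count = perThread + (threadIndex < extra ? 1 : 0);
        List<Integer> owned = new ArrayList<>();
        for (int p = start; p < start + count; p++) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        System.out.println(partitionsFor(0, 1, 4)); // [0, 1, 2, 3]
        System.out.println(partitionsFor(0, 2, 4)); // [0, 1]
        System.out.println(partitionsFor(1, 2, 4)); // [2, 3]
    }
}
```

If that split applies, a lone thread would own all four partitions, whereas
a second thread registered under the same group id would leave each thread
with two.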

thanks,
rob

> -----Original Message-----
> From: Jun Rao [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 01, 2013 11:15 PM
> To: [EMAIL PROTECTED]
> Subject: Re: consuming only half the messages produced
>
> Partitions are different from replicas. A topic can have one or more
> partitions, and each partition can have one or more replicas. A consumer
> consumes data at the partition level. In other words, a consumer gets the
> same data no matter how many replicas there are.
>
> When you say the consumer only gets half of the messages, do you mean that
> it gets half of the messages that are produced?
>
> You may want to take a look at the consumer example in
> http://kafka.apache.org/08/api.html
>
> Thanks,
>
> Jun
>
>
> On Wed, May 1, 2013 at 7:14 PM, Rob Withers <[EMAIL PROTECTED]> wrote:
>
> > Running a consumer group (createStreams()), pointing to the zookeeper
> > and with the topic and 1 consumer thread, results in only half the
> > messages being consumed.  The topic was auto-created, with a
> > replication factor of 2, but the producer was configured to produce to
> > 2 brokers and so 4 partitions resulted.  Are half getting sent to one
> > leader, in one broker, and the other half getting sent to another
> > leader, in the other broker, but the consumer stream is only reading
> > from one leader from the zk?  Shouldn't there only be one leader?
> >
> >
> >
> > thanks,
> >
> > rob
> >
> >
> >
> >
 