Kafka, mail # user - RE: consuming only half the messages produced - 2013-05-02, 13:20
RE: consuming only half the messages produced
Yes, I mean we can only consume half of the messages produced.  I followed
the high-level consumer example here:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

Let me give a more complete scenario:

- We run 3 zookeepers
- We run 2 brokers
- We have not defined the topic ourselves, but we have enabled topic
auto-creation (possibly with a replication factor of 2; I must check this)
- We connect the producer to both brokers (pocmsg5:9092,pocmsg6:9092)
- We put the topic name into the KeyedMessage key and set no Partitioner.  I
was not aware of how the key is used until last night.
- We generate 10 messages
- Topic auto-creation results in the following partitions:
        topic: unittest-test-msg   partition: 0   leader: 0   replicas: 0   isr: 0
        topic: unittest-test-msg   partition: 1   leader: 1   replicas: 1   isr: 1
        topic: unittest-test-msg   partition: 2   leader: 0   replicas: 0   isr: 0
        topic: unittest-test-msg   partition: 3   leader: 1   replicas: 1   isr: 1
- We construct a single Kafka stream by calling createMessageStreams with one
zookeeper (pocmsg5:2181) and one thread:
        public <K,V> Map<String, List<KafkaStream<K,V>>> createMessageStreams(
                        Map<String, Integer> topicCountMap,
                        Decoder<K> keyDecoder,
                        Decoder<V> valueDecoder)
- We consume only half the messages
- It looks as if partitions 0 and 2 are on pocmsg5, while partitions 1 and 3
are on pocmsg6.  
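
To illustrate why the key matters, here is a minimal sketch of hash-modulo
partitioning.  It assumes the producer picks a partition from the key's hash
modulo the partition count, which is my understanding of the default
behaviour but is not confirmed here.  The class and method names are
hypothetical and this is not the Kafka API.  Under that assumption, sending
every message with the same key (the topic name) puts all 10 messages on a
single partition.

```java
import java.util.Arrays;

// Sketch only: assumes hash-modulo partitioning of the message key.
// KeyPartitionSketch and its methods are hypothetical, not part of Kafka.
class KeyPartitionSketch {

    // Map a key to one of numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the index is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Count how many sends land on each partition when every send uses the same key.
    static int[] distribute(String key, int messages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < messages; i++) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 messages, 4 partitions, topic name used as the key for every message.
        int[] counts = distribute("unittest-test-msg", 10, 4);
        // Exactly one slot holds all 10 messages; the others stay at 0.
        System.out.println(Arrays.toString(counts));
    }
}
```

If the real partitioner works this way, a constant key cannot spread
messages across partitions, so the key choice is worth checking alongside
the consumer setup.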

Is it best to view the situation as 2 partitions, each with a leader and a
replica follower?
Which partitions are leaders and which are replicas?
What happened with auto-creation, production, and partitioning?
Which partition(s) does zookeeper point the high-level consumer to read
from?
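
One way to read the listing above mechanically is to group the partitions by
leader and count the replicas listed for each.  The sketch below does that
with plain string parsing.  The class name and helpers are hypothetical, and
the input lines are the four listing rows copied from this message with the
wrapped halves rejoined.  Taken literally, the listing shows four partitions
with one replica each, and that replica is the leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: parses the partition listing quoted in this message.
class PartitionListingSketch {

    static final String[] LISTING = {
        "topic: unittest-test-msg partition: 0 leader: 0 replicas: 0 isr: 0",
        "topic: unittest-test-msg partition: 1 leader: 1 replicas: 1 isr: 1",
        "topic: unittest-test-msg partition: 2 leader: 0 replicas: 0 isr: 0",
        "topic: unittest-test-msg partition: 3 leader: 1 replicas: 1 isr: 1",
    };

    private static final Pattern LINE = Pattern.compile(
        "partition:\\s*(\\d+)\\s+leader:\\s*(\\d+)\\s+replicas:\\s*([\\d,]+)");

    // Group partition ids by the broker id that leads them.
    static Map<Integer, List<Integer>> partitionsByLeader(String[] lines) {
        Map<Integer, List<Integer>> byLeader = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // skip lines that are not listing rows
            }
            int partition = Integer.parseInt(m.group(1));
            int leader = Integer.parseInt(m.group(2));
            byLeader.computeIfAbsent(leader, k -> new ArrayList<>()).add(partition);
        }
        return byLeader;
    }

    // Largest number of replicas listed for any partition.
    static int maxReplicaCount(String[] lines) {
        int max = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                max = Math.max(max, m.group(3).split(",").length);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(partitionsByLeader(LISTING)); // prints {0=[0, 2], 1=[1, 3]}
        System.out.println(maxReplicaCount(LISTING));    // prints 1
    }
}
```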

thanks,
rob
