Kafka >> mail # user >> consuming only half the messages produced


RE: consuming only half the messages produced
Regarding the partitions created:
>         topic: unittest-test-msg        partition: 0    leader: 0    replicas: 0     isr: 0
>         topic: unittest-test-msg        partition: 1    leader: 1    replicas: 1     isr: 1
>         topic: unittest-test-msg        partition: 2    leader: 0    replicas: 0     isr: 0
>         topic: unittest-test-msg        partition: 3    leader: 1    replicas: 1     isr: 1

Is this saying that partitions 1 and 3, both on the same broker, are both
the Leader partitions AND the Replica partitions?  I do not understand the
significance of ISR, either.   How should this be read?
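
For reading these lines: "leader" is the id of the broker that serves reads and writes for the partition, "replicas" lists the broker ids that hold a copy of it, and "isr" (in-sync replicas) is the subset of those copies currently caught up with the leader. The sketch below parses one line in the shape shown above and reports the replication factor. The class name PartitionLine and the regular expression are hypothetical, written only against the output quoted in this thread, not against any Kafka tool's documented format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses one "topic: ... partition: ... leader: ..." line.
class PartitionLine {
    // Pattern assumed from the listing quoted in this thread only.
    private static final Pattern LINE = Pattern.compile(
        "topic:\\s*(\\S+)\\s+partition:\\s*(\\d+)\\s+leader:\\s*(-?\\d+)"
        + "\\s+replicas:\\s*([\\d,]*)\\s+isr:\\s*([\\d,]*)");

    final String topic;
    final int partition;
    final int leader;             // broker id of the leader
    final List<Integer> replicas; // broker ids holding a copy
    final List<Integer> isr;      // copies currently in sync with the leader

    private PartitionLine(String topic, int partition, int leader,
                          List<Integer> replicas, List<Integer> isr) {
        this.topic = topic;
        this.partition = partition;
        this.leader = leader;
        this.replicas = replicas;
        this.isr = isr;
    }

    static PartitionLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("unrecognized line: " + line);
        }
        return new PartitionLine(m.group(1), Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)), ids(m.group(4)), ids(m.group(5)));
    }

    // Splits a comma-separated list of broker ids.
    private static List<Integer> ids(String csv) {
        List<Integer> out = new ArrayList<>();
        for (String s : csv.split(",")) {
            if (!s.isEmpty()) out.add(Integer.parseInt(s));
        }
        return out;
    }

    // Number of copies of the partition, the leader's copy included.
    int replicationFactor() { return replicas.size(); }

    // True when the leader holds the only copy, i.e. there is no follower.
    boolean leaderIsOnlyReplica() {
        return replicas.size() == 1 && replicas.get(0) == leader;
    }

    public static void main(String[] args) {
        PartitionLine p = parse(
            "topic: unittest-test-msg  partition: 1  leader: 1  replicas: 1  isr: 1");
        System.out.println("partition " + p.partition
            + " replication factor = " + p.replicationFactor()
            + ", leader is only replica = " + p.leaderIsOnlyReplica());
    }
}
```

Read this way, each of the four lines above describes a partition with a single copy that lives on its leader broker, so the listing shows four separate partitions, not two partitions plus their replicas.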

These partitions were auto-created and I am unsure of the implications, so
I will carefully read the following next:
https://cwiki.apache.org/KAFKA/kafka-replication.html.

thanks,
rob

> -----Original Message-----
> From: Rob Withers [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 02, 2013 7:20 AM
> To: '[EMAIL PROTECTED]'
> Subject: RE: consuming only half the messages produced
>
> Yes, I mean we can only consume half the messages produced.  I followed the
> high-level consumer example here:
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>
> Let me give a more complete scenario:
>
> - We run 3 zookeepers
> - We run 2 brokers
> - We do not have a topic defined, but we have enabled topic auto-creation
> (with a replication factor of 2? must check this)
> - We connect the producer to both brokers (pocmsg5:9092,pocmsg6:9092)
> - We stuff the topic into the KeyedMessage key with no Partitioner.  I was
> not aware of the use of the key until last night.
> - We generate 10 messages
> - Topic auto-creation results in the following partitions:
>         topic: unittest-test-msg        partition: 0    leader: 0    replicas: 0     isr: 0
>         topic: unittest-test-msg        partition: 1    leader: 1    replicas: 1     isr: 1
>         topic: unittest-test-msg        partition: 2    leader: 0    replicas: 0     isr: 0
>         topic: unittest-test-msg        partition: 3    leader: 1    replicas: 1     isr: 1
> - We construct a single Kafka stream by calling createStreams with a
> zookeeper (pocmsg5:2181) and one thread
>         public <K,V> Map<String, List<KafkaStream<K,V>>> createMessageStreams(
>                         Map<String, Integer> topicCountMap,
>                         Decoder<K> keyDecoder,
>                         Decoder<V> valueDecoder)
> - We consume only half the messages
> - It looks as if partitions 0 and 2 are on pocmsg5, while partitions 1 and
> 3 are on pocmsg6.
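
On the "one thread" step in the list above: the number of streams requested in topicCountMap decides how the partitions of a topic are shared among the consumer threads of a group. The sketch below assumes a range-style assignment, where the sorted partitions are split into contiguous blocks across the threads. It is a simplified model with hypothetical names (RangeAssignmentSketch, owned), not Kafka's source, and it does not establish what happened in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, assumed model of range-style partition assignment.
class RangeAssignmentSketch {
    // Partitions owned by the thread at position pos (0-based) when
    // `partitions` partitions are split across `threads` consumer threads.
    static List<Integer> owned(int partitions, int threads, int pos) {
        int perThread = partitions / threads; // base block size
        int extra = partitions % threads;     // first `extra` threads get one more
        int start = perThread * pos + Math.min(pos, extra);
        int count = perThread + (pos < extra ? 1 : 0);
        List<Integer> out = new ArrayList<>();
        for (int p = start; p < start + count; p++) {
            out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // One thread in the group: it owns every partition.
        System.out.println(owned(4, 1, 0));
        // Two threads in the same group: each owns a block of the partitions.
        System.out.println(owned(4, 2, 0));
        System.out.println(owned(4, 2, 1));
    }
}
```

Under this model a single stream in its own group owns all four partitions, while a second thread or process registered under the same group id would take over a share of them.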
>
> Is it best to view the situation as 2 partitions, each a leader, with a
> replica follower for each?
> Which partitions are leaders and which are replicas?
> What happened with auto-creation and production and partitioning?
> Which partition(s) is the zookeeper pointing the high-level consumer to
> read from?
> thanks,
> rob
>
> > -----Original Message-----
> > From: Jun Rao [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, May 01, 2013 11:15 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: consuming only half the messages produced
> >
> > Partitions are different from replicas. A topic can have one or more
> > partitions, and each partition can have one or more replicas. A
> > consumer consumes data at the partition level. In other words, a consumer
> > gets the same data no matter how many replicas there are.
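
The distinction above can be sketched with a toy model: each partition is a separate log, replicas are extra stored copies of that log, and a consumer reads each partition it owns exactly once. Everything below (ReplicaSketch, roundRobin, storedCopies, consume) is hypothetical illustration code, not a Kafka API, and the round-robin placement of messages is an assumption made only to fill the logs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model: partitions split the data, replicas only duplicate it.
class ReplicaSketch {
    // Spreads messages m0..m(n-1) over the partitions in round-robin order.
    static Map<Integer, List<String>> roundRobin(int messages, int partitions) {
        Map<Integer, List<String>> logs = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            logs.put(p, new ArrayList<>());
        }
        for (int i = 0; i < messages; i++) {
            logs.get(i % partitions).add("m" + i);
        }
        return logs;
    }

    // Copies held across all brokers: grows with the replication factor.
    static int storedCopies(Map<Integer, List<String>> logs, int replicationFactor) {
        int total = 0;
        for (List<String> log : logs.values()) {
            total += log.size() * replicationFactor;
        }
        return total;
    }

    // Messages a consumer sees: one read per owned partition, whatever
    // the replication factor is.
    static List<String> consume(Map<Integer, List<String>> logs, Set<Integer> owned) {
        List<String> out = new ArrayList<>();
        for (int p : owned) {
            out.addAll(logs.getOrDefault(p, Collections.emptyList()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> logs = roundRobin(10, 4);
        System.out.println("stored with 1 replica:  " + storedCopies(logs, 1));
        System.out.println("stored with 2 replicas: " + storedCopies(logs, 2));
        System.out.println("consumed from all partitions: "
            + consume(logs, logs.keySet()).size());
    }
}
```

In this model the consumed count changes only when the set of owned partitions changes, never when the replication factor does.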
> >
> > When you say the consumer only gets half of the messages, do you mean
> > that it gets half of the messages that are produced?
> >
> > You may want to take a look at the consumer example in
> > http://kafka.apache.org/08/api.html
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, May 1, 2013 at 7:14 PM, Rob Withers <[EMAIL PROTECTED]> wrote:
> >
> > > Running a consumer group (createStreams()), pointing to the
> > > zookeeper and with the topic and 1 consumer thread, results in only
> > > half the messages being consumed.  The topic was auto-created, with
> > > a replication factor of 2, but the producer was configured to
leader?