Home | About | Sematext search-lucene.com search-hadoop.com search-devops.com
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Is this a good overview of kafka?


Copy link to this message
-
Re: Is this a good overview of kafka?
Hello,

Your (non-question) statements seem mostly right to me. There is a bit of
confusion regarding your statement about partitions, however.

Partitions are primarily used to represent the smallest unit of
parallelism. If you need to split consumption among a pool of processes,
you need to have enough partitions for each of those consuming processes,
otherwise some of them will receive nothing.

Another property of partitions is that ordering is maintained within a
partition. If your use case requires it, you can implement a custom
partitioner so that a particular field within your produced messages
determines what partition the message is sent to. For example if you
partitioned using a User ID field within the messages, you would be
guaranteed that all messages pertaining to a certain user would end up in
the same partition, and that they would be correctly ordered. You should be
aware, however, that this guarantee is only maintained as long as there are
no consumer re-balance (which happens when adding or removing a consumer or
a broker).

Concerning your questions:

A consumer registers for topics, not for partitions, and it always
registers under the name of a consumer group. If there is only one consumer
registered for a given topic and consumer group, then that consumer will
receive messages from every available partition within that topic. If there
are multiple consumers registered under the same consumer group for a given
topic, then they will share that topic's available partitions among
themselves, which ensures that each partition is consumed by only one
consumer.

The high-level consumer uses Zookeeper to coordinate with the other
consumers and make sure that the partitions are appropriately assigned.

Felix
On Mon, Jan 14, 2013 at 2:15 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB