Kafka >> mail # user >> Is this a good overview of kafka?


Re: Is this a good overview of kafka?
Hello,

Your (non-question) statements seem mostly right to me. There is a bit of
confusion regarding your statement about partitions, however.

Partitions are primarily used to represent the smallest unit of
parallelism. If you need to split consumption among a pool of processes,
you need at least as many partitions as consuming processes; otherwise
some of them will receive nothing.

Another property of partitions is that ordering is maintained within a
partition. If your use case requires it, you can implement a custom
partitioner so that a particular field within your produced messages
determines what partition the message is sent to. For example, if you
partitioned using a User ID field within the messages, you would be
guaranteed that all messages pertaining to a certain user would end up in
the same partition, and that they would be correctly ordered. You should be
aware, however, that this guarantee only holds as long as there is no
consumer re-balance (which happens when a consumer or a broker is added or
removed).
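
To make the User ID example concrete, here is a minimal sketch in plain
Python. It does not use the Kafka producer API; the function name
partition_for, the choice of CRC32 as the hash, and the sample messages are
all assumptions made for illustration only.

```python
import zlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Hypothetical partitioner: stable hash of the key, modulo partition count."""
    # crc32 is used rather than the built-in hash(), which is randomised
    # per process for strings and so would not give a stable mapping.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4  # assumed partition count for this example

# Messages in the order the producer sends them: (user_id, event).
messages = [
    ("alice", "login"),
    ("bob", "login"),
    ("alice", "click"),
    ("alice", "logout"),
]

# Each partition is modelled as an append-only list, so order within a
# partition is the order in which messages were appended.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for user_id, event in messages:
    partitions[partition_for(user_id, NUM_PARTITIONS)].append((user_id, event))
```

Every message for a given user lands in the same list, in send order. Note
that the result depends on num_partitions, so a key may map to a different
partition if the partition count changes.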

Concerning your questions:

A consumer registers for topics, not for partitions, and it always
registers under the name of a consumer group. If there is only one consumer
registered for a given topic and consumer group, then that consumer will
receive messages from every available partition within that topic. If there
are multiple consumers registered under the same consumer group for a given
topic, then they will share that topic's available partitions among
themselves, which ensures that each partition is consumed by only one
consumer.
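
The sharing described above can be sketched roughly as follows. This is an
illustration only, not the consumer's actual rebalancing code: the function
assign_partitions, the consumer names, and the contiguous-range splitting
strategy are assumptions.

```python
def assign_partitions(partitions, consumers):
    """Split a topic's partitions among the consumers of one group.

    Each partition goes to exactly one consumer. Consumers are sorted so
    that every member of the group would compute the same result.
    """
    consumers = sorted(consumers)
    partitions = sorted(partitions)
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")

    # Every consumer gets `base` partitions; the first `extra` get one more.
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# One consumer in the group: it owns every partition of the topic.
alone = assign_partitions(range(3), ["c1"])

# Five consumers, three partitions: two consumers end up with nothing.
shared = assign_partitions(range(3), ["c1", "c2", "c3", "c4", "c5"])
```

With more consumers than partitions, the surplus consumers get an empty
list, which is why the partition count caps the useful size of a group.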

The high-level consumer uses ZooKeeper to coordinate with the other
consumers and make sure that the partitions are appropriately assigned.

--
Felix
On Mon, Jan 14, 2013 at 2:15 PM, S Ahmed <[EMAIL PROTECTED]> wrote:

> Just want to verify that I understand the various components correctly,
> please chime in where appropriate :)
>
> producer = puts messages on broker(s)
> consumer = pulls messages off a broker
>
> In terms of how messages are organized on a cluster of brokers, producers
> put messages by providing a "topic".
>
> At the broker side of things, messages are stored by topic but can also be
> logically separated by a "partition", so that all messages for a
> particular topic are directed to a particular broker.
>
> On the consumer side, when you pull messages off, I know you can dedicate
> a consumer (or group of consumers) to a particular partition somehow. But
> what if you wanted to just randomly pull messages off?  Say I have 3
> brokers, and 5 consumers.  How does the consumer know which broker to
> connect to, and coordinate with the other consumers?
>
> Is there a flow diagram for the above scenario? (or any other scenario so I
> can understand how the communication takes place).
>

 