This is exactly right. The number of partitions is configurable, so set
it to a reasonable upper bound on the concurrency you want. To give
further examples, let's say you have 5 threads:
- if you have 2 partitions, only two threads will get data
- if you have 100 partitions, each thread will get 20 partitions' worth of
  data
- if you have 7 partitions, 3 threads will get 1 partition and 2 will get 2
It doesn't matter how many machines these threads are on; what matters is
the total number of threads across all consumer instances.
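
To make the arithmetic in the three examples concrete, here is a minimal
sketch in Python. It is not Kafka's actual assignment code; the function
name assign_partitions and the choice to give the extra partitions to the
first threads are assumptions made for illustration only.

```python
def assign_partitions(num_partitions, num_threads):
    """Split partition ids 0..num_partitions-1 into contiguous ranges,
    one range per thread (hypothetical helper, not a Kafka API)."""
    base, extra = divmod(num_partitions, num_threads)
    assignment = []
    start = 0
    for thread in range(num_threads):
        # The first `extra` threads take one partition more than the rest.
        count = base + (1 if thread < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


# Partitions owned by each of 5 threads, for the three cases above.
for partitions in (2, 100, 7):
    sizes = [len(owned) for owned in assign_partitions(partitions, 5)]
    print(partitions, sizes)
```

With 2 partitions three of the five threads own nothing, with 100 every
thread owns 20, and with 7 two threads own 2 partitions and three own 1.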
The advantage of this approach is that each partition is always processed
by a single thread *in order*. If you have multiple threads consuming a
single partition, you can no longer guarantee order: the messages have a
definite order in the partition, but the order in which the consumers
process them would be non-deterministic.
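
The ordering guarantee can be illustrated with a small simulation. This is
a sketch using plain Python queues and threads, not a Kafka client; the
names NUM_PARTITIONS, partitions, processed and consume are all
hypothetical. Each partition is drained by exactly one thread, so the
per-partition order is preserved no matter how the threads interleave.

```python
import queue
import threading

NUM_PARTITIONS = 3  # assumed value for the illustration

# One FIFO queue stands in for each partition.
partitions = [queue.Queue() for _ in range(NUM_PARTITIONS)]
processed = [[] for _ in range(NUM_PARTITIONS)]


def consume(partition_id):
    """Drain one partition; only this thread ever touches it."""
    while True:
        message = partitions[partition_id].get()
        if message is None:  # sentinel marking the end of the partition
            break
        processed[partition_id].append(message)


# Produce 12 messages, spread round-robin over the partitions.
for seq in range(12):
    partitions[seq % NUM_PARTITIONS].put(seq)
for q in partitions:
    q.put(None)

threads = [threading.Thread(target=consume, args=(i,))
           for i in range(NUM_PARTITIONS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for partition_id, messages in enumerate(processed):
    print(partition_id, messages)
```

Each list comes out in the order the messages were written to that
partition. No order is implied across partitions, and none would be
guaranteed if two threads shared one queue.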
On Tue, Jul 2, 2013 at 12:34 PM, Josh Foure <[EMAIL PROTECTED]> wrote:
> Hi, I am also new to Kafka, but let me explain my understanding, which
> someone with more knowledge can confirm. There are actually 2 scenarios:
> 1. If all 5 of your consumers are in different "consumer groups" then
> this will behave like a JMS topic where all 5 of your consumers will each
> get a copy of all of the messages. You can add as many consumers as you
> want regardless of the fact that there is only 1 partition.
> 2. If all 5 of your consumers are in the same "consumer group" and there
> is only 1 partition, only 1 consumer will get a copy of the messages. In
> this case you should increase the number of partitions to match the
> number of consumers, so that each will get 1/5 of the messages (assuming
> the messages are evenly distributed across the partitions).
> Is that what you were looking for? Can someone confirm that what I stated
> is accurate?
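
The two scenarios quoted above can be sketched as a toy model in Python.
This is only an illustration under assumptions: the function name
receivers_for_single_partition is hypothetical, and picking the first
listed consumer in each group as the partition's owner is arbitrary (in
Kafka the owner is decided by the group's rebalance).

```python
def receivers_for_single_partition(consumers):
    """consumers: list of (consumer_name, group_name) pairs.
    Returns the consumers that receive messages from a topic with one
    partition: exactly one consumer per group."""
    owner_by_group = {}
    for name, group in consumers:
        # Arbitrary choice: the first consumer seen in a group owns the
        # partition; the others in that group stay idle.
        owner_by_group.setdefault(group, name)
    return list(owner_by_group.values())


# Scenario 1: five consumers, each in its own group.
separate = [("c%d" % i, "group-%d" % i) for i in range(5)]
print(receivers_for_single_partition(separate))

# Scenario 2: five consumers sharing one group.
shared = [("c%d" % i, "group-a") for i in range(5)]
print(receivers_for_single_partition(shared))
```

In the first case all five consumers receive every message; in the second
only one of them does.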
> From: Vinicius Carvalho <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Tuesday, July 2, 2013 2:55 PM
> Subject: Partitions and highlevel consumers
> Hi guys, we are starting with Kafka in our project. We are using version
> 0.8. I come from a traditional JMS MoM architecture, and some things are
> new to me.
> One thing that I'm not getting is the mapping between partitions and number
> of threads. On a single consumer I can see the relationship but what
> happens when you have multiple consumers?
> So say I have a topic and 5 instances of a service that act as
> consumers. In traditional pub/sub, all 5 instances would consume the
> message, right? Does that still hold if this topic has only one partition
> and we now have 5 threads (one per JVM instance) consuming it?
> I just want to make sure that this is the expected behavior. From the
> docs it wasn't clear whether it's the number of threads per consumer or
> the total number of threads across all clients of the topic.