This is exactly right. The number of partitions is configurable, so set
it to some reasonable upper bound on the concurrency you desire. To give
further examples, let's say you have 5 threads:
- if you have 2 partitions, only 2 threads will get data and the other 3
  will sit idle
- if you have 100 partitions, each thread will get 20 partitions' worth of
  data
- if you have 7 partitions, 3 threads will get 1 partition each and 2
  threads will get 2 each
It doesn't matter how many machines these threads are on; what counts is
the total number of threads across all consumer instances in the group.
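As a rough sketch of the arithmetic in the examples above (this is not
Kafka's actual rebalancing code, and the function name is made up for
illustration), assume the partitions are split as evenly as possible and
any leftover partitions go one each to the first few threads. Which
threads receive the extras is an assumption here; only the counts matter.

```python
def partitions_per_thread(num_partitions, num_threads):
    """Return how many partitions each consumer thread would own.

    Hypothetical helper: assumes an even split where every thread gets
    num_partitions // num_threads partitions and the first
    num_partitions % num_threads threads get one extra.
    """
    base, extra = divmod(num_partitions, num_threads)
    return [base + 1 if i < extra else base for i in range(num_threads)]


# The three cases from the list above, with 5 threads in total.
for partitions in (2, 100, 7):
    print(partitions, partitions_per_thread(partitions, 5))
```

A zero in the result is a thread that never receives data, which is why
the partition count acts as an upper bound on useful concurrency.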
The advantage of this approach is that each partition is always processed
by a single thread *in order*. If multiple threads consumed a single
partition, you could no longer guarantee order: the messages would still
have a definite order within the partition, but the order in which the
consumers processed them would be non-deterministic.
On Tue, Jul 2, 2013 at 12:34 PM, Josh Foure <[EMAIL PROTECTED]> wrote: