Kafka, mail # user - Re: Partitions and highlevel consumers - 2013-07-02, 19:52
Vinicius Carvalho 2013-07-02, 18:55
Josh Foure 2013-07-02, 19:35
Re: Partitions and highlevel consumers
This is exactly right. The number of partitions is configurable, so set it
to a reasonable upper bound on the concurrency you want. To give further
examples, let's say you have 5 threads:
- if you have 2 partitions, only 2 threads will get data
- if you have 100 partitions, each thread will get 20 partitions' worth of
data
- if you have 7 partitions, 3 threads will get 1 partition and 2 will get 2
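
The arithmetic in these examples can be sketched as below. This is only an
illustration of the counts, not Kafka's actual assignment code: the function
name assign_partitions is hypothetical, and which threads receive the extra
partitions (here, the first ones) is an assumption.

```python
def assign_partitions(num_partitions, num_threads):
    """Split partition ids 0..num_partitions-1 across threads as evenly as possible.

    Hypothetical helper for illustration; giving the leftover partitions
    to the first threads is an assumption, not a documented Kafka rule.
    """
    base, extra = divmod(num_partitions, num_threads)
    assignment = []
    start = 0
    for thread in range(num_threads):
        # The first `extra` threads take one partition more than the rest.
        count = base + (1 if thread < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


# Partitions per thread for the three cases above, with 5 threads.
for n in (2, 100, 7):
    print(n, [len(p) for p in assign_partitions(n, 5)])
```

With 2 partitions, three of the five threads end up with an empty list, which
is the "only 2 threads will get data" case.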

It doesn't matter how many machines these threads are on, just the total
number of threads across all consumer instances.

The advantage of this approach is that each partition is always processed
by a single thread *in order*. If you have multiple threads consuming a
single partition, you can no longer guarantee order: the messages have a
fixed order within the partition, but the order in which the consumers
process them would be non-deterministic.
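
To make the ordering point concrete, here is a minimal sketch that uses plain
Python threads and queues as a stand-in for partitions. It does not use the
Kafka client at all; consume_in_order and the None end-of-stream marker are
assumptions made for the example.

```python
import threading
from queue import Queue


def consume_in_order(partitions):
    """Process each partition with exactly one dedicated thread.

    `partitions` maps a partition id to its list of messages. Messages
    must not be None, because None is used here as an end-of-stream marker.
    """
    processed = {pid: [] for pid in partitions}

    def worker(pid, q):
        while True:
            msg = q.get()
            if msg is None:
                break
            # Only this thread appends to processed[pid], so the
            # partition's own order is preserved.
            processed[pid].append(msg)

    threads = []
    for pid, messages in partitions.items():
        q = Queue()
        for m in messages:
            q.put(m)
        q.put(None)
        t = threading.Thread(target=worker, args=(pid, q))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return processed


result = consume_in_order({0: ["a", "b", "c"], 1: ["x", "y"]})
print(result)
```

Each list comes back in its original order because only one thread ever
touches a given partition. Two workers sharing one queue would still pull
messages in order, but the order in which they finish processing them would
depend on thread scheduling.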

-Jay
On Tue, Jul 2, 2013 at 12:34 PM, Josh Foure <[EMAIL PROTECTED]> wrote:
 