Kafka, mail # user - Re: application scenerio and suggested kafka setup - 2013-01-28, 06:27
Re: application scenerio and suggested kafka setup
Hi Ahmed,

I can share my experience; I have built a system similar to yours.

1. If all your messages are of the same kind, I think you should use the default
partitioner so that messages spread evenly across all the
broker/partition combinations, unless you have a better function for
spreading them. I believe the default partitioner picks a
broker/partition combination at random.
You are slightly mistaken about the limitation: within a consumer group,
only one consumer may consume a given (broker, partition, topic) triplet.
If you have several groups, each group needs its own consumer reading
each of these triplets. The broker (Kafka server) itself can handle many
connection threads. The point of the group limitation is that each
unique stream of events is handled by exactly one consumer within the
group, so you don't need to worry about duplicates or about processing
your events two or more times. Also note that if one consumer within a
group fails and other consumers in the group still exist, they will take
over the triplets that the failed consumer was consuming.
2. I think you are right here; at least, that is what I have been doing.

3. I didn't find anything, but I have been using the AWS auto-scaling
feature on the producer side, and I guess it would take very little
effort to do the same on the consumption side. You would create an
Auto Scaling group and configure triggers to scale up and scale down,
and that should do the trick. The Kafka consumers rebalance
automatically whenever a consumer comes up or goes down. Note that the
number of useful consumers in a group is bounded by
#brokers * #partitions; any consumers beyond that will sit idle.
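To make the partitioning point concrete, here is a rough sketch of how a default-style partitioner behaves. It is a simplified, hypothetical model, not Kafka's actual code: `choose_partition` is an illustrative name, and using crc32 as the key hash is an assumption. Keyed messages always land on the same slot, while unkeyed messages are spread at random and therefore roughly evenly.

```python
import random
import zlib
from collections import Counter
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int, rng: random.Random) -> int:
    """Hypothetical model of a default partitioner (not Kafka's actual code).

    Keyed messages: a stable hash of the key modulo the slot count, so all
    messages with the same key go to the same (broker, partition) slot.
    Unkeyed messages: a slot chosen uniformly at random.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        return rng.randrange(num_partitions)
    # crc32 is only a stable stand-in hash for this sketch (an assumption).
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    rng = random.Random(42)
    # Example: 2 brokers x 3 partitions per broker = 6 (broker, partition) slots.
    slots = 2 * 3
    counts = Counter(choose_partition(None, slots, rng) for _ in range(60_000))
    print(sorted(counts.items()))  # each slot gets roughly 10,000 messages
    print(choose_partition(b"user-17", slots, rng) == choose_partition(b"user-17", slots, rng))  # True
```

With random placement, each of the 6 slots ends up close to one sixth of the traffic, which is the "spread evenly" behavior described in point 1.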
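The group rules from points 1 and 3 can be illustrated with a simplified, range-style assignment. Each (broker, partition, topic) triplet goes to exactly one consumer in the group, a failed consumer's triplets are redistributed on rebalance, and consumers beyond the number of triplets get nothing. This is only a sketch under those assumptions. The function `assign` and the broker and consumer names are hypothetical, and the real consumer coordinates its rebalance through ZooKeeper rather than through a single function call.

```python
from typing import Dict, List, Tuple

Slot = Tuple[str, int, str]  # (broker, partition, topic)


def assign(slots: List[Slot], consumers: List[str]) -> Dict[str, List[Slot]]:
    """Hypothetical range-style assignment: every slot goes to exactly one consumer."""
    slots = sorted(slots)
    consumers = sorted(consumers)
    assignment: Dict[str, List[Slot]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    # Split the sorted slots into contiguous ranges; the first `extra`
    # consumers get one additional slot each.
    per, extra = divmod(len(slots), len(consumers))
    start = 0
    for i, consumer in enumerate(consumers):
        n = per + (1 if i < extra else 0)
        assignment[consumer] = slots[start:start + n]
        start += n
    return assignment


if __name__ == "__main__":
    slots = [(b, p, "events") for b in ("broker-1", "broker-2") for p in range(2)]
    before = assign(slots, ["c1", "c2", "c3"])
    after = assign(slots, ["c1", "c3"])  # c2 failed; the group rebalances
    too_many = assign(slots, [f"c{i}" for i in range(1, 6)])
    print({c: len(s) for c, s in before.items()})  # {'c1': 2, 'c2': 1, 'c3': 1}
    print({c: len(s) for c, s in after.items()})   # {'c1': 2, 'c3': 2}
    print(sum(1 for s in too_many.values() if not s), "idle consumer(s)")  # 1 idle consumer(s)
```

With 2 brokers * 2 partitions = 4 triplets, a fifth consumer has nothing to read. For that reason, an auto-scaling group for consumers would normally be capped at that count.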

Thanks, Huy

On 01/27/2013 11:06 PM, S Ahmed wrote:
 