application scenario and suggested Kafka setup
S Ahmed 2013-01-27, 21:07
Say I create a web application/service where customers sign up and place some JavaScript on their website, which then sends a message over HTTP to my servers every time someone clicks a link on their website.
Each customer will send to their own custom subdomain like:
customer1.example.com/api/put?linkId=1&......
Say I have 100,000 customers.
1. If all events are of the same type, how could I partition my topics? Or does it not make sense to? I'm confused by what I am reading: is a given Kafka topic + partition combination ONLY allowed to be consumed by a single consumer group? If so, why is that? Can the Kafka server only handle a single thread connecting to it?
2. I will have a Java servlet that contains my producer (each front-end server will run the same servlet with its own producer). I want to batch every x messages. From what I understand, my producer is something I should create as a singleton, correct?
3. I want the number of consumers to be dynamic, so during peak hours I can fire up more nodes to keep up with traffic. Is there a production-worthy consumer daemon that I can use (or learn from) that is open source somewhere?
Much appreciated!
Re: application scenario and suggested Kafka setup
Jun Rao 2013-01-27, 23:54
Partitioning is useful for increasing the degree of parallelism of consumers and, to a certain degree, of producers too. You can have multiple consumer groups consuming the same topic/partition.
Thanks,
Jun
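To make the point about multiple consumer groups concrete, here is a minimal in-memory sketch in Java. It is not Kafka's API: the class `PartitionLogSketch` and its methods are hypothetical names invented for illustration. It assumes only what the reply states, namely that several groups can read the same topic/partition, and models that by giving each group its own read offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of ONE topic partition read by several consumer groups (not the Kafka API). */
final class PartitionLogSketch {
    private final List<String> messages = new ArrayList<>();
    // Each consumer group tracks its own read position, independent of other groups.
    private final Map<String, Integer> offsetByGroup = new HashMap<>();

    void append(String message) {
        messages.add(message);
    }

    /** Returns everything the given group has not read yet and advances that group's offset. */
    List<String> poll(String group) {
        int from = offsetByGroup.getOrDefault(group, 0);
        List<String> unread = new ArrayList<>(messages.subList(from, messages.size()));
        offsetByGroup.put(group, messages.size());
        return unread;
    }

    public static void main(String[] args) {
        PartitionLogSketch partition = new PartitionLogSketch();
        partition.append("click linkId=1");
        partition.append("click linkId=2");
        // Both groups see both messages; reading by one group does not affect the other.
        System.out.println("billing:   " + partition.poll("billing"));
        System.out.println("analytics: " + partition.poll("analytics"));
        // A second poll by the same group returns nothing new.
        System.out.println("billing:   " + partition.poll("billing"));
    }
}
```

The group names "billing" and "analytics" are made up; the only point is that each group consumes the full stream on its own schedule.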
Re: application scenario and suggested Kafka setup
Guy Doulberg 2013-01-28, 06:27
Hi Ahmed,
I can share my experience; I have built a system similar to yours.
1. If all your messages are of the same type, I think you should use the default partitioner, so the messages are spread evenly across all the broker/partition combinations, unless you have a better function to spread them. I think the default partitioner picks a broker/partition combination at random.
You are a bit wrong about the limitation. Within a consumer group, only one consumer is allowed to consume a given (broker, partition, topic) triplet. If you have several groups, each group should have a consumer reading from each of these triplets. The broker (Kafka server) can handle many connection threads.
The reason for the group limitation is that only one consumer within a group handles a given stream of events, so you do not need to worry about duplicates or about processing your events twice or more. Also notice that if one consumer within a group fails and other consumers in the group exist, they will take over the triplets that were consumed by the failed consumer.
2. I think you are right here; this is at least what I have been doing.
3. I didn't find one, but I have been using the auto-scaling feature of AWS on the producer side, and I guess it would take very little effort to do the same on the consumption side. You will have to create an auto-scaling group and configure the triggers to scale up and scale down, and that should do the trick. The rebalancing of the Kafka consumers is done automatically whenever a consumer comes up or goes down. Notice that the number of useful consumers is bounded by #brokers * #partitions.
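The ownership rule in points 1 and 3 can be sketched with a small Java model. This is an illustration only: `GroupAssignmentSketch` and `assign` are hypothetical names, and the round-robin strategy is an assumption chosen for simplicity, not a description of how Kafka's own rebalancing picks owners. It shows the three behaviours described above: one owner per partition within a group, takeover when a consumer fails, and idle consumers once there are more consumers than partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of partition ownership inside ONE consumer group (not the Kafka API). */
final class GroupAssignmentSketch {

    /** Spreads partitions 0..partitionCount-1 over the live consumers, round-robin. */
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return owned; // nobody is alive, so nothing is consumed
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            // Exactly one consumer in the group owns each partition.
            String owner = consumers.get(partition % consumers.size());
            owned.get(owner).add(partition);
        }
        return owned;
    }

    public static void main(String[] args) {
        // Two consumers share four partitions.
        System.out.println(assign(Arrays.asList("c1", "c2"), 4));
        // c2 fails: re-running the assignment hands its partitions to c1.
        System.out.println(assign(Arrays.asList("c1"), 4));
        // Five consumers, four partitions: the fifth consumer owns nothing.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4", "c5"), 4));
    }
}
```

In this model, scaling out during peak hours only helps up to the partition count, which is why the number of partitions is worth choosing with the peak consumer count in mind.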
Thanks, Guy
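For point 2 (one shared producer per servlet that batches every x messages), a minimal Java sketch could look like the following. Everything here is hypothetical: `BatchingProducerSketch`, `shared()`, the batch size of 100, and the `Consumer<List<String>>` sender are stand-ins for illustration, and the sender only prints instead of calling a real Kafka producer. Kafka's own producer may already provide asynchronous batching, so check the documentation for your version before hand-rolling this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy batching wrapper shared by all servlet threads (not the Kafka API). */
final class BatchingProducerSketch {
    private final int batchSize;
    private final Consumer<List<String>> sender; // stand-in for the real send call
    private final List<String> buffer = new ArrayList<>();

    BatchingProducerSketch(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Initialization-on-demand holder: the JVM creates the instance once, thread-safely.
    private static final class Holder {
        static final BatchingProducerSketch INSTANCE = new BatchingProducerSketch(
                100, batch -> System.out.println("sending " + batch.size() + " messages"));
    }

    /** The single shared instance a servlet would use. */
    static BatchingProducerSketch shared() {
        return Holder.INSTANCE;
    }

    /** Buffers one message and flushes once batchSize messages have accumulated. */
    synchronized void send(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands any buffered messages to the sender, e.g. on a timer or at shutdown. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(buffer)); // copy so the sender never sees later writes
        buffer.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 250; i++) {
            shared().send("click linkId=" + i);
        }
        shared().flush(); // push out the partial final batch
    }
}
```

A servlet would call `BatchingProducerSketch.shared().send(...)` per request and `flush()` from its `destroy()` method so that a partial batch is not lost on shutdown.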