Kafka >> mail # user >> application scenario and suggested kafka setup


S Ahmed 2013-01-27, 21:07
Jun Rao 2013-01-27, 23:54
Re: application scenario and suggested kafka setup
Hi Ahmed,

I can share my experience with you; I have built a system similar to yours.

1. If all your messages are of the same type, I think you should use the
default partitioner, so the messages will be spread evenly across all the
broker/partition combinations, unless you have a better function to
spread them...  I believe the default partitioner picks a broker/partition
combination at random.
You are a bit wrong about the limitation: within a consumer group, only
one consumer is allowed to consume a given (broker, partition, topic)
triplet; if you have several groups, each of them can have a consumer
reading from each of these triplets.  The broker (kafka server) can
handle many connection threads.  The reason for the group limitation is
so that only one consumer within a group handles a given stream of events,
and you do not need to worry about duplicates or about processing your
events twice or more.  Also notice that if one consumer within a group
fails and other consumers in the group exist, the other consumers will
take over the triplets that were consumed by the failed consumer.  (See
the consumer sketch after point 3.)
2. I think you are right here; this is at least what I have been doing.
(See the producer sketch after point 3.)

3. I didn't find one, but I have been using the auto-scale feature of AWS
on the producer side, and I guess it would take very little effort to do
the same on the consumption side.  You will have to create an auto-scale
group and configure the triggers for scaling up and down, and that should
do the trick (see the auto-scaling sketch below)...  The rebalancing of
the kafka consumers is done automatically whenever a consumer comes up or
goes down; notice that the number of useful consumers is bounded by
#brokers * #partitions.
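
To make point 1 concrete, here is a minimal sketch of a consumer daemon
using the high-level consumer of the Kafka 0.7.x Java API (the version
current at the time of writing; property names changed in later releases).
The topic "clicks", the group id, and the zookeeper address are all
hypothetical.  Every daemon started with the same groupid joins the same
group, and Kafka rebalances the (broker, partition) triplets across them
automatically:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.Message;
    import kafka.message.MessageAndMetadata;

    public class ClickConsumerDaemon {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "zk1:2181");      // hypothetical zookeeper address
            props.put("groupid", "click-processors"); // every daemon shares this group

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for one stream of the "clicks" topic; within the group, each
            // (broker, partition) is consumed by exactly one stream at a time.
            Map<String, Integer> topicCount = new HashMap<String, Integer>();
            topicCount.put("clicks", 1);
            Map<String, List<KafkaStream<Message>>> streams =
                connector.createMessageStreams(topicCount);

            // Iteration blocks until messages arrive; rebalancing happens
            // behind the scenes whenever a consumer joins or leaves the group.
            for (MessageAndMetadata<Message> mam : streams.get("clicks").get(0)) {
                Message message = mam.message();
                // process the click event...
            }
        }
    }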
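
For point 2, a sketch of a singleton producer shared by every servlet in
the JVM, again assuming the 0.7.x Java API.  The "batch.size" and
"queue.time" values and the zookeeper address are hypothetical knobs to
tune; with producer.type=async the producer does the batching for you, so
the servlet only calls send():

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    public final class ClickProducer {
        private static final Producer<String, String> INSTANCE = create();

        private static Producer<String, String> create() {
            Properties props = new Properties();
            props.put("zk.connect", "zk1:2181"); // hypothetical
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async"); // buffer and send in batches
            props.put("batch.size", "200");      // flush every 200 messages...
            props.put("queue.time", "5000");     // ...or every 5 seconds
            return new Producer<String, String>(new ProducerConfig(props));
        }

        private ClickProducer() {}

        // Called from the servlet's doGet/doPost for every click event.
        public static void send(String payload) {
            INSTANCE.send(new ProducerData<String, String>("clicks", payload));
        }
    }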
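
For point 3, the AWS side can be scripted as well.  A rough sketch using
the AWS SDK for Java; the group name, the launch configuration (a
pre-baked consumer AMI), and the sizes are all hypothetical, and you
would still wire the policy to a CloudWatch alarm to actually trigger it:

    import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
    import com.amazonaws.services.autoscaling.model.CreateAutoScalingGroupRequest;
    import com.amazonaws.services.autoscaling.model.PutScalingPolicyRequest;

    public class ConsumerScaling {
        public static void main(String[] args) {
            AmazonAutoScalingClient autoScaling = new AmazonAutoScalingClient();

            // Group of consumer daemons; max-size should not exceed
            // #brokers * #partitions, since extra consumers would sit idle.
            autoScaling.createAutoScalingGroup(new CreateAutoScalingGroupRequest()
                .withAutoScalingGroupName("kafka-consumers")      // hypothetical
                .withLaunchConfigurationName("kafka-consumer-lc") // hypothetical
                .withAvailabilityZones("us-east-1a", "us-east-1b")
                .withMinSize(2)
                .withMaxSize(8));

            // Policy that adds two instances; attach it to a CloudWatch
            // alarm (e.g. on CPU or queue depth) to drive the scale-up.
            autoScaling.putScalingPolicy(new PutScalingPolicyRequest()
                .withAutoScalingGroupName("kafka-consumers")
                .withPolicyName("scale-out")
                .withAdjustmentType("ChangeInCapacity")
                .withScalingAdjustment(2));
        }
    }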

Thanks, Huy

On 01/27/2013 11:06 PM, S Ahmed wrote:
> Say I create a web application/service where customers sign up, and they place
> some javascript on their website which will then send a message over http
> to my servers every time someone clicks on a link on their website.
>
> Each customer will send to their own custom subdomain like:
>
> customer1.example.com/api/put?linkId=1&......
>
> Say I have 100,000 customers.
>
> 1. If all events are of the same type, what are the potential means by which
> I could partition my topics?  Or does it not make sense to?  I'm confused by
> what I am reading: is a given kafka topic + partition combination ONLY
> allowed to be consumed by a single consumer group?  If so, why is that?
>   Can the kafka server only handle a single thread connecting to it?
>
> 2. I will have a java servlet that will contain my producer (each front-end
> server will have the same servlet, which will contain a producer).  I want to
> batch every x messages.  From what I understand, I should create my producer
> as a singleton, correct?
>
> 3. I want my consumers to be dynamic in size, so during peak hours I want
> to fire up more nodes to keep up with traffic.  Is there a production-worthy
> consumer daemon that I can use (or learn from) that is open-sourced somewhere?
>
> Much appreciated!
 