Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Kafka for cluster fan out messaging


Copy link to this message
-
Re: Kafka for cluster fan out messaging
If you have 1000 partitions and 500 consumers, each consumer should be
consuming 2 partitions. You can verify this using ConsumerOffsetChecker.
Which version of Kafka are you using? If it's 0.8, you may want to take a
look at
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whydataisnotevenlydistributedamongpartitionswhenpartitioningkeyisnotspecified
?

Thanks,

Jun
On Sun, Dec 29, 2013 at 12:18 PM, Jay Beavers
<[EMAIL PROTECTED]>wrote:

> I've been trying to use Kafka to feed data into a computing cluster (e.g.
> 500 servers).  The basic design is one 'job submitter' server is a Producer
> into a Topic with 1000 partitions.  I then have 500 servers each running an
> instance of a multithreaded High Level Consumer all with a shared
> group.idthat asynchronously process incoming messages against a CPU
> intensive
> workload.  My expectation was that the Kafka would use server side logic to
> map the topic partitions into the different consumer instances in the
> shared group.  My goal is to be able to join and leave consumer instances
> over the lifetime of the processing and have Kafka automatically rebalance
> the partitions to the set of live Consumer instances.
>
> This hasn't been working well for me -- in practice I've seen one or two of
> my cluster servers pick up messages and the others sit idle.  I suspect
> that each High Level Consumer is picking up partition 0 and ZooKeeper is
> getting confused about which instance/socket to map the messages into.
>  After reading through the docs a few more times, I think the partition ->
> group mapping logic is client side rather than server side -- if this is
> the case I think my scenario is fundamentally broken unless I implement an
> independent service for partition -> client mapping.  I've looked through
> the Simple Consumer example and it looks like the partition mapping logic
> is handled client side there so it seems to lead me back down the path of
> writing my own partitioning service.
>
> Can you confirm my understanding that partition -> consumer mapping is
> client side logic?  Is there an established pattern I should be following
> to use Kafka in a 1 Producer -> Many Consumers Instances in a Shared Group
> scenario?
>
> Thanks in advance for your advice,
>
>  - jcb
>

 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB