
Kafka, mail # user - Kafka for cluster fan out messaging

Re: Kafka for cluster fan out messaging
Joe Stein 2013-12-30, 05:04
Yes, partition-to-consumer mapping is client-side logic in the high-level consumer.

Partition -> client mapping is handled automatically by the high-level
consumer, and it is organized by groupId.

When the groupId is the same, the consumers operate together and consumers
within the group do not see the same messages (each partition is consumed
by only one member of the group). When the groupId is different, each group
sees all of the messages and manages its own offsets.
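As a rough illustration of how one group splits the partitions while separate groups each see everything, here is a small Python simulation. The helper names (`range_assign`, `assign_groups`) and the group and consumer ids are hypothetical. The split uses a range-style scheme (contiguous, near-equal slices of the sorted partition list handed to the sorted consumer ids), which is an assumption for illustration, not the high-level consumer's actual code; offsets and ZooKeeper are ignored entirely.

```python
def range_assign(partitions, consumers):
    """Split partitions among the consumers of ONE group, range-style.

    Hypothetical helper for illustration; not Kafka's implementation.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    for i, member in enumerate(members):
        # The first `extra` consumers each take one additional partition.
        start = per_consumer * i + min(i, extra)
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
    return assignment


def assign_groups(partitions, groups):
    """Every group independently receives the full set of partitions."""
    return {group: range_assign(partitions, members)
            for group, members in groups.items()}


groups = {"workers": ["c0", "c1", "c2"], "audit": ["a0"]}
plan = assign_groups(range(7), groups)
print(plan["workers"])  # {'c0': [0, 1, 2], 'c1': [3, 4], 'c2': [5, 6]}
print(plan["audit"])    # {'a0': [0, 1, 2, 3, 4, 5, 6]}
```

Within "workers" each partition has exactly one owner, so the members share the load; "audit" has a different groupId and so receives every partition on its own.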

The Kafka high-level consumer rebalances the consumers in a group across the
available partitions. Once every consumer has at most one partition, any
additional consumers in the group get no partitions at all; treat those as
consumer-side failover. So you can run twice as many consumers as you have
partitions, lose 50% of your infrastructure, and not lose any throughput
through the pipeline.
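The failover arithmetic can be sketched with a similar simulation. It again uses a hypothetical range-style helper rather than Kafka's own rebalance code, and it assumes throughput is bounded per consumer and that all consumers are equally capable. With 4 partitions and 8 consumers, half the consumers sit idle; after losing any 4 of them and rebalancing, each survivor owns exactly one partition, so the per-consumer load is unchanged.

```python
def range_assign(partitions, consumers):
    """Hypothetical range-style split of partitions among one group."""
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    for i, member in enumerate(members):
        start = per_consumer * i + min(i, extra)
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
    return assignment


def max_load(assignment):
    """Largest number of partitions owned by any single consumer."""
    return max((len(p) for p in assignment.values()), default=0)


partitions = list(range(4))
consumers = [f"c{i}" for i in range(8)]  # twice as many consumers as partitions

before = range_assign(partitions, consumers)
idle = sorted(c for c, owned in before.items() if not owned)
print(idle)  # ['c4', 'c5', 'c6', 'c7']

# Lose half the group (a mix of busy and idle consumers), then rebalance.
survivors = ["c1", "c3", "c5", "c7"]
after = range_assign(partitions, survivors)
print(after)  # {'c1': [0], 'c3': [1], 'c5': [2], 'c7': [3]}
print(max_load(before), max_load(after))  # 1 1
```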

 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
On Sun, Dec 29, 2013 at 3:18 PM, Jay Beavers <[EMAIL PROTECTED]> wrote:

> I've been trying to use Kafka to feed data into a computing cluster (e.g.
> 500 servers).  The basic design is one 'job submitter' server acting as a
> Producer into a Topic with 1000 partitions.  I then have 500 servers, each
> running an instance of a multithreaded High Level Consumer, all with a
> shared group.id, that asynchronously process incoming messages against a
> CPU-intensive workload.  My expectation was that Kafka would use
> server-side logic to map the topic partitions onto the different consumer
> instances in the shared group.  My goal is to be able to join and leave
> consumer instances over the lifetime of the processing and have Kafka
> automatically rebalance the partitions across the set of live Consumer
> instances.
> This hasn't been working well for me -- in practice I've seen one or two of
> my cluster servers pick up messages and the others sit idle.  I suspect
> that each High Level Consumer is picking up partition 0 and ZooKeeper is
> getting confused about which instance/socket to map the messages into.
> After reading through the docs a few more times, I think the partition ->
> group mapping logic is client side rather than server side.  If that is
> the case, I think my scenario is fundamentally broken unless I implement an
> independent service for partition -> client mapping.  I've looked through
> the Simple Consumer example and it looks like the partition mapping logic
> is handled client side there so it seems to lead me back down the path of
> writing my own partitioning service.
> Can you confirm my understanding that partition -> consumer mapping is
> client side logic?  Is there an established pattern I should be following
> to use Kafka in a 1 Producer -> Many Consumer Instances in a Shared Group
> scenario?
> Thanks in advance for your advice,
>  - jcb