


Relative cluster sizes and cluster size limits
Hi,
I have a question about scaling the broker count of a Kafka cluster. We have a scenario where we'll have two clusters replicating data into a third. We're wondering how we should size that third cluster so that it can handle the volume of messages from the two source clusters. Should we just make the number of brokers match? e.g. five brokers in the two source clusters, therefore 10 in the destination cluster. In general, what is the horizontal scaling model we should use? Also, is there an upper limit to the number of brokers you should put in a cluster, after which you get diminishing returns on throughput?
Thanks, Scott Arthur

Re: Relative cluster sizes and cluster size limits
Hi Scott,
What version of Kafka is this?
In general, our throughput will scale linearly with the number of machines or, more specifically, the number of disks. Our bottleneck will really be the number of partitions. With thousands of partitions, leader election can get slower (seconds), and if you have consumers that consume all partitions, the rebalancing in these consumers can get slow (minutes).
We hope to fix these issues but that is the current state up through 0.8.
Jay

On Fri, Aug 2, 2013 at 2:27 PM, Scott Arthur <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a question about scaling the broker count of a Kafka cluster. We
> have a scenario where we'll have two clusters replicating data into a
> third. We're wondering how we should size that third cluster so that it
> can handle the volume of messages from the two source clusters. Should we
> just make the number of brokers match? e.g. five brokers in the two source
> clusters, therefore 10 in the destination cluster. In general, what is the
> horizontal scaling model we should use? Also, is there an upper limit to
> the number of brokers you should put in a cluster, after which you get
> diminishing returns on throughput?
>
> Thanks,
> Scott Arthur

Re: Relative cluster sizes and cluster size limits
Hi,
This will be with Kafka 0.8. That is some good guidance, thank you. To summarize, we can scale the # of hosts/HDs as high as we want, but we should keep an eye on the total number of partitions being handled. We've currently configured a default of 4 partitions per topic, so we'll watch closely once we reach >250 topics. That should give us plenty to work with. Thanks!
Scott Arthur

On Fri, Aug 2, 2013 at 10:49 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> Hi Scott,
>
> What version of Kafka is this?
>
> In general our throughput will scale linearly with the number of machines
> or more specifically the number of disks. Our bottleneck will really be
> with the number of partitions. With thousands of partitions leader election
> can get slower (seconds), and if you have consumers that consume all
> partitions the rebalancing in these consumers can get slow (minutes).
>
> We hope to fix these issues but that is the current state up through 0.8.
>
> Jay
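
For anyone following the thread, the partition budget Scott describes can be checked with some quick arithmetic. The sketch below is only illustrative: the helper names are hypothetical, and the watch threshold of 1,000 is an assumption that reads Jay's "thousands of partitions" as starting at roughly one thousand. It counts partitions only and ignores replicas.

```python
# Rough partition-budget arithmetic for the numbers mentioned in this thread.
# All names here are hypothetical helpers, not part of any Kafka API.

PARTITIONS_PER_TOPIC = 4   # default partitions per topic, from Scott's message
WATCH_THRESHOLD = 1000     # assumption: "thousands" taken to begin around 1,000


def total_partitions(topics: int, partitions_per_topic: int) -> int:
    """Total partitions in the cluster, not counting replicas."""
    return topics * partitions_per_topic


def topics_until_threshold(threshold: int, partitions_per_topic: int) -> int:
    """Largest topic count whose total partitions stay at or below the threshold."""
    return threshold // partitions_per_topic


if __name__ == "__main__":
    print(total_partitions(250, PARTITIONS_PER_TOPIC))
    print(topics_until_threshold(WATCH_THRESHOLD, PARTITIONS_PER_TOPIC))
```

With 4 partitions per topic, 250 topics works out to 1,000 partitions, which matches the point at which Scott plans to start watching closely.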

