I am wondering what others are doing in terms of cluster separation (if at all). For example, let's say I need 24 nodes to support a given workload. What are the tradeoffs between a single 24-node cluster and 2 x 12-node clusters? The application I support can separate its data fairly easily: the data is all processed in the same way but can be sharded and isolated by customer. I understand the standard tradeoffs, for example putting all your eggs in one basket, but I'm curious whether there are any details specific to Kafka in terms of cluster scale-out.
Somewhat related is the use of RAID vs. JBOD. I have reviewed the documents on the Kafka site and understand the tradeoffs around space, sequential vs. random IO, and the fact that a RAID rebuild might kill the system. I am specifically asking the question as it relates to larger clusters and the impact on the number of partitions a topic might need.
Take the example of a 24-node cluster with 12 drives each: the cluster would have 288 drives. To ensure a topic is distributed across all drives, the topic would require 288 partitions. I am planning to test some of this but wanted to know if there is a rule of thumb. The following link https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic? talks about supporting up to 10K partitions, but it's not clear whether that applies to the cluster as a whole or to a single topic. Those of you running larger clusters, what are you doing? Bert
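For the testing I mentioned, something along these lines is what I had in mind, assuming the 0.8.1 kafka-topics.sh tool (the topic name and ZooKeeper host below are just placeholders):

  # create a topic with one partition per physical drive (24 brokers x 12 drives)
  bin/kafka-topics.sh --create \
    --zookeeper zk1:2181 \
    --topic customer-events \
    --partitions 288 \
    --replication-factor 3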
Thanks for your detailed response. I have added comments inline. On Wed, Apr 16, 2014 at 7:41 PM, Bello, Bob <[EMAIL PROTECTED]> wrote:
BERT> We have several use cases we are looking at Kafka for. Today we are just using the file system to buffer data between our systems. We are looking at use cases that have varying message sizes of 200, 300, 1000, and 2200 bytes.
BERT> The use case we are looking at currently has hourly peaks of about 450K messages per second. For sizing we want to make sure we can support 900K. Our larger feed peaks at 450MB/sec, so we want to make sure the cluster we build can support 900MB/sec.
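Here is the rough back-of-envelope I've been working from, assuming the 24-broker cluster and replication factor of 3 mentioned elsewhere in this thread and an even spread of partitions:

  # 900 MB/s aggregate ingest / 24 brokers  ~= 37.5 MB/s of leader writes per broker
  # x3 copies with replication factor 3     ~= 112.5 MB/s of total disk writes per broker
  # consumer reads and any catch-up/re-replication traffic come on top of that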
BERT> Our producer count will be low... maybe 8-16 hosts.
BERT> This will be much higher, but we're not sure yet. We are also looking at replacing some legacy technology with Storm, so this is a bit up in the air right now.
BERT> We will use whatever performs best ;) My gut is that we will be using Snappy.
BERT> Are you implying that the number of topics has a direct correlation to the failover time? I think I might test this by creating one topic, loading 500 million rows, and testing failover, then comparing that to 500 topics with 1 million rows each. Not sure if data in the queue impacts failover, so I figured I would test that also.
BERT> We are lucky that order is not critical for our large feeds.
BERT> Our default config has 256GB of memory also. One thing I do want to test is the impact on the cluster of reading data that is not in memory. Have you done any testing like this?
BERT> Thanks for the data point. We were also planning to go with a replication factor of 3.
BERT> We are big fans of VMs... however, Kafka will be on physical hardware.
BERT> 10GbE... so cheap now. I did a cost analysis and found that a single 10Gb port costs about the same as 2 x 1Gb. Five times the bandwidth and lower latency make it a no-brainer. If your Kafka hosts have multiple NICs, make sure they are using the right port; this one bit me for a little while (hostname config in the broker config).
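For anyone who hits the same thing, these are the server.properties settings involved on 0.8.x (the hostname below is just a placeholder for whatever resolves to the 10GbE interface):

  # bind and advertise the broker on the 10GbE interface, not the default NIC
  host.name=broker01-10g.example.com
  advertised.host.name=broker01-10g.example.com
  port=9092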
BERT> We have not determined what to use just yet for monitoring. What are you guys using?
BERT> Can you share more about your config? Are you using RAID10 or RAID5? What size and speed of drives? Have you needed to do a RAID rebuild, and if so, did it negatively impact the cluster? The standard server I was given has 12 x 4TB 7.2K drives. I will either run JBOD or RAID10; parity-based RAID with 4TB drives makes me nervous. I am not worried about performance when things are working as designed... we need to plan for edge cases where a consumer is reading old data or the system needs to play catch-up on a big backlog.
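If we do go JBOD, my understanding is that you just give Kafka one log directory per drive in server.properties and it spreads partitions across them, something like this (mount points are just placeholders):

  # one log directory per physical drive so partitions spread across all 12 spindles
  log.dirs=/data/d01/kafka,/data/d02/kafka,/data/d03/kafka,/data/d04/kafka,/data/d05/kafka,/data/d06/kafka,/data/d07/kafka,/data/d08/kafka,/data/d09/kafka,/data/d10/kafka,/data/d11/kafka,/data/d12/kafka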
BERT> I need a crystal ball ;)
BERT> Yes, DEV/QA/PROD are completely separate.
BERT> Need to spend some time on ZooKeeper. I have not looked at ZooKeeper performance to see if it's negatively impacting the performance tests I am doing; we haven't spent any time looking at ZooKeeper. Did you find that the SSD helped improve Kafka performance?
BERT> Thanks for the example. Good to see others are using larger partition counts.
Just curious, would losing one disk in a JBOD setup really mean you’d have to re-replicate 20TB of data? If a single drive dies, wouldn’t you only lose the partitions that happen to be on that drive? On Apr 17, 2014, at 8:00 PM, Bello, Bob <[EMAIL PROTECTED]> wrote:
If you lose one drive in a JBOD setup, you will just re-replicate the data that was on that disk. It is similar to what you would do during a RAID repair, except that instead of the data coming 100% from the mirror drives, the load will be spread over the rest of the cluster.
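For what it's worth, if you wanted to move just the affected partitions off that broker rather than re-replicate them in place, the 0.8.x reassignment tool is one option (the ZooKeeper host and JSON file name here are just placeholders):

  # move only the partitions that lived on the dead drive to other brokers
  bin/kafka-reassign-partitions.sh \
    --zookeeper zk1:2181 \
    --reassignment-json-file partitions-on-dead-disk.json \
    --execute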
The real downside of JBOD is that if you lose one drive, the machine with the dead drive will be down until the drive is replaced/fixed. RAID allows you to continue using that machine even with the dead drive, although we have seen that there can be issues with this in practice.
-Jay On Fri, Apr 18, 2014 at 7:35 AM, Andrew Otto <[EMAIL PROTECTED]> wrote: