We have a 3-node Kafka cluster. I initially created 4 topics. I then wrote a small shell script to create 150 topics.
TOPICS=$(< $1)
for topic in $TOPICS
do
    echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
    /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic
done
10 minutes later I see messages like this:
[2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7] Removing fetcher for partition [m3_registration,0] (kafka.server.ReplicaFetcherManager)
followed by:
[2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for partition [m3_registration,22] to broker 8 (kafka.server.ReplicaFetcherThread)
kafka.common.NotLeaderForPartitionException
Then, a few minutes later, came the following messages, which overwhelmed the logging system:
[2013-08-13 11:46:35,916] ERROR error in loggedRunnable (kafka.utils.Utils$)
java.io.FileNotFoundException: /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
I restarted the service after discovering the problem. After a few minutes of attempting to recover, the Kafka service crashed with the following error:
[2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log 'm3_registration-29' (kafka.log.LogManager)
[2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.IllegalStateException: Found log file with no corresponding index file.
There was no activity on the cluster after the topics were added. What could have caused the crash and triggered the too many open files exception? What is the best way to recover so that the Kafka service can be restarted? (I am not sure the delete topic command will work in this particular case, as none of the 3 services will start.) How can this be prevented in the future?
The first error is caused by too many open file handles. Kafka keeps each of the segment files open on the broker, so the more topics/partitions you have, the more file handles you need. You probably need to increase the open file handle limit and also monitor the number of open file handles so that you get an alert when it gets close to the limit.
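The monitoring suggested above could be sketched roughly as follows. This is a minimal illustration for Linux only, not something taken from the thread: the function names, the 80% threshold, and the KAFKA_PID variable are all hypothetical, and it assumes the broker's open descriptors are visible under /proc/<pid>/fd.

```shell
#!/bin/sh
# Hypothetical helpers; names and thresholds are illustrative assumptions.

# count_open_fds PID: number of descriptors the process holds (Linux /proc only).
count_open_fds() {
    fd_dir="/proc/$1/fd"
    [ -d "$fd_dir" ] || { echo "no such directory: $fd_dir" >&2; return 1; }
    ls "$fd_dir" | wc -l | tr -d ' '
}

# fd_usage_pct USED LIMIT: integer percentage of the limit currently in use.
fd_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# check_fd_usage USED LIMIT THRESHOLD_PCT: print a status line, return 1 on WARN.
check_fd_usage() {
    pct=$(fd_usage_pct "$1" "$2")
    if [ "$pct" -ge "$3" ]; then
        echo "WARN: $1 of $2 file descriptors in use (${pct}%)"
        return 1
    fi
    echo "OK: $1 of $2 file descriptors in use (${pct}%)"
}
```

A cron job or monitoring agent could call something like check_fd_usage "$(count_open_fds "$KAFKA_PID")" 10240 80, where KAFKA_PID is the broker's process ID, and raise an alert on a non-zero exit status.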
Not sure why you get the second error on restart. Are you using the 0.8 beta1 release?
Jun
On Tue, Aug 13, 2013 at 11:04 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
Good morning Jun. We are using Kafka 0.8, which I built from trunk in June or early July. I forgot to mention that running ulimit on the hosts shows the open file handle limit set to unlimited. What are the ways to recover from the last error and restart Kafka? How can I delete a topic with the Kafka service down on all hosts? How many topics can Kafka support before hitting the too many open files exception? What open file handle limit did you set in your cluster?
Thanks so much, Vadim
On Aug 14, 2013, at 7:38 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
Good morning Jun. A correction regarding the open file handle limit: I was wrong. I re-ran the command ulimit -Hn and it shows 10240. Which brings me to the next question: how do I properly calculate the number of open file handles required by Kafka? What are your settings for this limit?
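One caveat when checking this value: ulimit run in an interactive shell reports the limit for that shell, which is not necessarily the limit applied to an already running broker process. On Linux, the limits in effect for a running process can be read from /proc/<pid>/limits. The sketch below is an illustration only; the function names and the KAFKA_PID variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for reading the "Max open files" limit of a process.

# parse_nofile_limits FILE: print "SOFT HARD" from a /proc/<pid>/limits style file.
parse_nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# process_nofile_limits PID: soft and hard open-file limits of a running process
# (Linux only; relies on the /proc filesystem).
process_nofile_limits() {
    parse_nofile_limits "/proc/$1/limits"
}
```

For example, process_nofile_limits "$KAFKA_PID" prints the soft and hard limits the broker is actually running with, which can then be compared with the output of ulimit -Sn and ulimit -Hn in the shell.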
On Wed, Aug 14, 2013 at 8:19 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
Joel, thanks so much. Do you have a hard limit on the maximum number of topics Kafka can support? Are there any other OS-level settings I should be concerned about that may cause Kafka to crash? I am still trying to understand how to recover from the failure and start the service.
The following error prevents Kafka from restarting:
[2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.IllegalStateException: Found log file with no corresponding index file.
On Wed, Aug 14, 2013 at 9:27 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
These would be highly specific to capacity planning for your use cases, but you would typically need to take into account the volume of each topic, desired consumer parallelism, available hardware and so on. We have an operations wiki (https://cwiki.apache.org/confluence/display/KAFKA/Operations), but it definitely needs some updates for 0.8.
Not sure how you got into that state. It could be that you ran out of file handles while a log segment was being created, i.e., the log file was created but not the index file, although I would have to look at the code more closely to confirm. In any event, I think in this case you would just need to delete these log files from disk.
On Wed, Aug 14, 2013 at 10:31 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
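To find the log files Joel refers to, a rough sketch like the following could list segment files that have no matching index file. It assumes the usual 0.8 on-disk layout, where each partition directory holds segment files named <offset>.log with a sibling <offset>.index; the function name is hypothetical. It only lists candidates and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: list *.log segments with no sibling *.index file.
# Assumes segments are stored as <offset>.log next to <offset>.index.

# list_orphan_segments DATA_DIR: print the path of each orphaned .log file.
list_orphan_segments() {
    find "$1" -type f -name '*.log' | while IFS= read -r log_file; do
        index_file="${log_file%.log}.index"
        [ -e "$index_file" ] || printf '%s\n' "$log_file"
    done
}
```

Running list_orphan_segments /home/kafka/data7 with the broker stopped would show which files the startup check is likely complaining about. Reviewing that list before removing anything seems prudent, since deleting a segment discards whatever data it holds.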
Good morning Joel. I just want to understand clearly how to predict the number of open files kept by Kafka.
Is it calculated by multiplying number of topics * number of partitions * number of replicas? In our case that would be 150 * 36 * 3. Am I correct? How will the number of producers and consumers affect that calculation? Is it advisable to have fewer partitions? Does 36 partitions sound reasonable?
Thanks so much in advance
On Wed, Aug 14, 2013 at 9:27 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
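The arithmetic in the question can be worked through per broker rather than per cluster, since the file handle limit applies to each broker process. The sketch below is a rough estimate under stated assumptions, not a formula confirmed in this thread: it assumes replicas are spread evenly across brokers, and the number of open files per partition replica is a guess (the 2 stands for one .log and one .index file for a single segment). Sockets and additional segments would add to the total.

```shell
#!/bin/sh
# Hypothetical estimate helpers; the files-per-replica figure is an assumption.

# replicas_per_broker TOPICS PARTITIONS REPLICATION_FACTOR BROKERS
replicas_per_broker() {
    echo $(( $1 * $2 * $3 / $4 ))
}

# estimate_open_files REPLICAS_PER_BROKER FILES_PER_REPLICA
estimate_open_files() {
    echo $(( $1 * $2 ))
}

per_broker=$(replicas_per_broker 150 36 3 3)
echo "partition replicas per broker: $per_broker"
echo "estimated open files at 2 per replica: $(estimate_open_files "$per_broker" 2)"
```

With 150 topics, 36 partitions, replication factor 3 and 3 brokers, every broker hosts 5400 partition replicas. At two files each that is 10800, which would already exceed the 10240 hard limit reported earlier in the thread, before counting the 4 original topics or any network connections.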
The tradeoff is this:
Pro: more partitions means more consumer parallelism. The total threads/processes across all consumer machines can't exceed the partition count.
Con: more partitions means more file descriptors and hence smaller writes to each file (so more random io).
Our setting is fairly random. The ideal number would be the smallest number that satisfies your foreseeable need for consumer parallelism.
-Jay
On Thu, Aug 15, 2013 at 3:41 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
Jun
On Thu, Aug 15, 2013 at 4:38 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote: