Kafka, mail # user - Kafka crashed after multiple topics were added


Re: Kafka crashed after multiple topics were added
Joel Koshy 2013-08-14, 16:27
We use 30k as the limit. It is largely driven by the number of partitions
(including replicas), retention period and number of
simultaneous producers/consumers.

In your case it seems you have 150 topics, 36 partitions, 3x replication -
with that configuration you will definitely need to up your file handle
limit.
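
Joel's arithmetic can be sketched as a back-of-the-envelope estimate. The two-files-per-segment (log plus index) and one-segment-per-partition figures below are assumptions for illustration; the real segment count depends on segment size and retention:

```shell
# Rough per-broker file descriptor estimate (illustrative sketch, not an
# exact Kafka accounting). Args: topics, partitions per topic, replication
# factor, broker count, segments per partition.
estimate_open_files() {
    topics=$1; partitions=$2; replication=$3; brokers=$4; segments=$5
    total_replicas=$((topics * partitions * replication))
    replicas_per_broker=$((total_replicas / brokers))
    # Each segment keeps a log file and an index file open (assumption: 2 fds).
    echo $((replicas_per_broker * segments * 2))
}

# Vadim's cluster: 150 topics x 36 partitions x 3 replicas over 3 brokers
# = 5400 partition replicas per broker. Even one segment per partition
# needs ~10800 descriptors, already past the 10240 hard limit.
estimate_open_files 150 36 3 3 1
```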

Thanks,

Joel

On Wednesday, August 14, 2013, Vadim Keylis wrote:

> Good morning Jun. A correction on the open file handle limit: I was
> wrong. I re-ran ulimit -Hn and it shows 10240, which brings me to the
> next question: how do we appropriately calculate the number of open file
> handles required by Kafka? What are your settings for this limit?
>
> Thanks,
> Vadim
>
>
>
> On Wed, Aug 14, 2013 at 8:19 AM, Vadim Keylis <[EMAIL PROTECTED]>
> wrote:
>
> > Good morning Jun. We are using Kafka 0.8 that I built from trunk in June
> > or early July. I forgot to mention that running ulimit on the hosts shows
> > the open file handle limit set to unlimited. What are the ways to recover
> > from the last error and restart Kafka? How can I delete a topic with the
> > Kafka service down on all hosts? How many topics can Kafka support without
> > hitting the "too many open files" exception? What did you set the open
> > file handle limit to in your cluster?
> >
> > Thanks so much,
> > Vadim
> >
> > Sent from my iPhone
> >
> > On Aug 14, 2013, at 7:38 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > The first error is caused by too many open file handles. Kafka keeps
> > > each of the segment files open on the broker. So, the more
> > > topics/partitions you have, the more file handles you need. You
> > > probably need to increase the open file handle limit and also monitor
> > > the number of open file handles so that you can get an alert when it
> > > gets close to the limit.
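
One hedged way to implement that monitoring on a Linux broker (a sketch; the `pgrep` pattern `kafka.Kafka` is the broker's usual main class, but verify it matches your deployment):

```shell
# Count open file descriptors of a process by listing /proc/<pid>/fd (Linux).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Find the broker JVM and compare its fd count against the shell's limit.
# Assumption: "kafka.Kafka" matches the broker process on this host.
broker_pid=$(pgrep -f kafka.Kafka | head -n 1)
if [ -n "$broker_pid" ]; then
    echo "kafka open fds: $(count_fds "$broker_pid") / limit: $(ulimit -n)"
fi
```

Wiring this into cron or a monitoring agent with an alert threshold below the hard limit gives early warning before `Too many open files` starts appearing in the logs.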
> > >
> > > Not sure why you get the second error on restart. Are you using the
> > > 0.8 beta1 release?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Aug 13, 2013 at 11:04 PM, Vadim Keylis <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > >> We have a 3-node Kafka cluster. I initially created 4 topics.
> > >> I wrote a small shell script to create 150 topics.
> > >>
> > >> TOPICS=$(< "$1")
> > >> for topic in $TOPICS
> > >> do
> > >>   echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
> > >>   /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic "$topic" \
> > >>     --zookeeper "$2:2181/kafka" --partition 36
> > >> done
> > >>
> > >> 10 minutes later I see messages like this:
> > >> [2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7]
> > >> Removing fetcher for partition [m3_registration,0]
> > >> (kafka.server.ReplicaFetcherManager)
> > >> followed by
> > >> [2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for
> > >> partition [m3_registration,22] to broker 8
> > >> (kafka.server.ReplicaFetcherThread)
> > >> kafka.common.NotLeaderForPartitionException
> > >>
> > >> Then a few minutes later, followed by the following messages that
> > >> overwhelmed the logging system:
> > >> [2013-08-13 11:46:35,916] ERROR error in loggedRunnable
> > >> (kafka.utils.Utils$)
> > >> java.io.FileNotFoundException:
> > >> /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
> > >>        at java.io.FileOutputStream.open(Native Method)
> > >>        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> > >>
> > >> I restarted the service after discovering the problem. After a few
> > >> minutes attempting to recover, the Kafka service crashed with the
> > >> following error:
> > >>
> > >> [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log
> > >> 'm3_registration-29' (kafka.log.LogManager)
> > >> [2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable
> > >> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> > >> java.lang.IllegalStateException: Found log file with no corresponding
> > >> index