Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Re: Kafka Consumer Threads Stalled


Copy link to this message
-
Re: Kafka Consumer Threads Stalled
The more accurate formula is the following since fetch size is per
partition.

<consumer threads> * <queuedchunks.max> * <fetch size> * #partitions

Thanks,

Jun
On Thu, Aug 15, 2013 at 9:40 PM, Drew Daugherty <
[EMAIL PROTECTED]> wrote:

> Thank you Jun. It turned out an OOME was thrown in one of the consumer
> fetcher threads.  Speaking of which, what is the best method for
> determining the consumer memory usage?  I had read that the formula below
> would suffice, but I am questioning it:
>
> <consumer threads> * <queuedchunks.max> * <fetch size>
>
> -drew
> ________________________________________
> From: Jun Rao [[EMAIL PROTECTED]]
> Sent: Wednesday, August 14, 2013 8:42 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Kafka Consumer Threads Stalled
>
> In that case, have you looked at
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F
> ?
>
> Thanks,
>
> Jun
>
>
> On Wed, Aug 14, 2013 at 7:38 AM, Drew Daugherty <
> [EMAIL PROTECTED]> wrote:
>
> > The problem is not the fact that the timeout exceptions are being thrown.
> >  We have tried with and without the timeout setting and, in both cases,
> we
> > end up with threads that are stalled and not consuming data. Thus the
> > problem is consumers that are registered and not consuming and no
> > rebalancing is done  We suspected a problem with zookeeper but we have
> run
> > smoke and latency tests and got reasonable results.
> >
> > -drew
> >
> > Sent from Moxier Mail
> > (http://www.moxier.com)
> >
> >
> > ----- Original Message -----
> > From: Jun Rao <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Sent: 8/13/2013 10:17 PM
> > Subject: Re: Kafka Consumer Threads Stalled
> >
> >
> >
> > If you don't want to see ConsumerTimeoutException, just set
> > consumer.timeout.ms to -1. If you do need consumer.timeout.ms larger
> than
> > 0, make sure that on ConsumerTimeoutException,  your consumer thread
> loops
> > back and calls hasNext() on the iterator to resume the consumption.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Aug 13, 2013 at 4:57 PM, Drew Daugherty <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > We are using zookeeper 3.3.6 with kafka 0.7.2. We have a topic with 8
> > > partitions on each of 3 brokers that we are consuming with a consumer
> > group
> > > with multiple threads.  We are using the following settings for our
> > > consumers:
> > > zk.connectiontimeout.ms=12000000
> > > fetch_size=52428800
> > > queuedchunks.max=6
> > > consumer.timeout.ms=5000
> > >
> > > Our brokers have the following configuration:
> > > socket.send.buffer=1048576
> > > socket.receive.buffer=1048576
> > > max.socket.request.bytes=104857600
> > > log.flush.interval=10000
> > > log.default.flush.interval.ms=1000
> > > log.default.flush.scheduler.interval.ms=1000
> > > log.retention.hours=4
> > > log.file.size=536870912
> > > enable.zookeeper=true
> > > zk.connectiontimeout.ms=6000
> > > zk.sessiontimeout.ms=6000
> > > max.message.size=52428800
> > >
> > > We are noticing that after the consumer runs for a short while, some
> > > threads stop consuming and start throwing the following timeout
> > exceptions:
> > > kafka.consumer.ConsumerTimeoutException
> > >         at
> > > kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:66)
> > >         at
> > > kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:32)
> > >         at
> > >
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
> > >         at
> > kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
> > >
> > > When this happens, message consumption on the affected partitions
> doesn't
> > > recover but stalls and the consumer offset remains frozen.  The
> > exceptions
> > > also continue to be thrown in the logs as the thread logic logs the
> error
> > > then tries to create another iterator from the stream and consume from
> > it.
> > >  We also notice that consumption tends to freeze on 2/3 brokers but