Kafka, mail # user - understanding OffsetOutOfRangeException's....


Re: understanding OffsetOutOfRangeException's....
Jun Rao 2014-01-10, 16:06
Could you increase parallelism on the consumers?

Thanks,

Jun
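Some context for the parallelism suggestion: in the high-level consumer, each partition is consumed by at most one stream (thread) within a consumer group. Streams beyond a topic's partition count therefore sit idle. The sketch below is a simplified, hypothetical model of range-style assignment, in which sorted partitions are split into contiguous blocks across sorted threads. It is not the ZookeeperConsumerConnector code, and the thread names are made up.

```python
def range_assign(partitions, consumer_threads):
    """Simplified range-style assignment (hypothetical model).

    Both lists are sorted; each thread gets a contiguous block of
    partitions, and the first (P % C) threads get one extra partition.
    """
    parts = sorted(partitions)
    threads = sorted(consumer_threads)
    if not threads:
        return {}
    n_per, n_extra = divmod(len(parts), len(threads))
    assignment = {}
    start = 0
    for i, thread in enumerate(threads):
        count = n_per + (1 if i < n_extra else 0)
        assignment[thread] = parts[start:start + count]
        start += count
    return assignment


# 4 partitions, 6 threads: two threads end up with no partitions.
plan = range_assign([0, 1, 2, 3], [f"consumer-{i}" for i in range(6)])
idle = [t for t, ps in plan.items() if not ps]
print(idle)  # ['consumer-4', 'consumer-5']
```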
On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> The consumption rate is a little better after the refactoring.  The main
> issue, though, was that we had a mismatch between large and small topics.  A
> large topic can lag and adversely affect consumption of other topics, so
> this is an attempt to isolate topic filtering and better balance the
> consumers across the different topics.
>
> So, it's definitely working on that score.
>
> The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
> doing that both before and after the refactor (and after we also started
> seeing the ERROR logging).  But consumption of all other topics is working
> better now (almost no lag at all).
>
> I'm also setting the client.id for each consumer in the process, so that I
> can see the individual metrics per consumer.
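A minimal sketch of the per-connector configuration described above, assuming one property map per connector. The keys zookeeper.connect, group.id and client.id are standard 0.8 consumer properties. The ZooKeeper address, the connector names, and the helper connector_configs are hypothetical. Every connector shares the same group.id, while each gets its own client.id, which is what lets the per-consumer metrics be told apart.

```python
def connector_configs(group_id, zk_connect, connector_names):
    """Build one consumer property map per connector (hypothetical helper).

    All connectors join the same consumer group; client.id differs per
    connector so its metrics can be distinguished.
    """
    return [
        {
            "zookeeper.connect": zk_connect,
            "group.id": group_id,
            "client.id": f"{group_id}-{name}",
        }
        for name in connector_names
    ]


configs = connector_configs("myconsumerapp", "zk1:2181", ["large-topics", "small-topics"])
for cfg in configs:
    print(cfg["client.id"])
# myconsumerapp-large-topics
# myconsumerapp-small-topics
```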
>
> Jason
>
>
> On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Does the consumption rate in the client (msg/sec) change significantly
> > after the refactoring?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > Yes, it's happening continuously at the moment (although I'm expecting
> > > the consumer to catch up soon)....
> > >
> > > It seemed to start happening after I refactored the consumer app to use
> > > multiple consumer connectors in the same process (each one has a separate
> > > topic filter, so there should be no overlap between them), all using the
> > > same consumer group.
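Since the refactor relies on the topic filters not overlapping, one way to sanity-check that is to run every whitelist pattern against the known topic list and flag topics matched by more than one filter. This is a hypothetical helper, not part of Kafka. It approximates whitelist matching with Python's re.fullmatch (Java and Python regex syntax differ in places), and the topic names and patterns are invented.

```python
import re


def overlapping_topics(filters, topics):
    """Return {topic: [filter names]} for topics matched by 2+ filters.

    Whitelist matching is approximated with re.fullmatch; Java and
    Python regex syntax agree for simple patterns like these.
    """
    compiled = {name: re.compile(pattern) for name, pattern in filters.items()}
    overlaps = {}
    for topic in topics:
        hits = [name for name, rx in compiled.items() if rx.fullmatch(topic)]
        if len(hits) > 1:
            overlaps[topic] = hits
    return overlaps


topics = ["clicks", "impressions", "signups", "errors"]  # invented names
good = {"large": r"clicks|impressions", "small": r"(?!(?:clicks|impressions)$).*"}
bad = {"large": r"click.*", "small": r".*s"}
print(overlapping_topics(good, topics))  # {}
print(overlapping_topics(bad, topics))   # {'clicks': ['large', 'small']}
```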
> > >
> > > Could it be a thread-safety issue in the ZookeeperConsumerConnector
> > > (seems unlikely)?
> > >
> > > Jason
> > >
> > >
> > > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Normally, if the consumer can't keep up, you should just see the
> > > > OffsetOutOfRangeException warning. The offset mismatch error should
> > > > never happen. It could be that OffsetOutOfRangeException exposed a bug.
> > > > Do you think you can reproduce this easily?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > I'm not sure I understand your question, wrt produced data?
> > > > >
> > > > > But yes, in general, I believe the consumer is not keeping up with the
> > > > > broker's deletion of old data. So it's trying to fetch the next batch of
> > > > > data, but its last offset is no longer there, etc. That's the reason
> > > > > for the WARN message in the fetcher thread.
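The behavior described above can be modeled roughly as follows. The broker retains offsets from a log start up to a log end, retention moves the log start forward, and a fetch for an offset outside that range triggers a reset. This is a simplified sketch, not the ConsumerFetcherThread implementation. It assumes the 0.8-style auto.offset.reset values "smallest" and "largest" (treating anything other than "smallest" as "largest"), and PartitionLog and next_fetch_offset are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PartitionLog:
    log_start: int  # oldest offset still retained (advances as retention deletes)
    log_end: int    # offset that the next produced message will receive


def next_fetch_offset(log, fetch_offset, auto_offset_reset="largest"):
    """Return (offset_to_fetch, was_reset) in this simplified model.

    An offset in [log_start, log_end] is valid. Anything outside it is
    "out of range": jump to log_start for "smallest", otherwise to log_end.
    """
    if log.log_start <= fetch_offset <= log.log_end:
        return fetch_offset, False
    if auto_offset_reset == "smallest":
        return log.log_start, True
    return log.log_end, True


# Hypothetical numbers: retention has already deleted offsets below 1000.
log = PartitionLog(log_start=1000, log_end=5000)
print(next_fetch_offset(log, 1200))             # (1200, False)
print(next_fetch_offset(log, 800))              # (5000, True)
print(next_fetch_offset(log, 800, "smallest"))  # (1000, True)
```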
> > > > >
> > > > > I'm just not sure I understand, then, why we don't always see the
> > > > > ConsumerIterator error too. Won't missing data always be detected
> > > > > there? Why sometimes and not always? What's the difference?
> > > > >
> > > > > Jason
> > > > >
> > > > >
> > > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > The WARN and ERROR may not be completely correlated. Could it be that
> > > > > > the consumer is slow and couldn't keep up with the produced data?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
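Whether a lagging consumer recovers comes down to simple arithmetic: the lag shrinks only when the consumer reads faster than producers write. Below is a back-of-the-envelope sketch with made-up rates in msg/sec. It ignores retention and bursty traffic, and seconds_to_catch_up is a hypothetical helper.

```python
import math


def seconds_to_catch_up(lag_msgs, produce_rate, consume_rate):
    """Seconds until lag reaches zero, with rates in msg/sec.

    The lag shrinks at (consume_rate - produce_rate). If that is not
    positive the lag never closes, so return math.inf.
    """
    net = consume_rate - produce_rate
    if net <= 0:
        return math.inf
    return lag_msgs / net


# Made-up numbers for illustration.
print(seconds_to_catch_up(1_000_000, 4_000, 5_000))  # 1000.0
print(seconds_to_catch_up(1_000_000, 5_000, 5_000))  # inf
```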
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > So, sometimes I just get the WARN from the ConsumerFetcherThread
> > > > > > > (as previously noted, above), e.g.:
> > > > > > >
> > > > > > > 2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > > consumer.ConsumerFetcherThread - [ConsumerFetcherThread-myconsumerapp-11],
> > > > > > > Current offset 16163904970 for partition [mypartition,0] out of range;
> > > > > > > reset offset to 16175326044
> > > > > > > More recently, I see these in the following log line (not sure