Kafka, mail # user - understanding OffsetOutOfRangeException's....


Re: understanding OffsetOutOfRangeException's....
Jun Rao 2014-01-09, 18:06
Does the consumption rate in the client (msg/sec) change significantly
after the refactoring?

Thanks,

Jun
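
A minimal sketch of one way to measure that per-stream rate (msg/sec) with the 0.8 high-level consumer; the class name and the 10-second reporting window are assumptions for illustration, not something from this thread:

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

// Runs in its own thread per KafkaStream and prints a rough msg/sec figure.
public class RateLogger implements Runnable {
    private final KafkaStream<byte[], byte[]> stream;

    public RateLogger(KafkaStream<byte[], byte[]> stream) {
        this.stream = stream;
    }

    @Override
    public void run() {
        long count = 0;
        long windowStart = System.currentTimeMillis();
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            it.next();  // hand the message to the real processing code here
            count++;
            long elapsedMs = System.currentTimeMillis() - windowStart;
            if (elapsedMs >= 10000) {  // report roughly every 10 seconds
                System.out.printf("consumption rate: %.1f msg/sec%n",
                                  count * 1000.0 / elapsedMs);
                count = 0;
                windowStart = System.currentTimeMillis();
            }
        }
    }
}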
On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Yes, it's happening continuously, at the moment (although I'm expecting the
> consumer to catch up soon)....
>
> It seemed to start happening after I refactored the consumer app to use
> multiple consumer connectors in the same process (each one has a separate
> topic filter, so should be no overlap between them).  All using the same
> consumer group.
>
> Could it be a thread safety issue in the ZookeeperConsumerConnector (seems
> unlikely)?
>
> Jason
>
>
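
A rough sketch of the multi-connector layout described above: several connectors in one JVM, each created with the same group.id but its own (non-overlapping) topic filter. The filter patterns, ZooKeeper address, and stream counts are assumptions, not details from this thread:

import java.util.List;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class MultiConnectorApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181");  // assumed address
        props.put("group.id", "myconsumerapp");         // one shared group
        // Controls where an out-of-range offset is reset to ("smallest" or "largest").
        props.put("auto.offset.reset", "largest");

        // Two independent ZookeeperConsumerConnectors in the same process.
        ConsumerConnector metricsConnector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        ConsumerConnector logsConnector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Each connector gets its own topic filter; the two patterns are meant
        // not to overlap.
        List<KafkaStream<byte[], byte[]>> metricsStreams =
            metricsConnector.createMessageStreamsByFilter(new Whitelist("metrics\\..*"), 2);
        List<KafkaStream<byte[], byte[]>> logsStreams =
            logsConnector.createMessageStreamsByFilter(new Whitelist("logs\\..*"), 2);

        // The streams would then be handed to worker threads (e.g. one RateLogger
        // per stream, as sketched earlier), and each connector shut down cleanly
        // on exit via shutdown().
    }
}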
> On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Normally, if the consumer can't keep up, you should just see the
> > OffsetOutOfRangeException warning. The offset mismatch error should never
> > happen. It could be that OffsetOutOfRangeException exposed a bug. Do you
> > think you can reproduce this easily?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > Jun,
> > >
> > > I'm not sure I understand your question, wrt produced data?
> > >
> > > But yes, in general, I believe the consumer is not keeping up with the
> > > broker's deleting the data.  So it's trying to fetch the next batch of
> > > data, but its last offset is no longer there, etc.  So that's the reason
> > > for the WARN message, in the fetcher thread.
> > >
> > > I'm just not sure I understand then why we don't always see the
> > > ConsumerIterator error also, because won't there always be missing data
> > > detected there?  Why sometimes and not always?  What's the difference?
> > >
> > > Jason
> > >
> > >
> > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > The WARN and ERROR may not be completely correlated. Could it be that the
> > > > consumer is slow and couldn't keep up with the produced data?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > So, sometimes I just get the WARN from the ConsumerFetcherThread (as
> > > > > previously noted, above), e.g.:
> > > > >
> > > > > 2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
> > > > > consumer.ConsumerFetcherThread -
> > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
> > > > > for partition [mypartition,0] out of range; reset offset to 16175326044
> > > > >
> > > > > More recently, I also see the following log line (not sure why I
> > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > >
> > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match
> > > > > fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598:
> > > > > consumed offset = 16163904970;
> > > > >  Consumer may lose data
> > > > >
> > > > > Why would I not see this second ERROR every time there's a
> > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > >
> > > > > Should I only be concerned about possible lost data if I see the
> > > > > second ERROR log line?
> > > > >
> > > > > Jason
> > > > >
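
One way to double-check for skipped messages on the application side, independent of which broker/consumer log lines show up, is to remember the last offset seen per partition and flag any jump larger than one. A sketch only; the class name and the way gaps are reported are assumptions:

import java.util.HashMap;
import java.util.Map;

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.message.MessageAndMetadata;

public class GapCheckingConsumer implements Runnable {
    private final KafkaStream<byte[], byte[]> stream;
    private final Map<String, Long> lastOffsets = new HashMap<String, Long>();

    public GapCheckingConsumer(KafkaStream<byte[], byte[]> stream) {
        this.stream = stream;
    }

    @Override
    public void run() {
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> mm = it.next();
            String tp = mm.topic() + "-" + mm.partition();
            Long last = lastOffsets.get(tp);
            if (last != null && mm.offset() > last + 1) {
                // A gap is expected right after an out-of-range reset;
                // anywhere else it means messages were skipped.
                System.err.println("offset gap on " + tp + ": jumped from "
                        + last + " to " + mm.offset());
            }
            lastOffsets.put(tp, mm.offset());
            // ... actual message handling would go here ...
        }
    }
}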
> > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > > > > > But I assume this would not normally be something you'd want to log
> > > > > > (every incoming producer request?).  Maybe just for debugging?  Or is
> > > > > > it only for consumer fetch requests?
> > > > > >
> > > > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
> > > > > >> TRACE is lower than INFO so INFO level request logging would also be
> > > > > >> recorded.
> > > > > >>
> > > > > >> You can check for "Completed XXX request" in the log files to check the
> > > > > >> request info with the correlation id.
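
For the request logging mentioned here, the broker's stock config/log4j.properties already defines a kafka.request.logger and a requestAppender writing to kafka-request.log; raising the level to TRACE looks roughly like this (a sketch against the 0.8 defaults, adjust names to the local file):

# config/log4j.properties on the broker
log4j.logger.kafka.request.logger=TRACE, requestAppender
log4j.additivity.kafka.request.logger=false

With that in place, the request-completion lines Guozhang mentions, including the correlation id of each fetch, end up in kafka-request.log.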