
Kafka, mail # user - Consuming from X days ago & issues consuming from the beginning of time


Re: Consuming from X days ago & issues consuming from the beginning of time
Matthew Rathbone 2012-09-20, 17:33
Awesome answers, that's perfect, thanks guys.

On Thu, Sep 20, 2012 at 12:26 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:

> Try using the getOffsetsBefore API in SimpleConsumer. (There is also a
> command-line tool, GetOffsetShell.)
>
> You can specify a topic, partition and time, and it will give valid offsets
> prior to that time. It will be approximate, though, as it looks at the
> modification times of the log segments in each partition. If you are using
> SimpleConsumer directly, you can just consume from those offsets.
>
> Joel
>
> On Thu, Sep 20, 2012 at 9:20 AM, Matthew Rathbone <[EMAIL PROTECTED]> wrote:
>
> > Hey guys,
> >
> > I've come across this behavior with the hadoop-consumer, but it certainly
> > applies to any consumer.
> >
> > We've had our brokers up and running for about 9 days, with a 7-day
> > retention policy. (3 brokers with 3 partitions each)
> > I've just deployed a new hadoop consumer and wanted to read from the
> > beginning of time (7-days ago).
> >
> > Here's the behavior I'm seeing:
> > - I tell the consumer to start from offset 0
> > - It queries the partition, finds that the minimum available offset is
> > 2000000, and starts there
> > - It starts consuming from 2000000+
> > - It throws an exception ("kafka.common.OffsetOutOfRangeException") because
> > that message has already been deleted
> >
> > Through sheer luck, after a few task failures the job managed to beat this
> > race condition, but it raises the question:
> >
> > - How would I tell a consumer to start querying from T-4days? That would
> > totally solve the issue. I don't really need a full 7 days, but I have no
> > way to resolve time -> offset
> > (this would also be useful for people tailing the events, so they could
> > tail events from 3 days ago while grepping for something)
> >
> > Any ideas? Anyone else experienced this?
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > [EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |
> > 4sq<http://foursquare.com/rathboma>
> >
>
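The time-to-offset lookup Joel describes can be modelled in a few lines. This is a minimal sketch, not Kafka's implementation: `Segment` and `offsets_before` are hypothetical names, and it assumes each segment is summarised by its first offset and its file modification time in epoch milliseconds. It returns the start offsets of segments last modified at or before the requested time, newest first, which is why the result is only approximate: you get the beginning of a segment that may hold messages somewhat older than the time you asked for.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical summary of one on-disk log segment in a partition."""
    start_offset: int       # offset of the first message in the segment
    last_modified_ms: int   # segment file modification time, epoch millis


def offsets_before(segments: List[Segment], time_ms: int,
                   max_num_offsets: int) -> List[int]:
    """Approximate time -> offset lookup driven by segment mtimes.

    Returns up to max_num_offsets segment start offsets, newest first,
    for segments whose file was last modified at or before time_ms.
    Assumes segments are ordered oldest first.
    """
    candidates = [s.start_offset for s in segments
                  if s.last_modified_ms <= time_ms]
    return list(reversed(candidates))[:max_num_offsets]


# Example: four segments; "now" is a fixed fake clock for determinism.
DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
segments = [
    Segment(2_000_000, now - 7 * DAY_MS),
    Segment(2_500_000, now - 5 * DAY_MS),
    Segment(3_000_000, now - 3 * DAY_MS),
    Segment(3_500_000, now - 1 * DAY_MS),
]
print(offsets_before(segments, now - 4 * DAY_MS, 1))  # [2500000]
```

Starting from a T-4 days offset like this, rather than from the earliest available offset, leaves some headroom before a 7-day retention policy deletes that segment instead of racing it.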

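The race in the original report can be reproduced with a toy model. This is a hypothetical sketch: `Partition`, `start_then_fetch` and the local `OffsetOutOfRange` class only stand in for the broker, the consumer and `kafka.common.OffsetOutOfRangeException`, and it assumes retention deletes whole segments, oldest first, between resolving the start offset and the first fetch. A start offset taken from the earliest available position falls into the deleted segment, while a newer, time-based start offset survives.

```python
class OffsetOutOfRange(Exception):
    """Stand-in for kafka.common.OffsetOutOfRangeException."""


class Partition:
    """Toy partition: one [start, end) offset range per log segment."""

    def __init__(self, segments):
        self.segments = list(segments)  # copy so each instance is independent

    def earliest_offset(self):
        return self.segments[0][0]

    def apply_retention(self):
        # Retention deletes whole segments, oldest first.
        self.segments.pop(0)

    def fetch(self, offset):
        if not any(start <= offset < end for start, end in self.segments):
            raise OffsetOutOfRange(offset)
        return offset  # placeholder for the fetched message set


def start_then_fetch(partition, start_offset):
    """Simulate the race: retention runs after the start offset is chosen."""
    partition.apply_retention()
    try:
        partition.fetch(start_offset)
        return "ok"
    except OffsetOutOfRange:
        return "OffsetOutOfRange"


segs = [(2_000_000, 2_500_000), (2_500_000, 3_000_000), (3_000_000, 3_500_000)]

p = Partition(segs)
print(start_then_fetch(p, p.earliest_offset()))  # OffsetOutOfRange

p = Partition(segs)
print(start_then_fetch(p, 2_500_000))  # ok: a newer, time-based start offset
```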