
Kafka, mail # user - Consuming from X days ago & issues consuming from the beginning of time


Matthew Rathbone 2012-09-20, 16:20
Hey guys,

I've come across this behavior with the hadoop-consumer, but it certainly
applies to any consumer.

We've had our brokers up and running for about 9 days, with a 7-day
retention policy (3 brokers with 3 partitions each).
I've just deployed a new hadoop consumer and wanted to read from the
beginning of time (7 days ago).

Here's the behavior I'm seeing:
- I tell the consumer to start from offset 0
- It queries the partition, finds that the earliest available offset is
2000000, and so starts there
- It starts consuming from 2000000 onward
- It throws kafka.common.OffsetOutOfRangeException, because retention has
already deleted that message
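
A minimal sketch of the race above and one common mitigation: when the fetch fails, look up the earliest offset again and retry. Everything here is a hypothetical in-memory model; the `Partition` class, `earliestOffset()`, `deleteOldestSegment()` and `consumeFrom()` are illustrative names, not Kafka's consumer API. Retention is simulated by deleting the oldest segment once, between the offset lookup and the first fetch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one partition; NOT Kafka's API.
public class RetentionRace {

    static class OffsetOutOfRangeException extends RuntimeException {
        OffsetOutOfRangeException(long offset) {
            super("offset " + offset + " is out of range");
        }
    }

    // A partition stored as fixed-size segments, identified by their first offset.
    static class Partition {
        private final List<Long> segmentStarts = new ArrayList<>();
        private final long nextOffset; // one past the last stored message

        Partition(long firstOffset, int segments, long segmentSize) {
            for (int i = 0; i < segments; i++) {
                segmentStarts.add(firstOffset + i * segmentSize);
            }
            this.nextOffset = firstOffset + segments * segmentSize;
        }

        long earliestOffset() {
            return segmentStarts.get(0);
        }

        // Retention removes the oldest whole segment.
        void deleteOldestSegment() {
            segmentStarts.remove(0);
        }

        // Returns the offset itself as the "message" to keep the model small.
        long fetch(long offset) {
            if (offset < earliestOffset() || offset >= nextOffset) {
                throw new OffsetOutOfRangeException(offset);
            }
            return offset;
        }
    }

    // Clamp the requested offset to the earliest available one and fetch.
    // retentionHook runs once, before the first fetch, to open the race window;
    // on OffsetOutOfRange we re-read the earliest offset and try again.
    static long consumeFrom(Partition p, long requested, Runnable retentionHook, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            long start = Math.max(requested, p.earliestOffset());
            if (attempt == 0) {
                retentionHook.run();
            }
            try {
                return p.fetch(start);
            } catch (OffsetOutOfRangeException e) {
                // The log moved under us; loop to look up the earliest offset again.
            }
        }
        throw new IllegalStateException("gave up after " + maxRetries + " retries");
    }

    public static void main(String[] args) {
        Partition p = new Partition(2_000_000L, 3, 1_000L);
        long first = consumeFrom(p, 0L, p::deleteOldestSegment, 3);
        System.out.println("first fetched offset: " + first);
    }
}
```

With a partition starting at 2000000 and 1000-message segments, the first fetch of 2000000 fails, the retry re-reads the earliest offset, and consumption resumes at 2001000. Retrying only narrows the window; starting some margin after the earliest offset is likely a sturdier way to stay clear of retention.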

Through sheer luck, the job won this race after a few task failures, but it
raises the question:

- How would I tell a consumer to start consuming from T minus 4 days? That
would completely solve the issue: I don't need the full 7 days, but I have no
way to map a time to an offset.
(This would also be useful for people tailing the events, so they could tail
events from 3 days ago while grepping for something.)
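
One way to approximate time-to-offset resolution, sketched under assumptions: if each log segment's first offset and last write time are known, the segment containing time T is the first one written to at or after T, and starting from its first offset covers everything since T. As far as we know, Kafka's offset lookup of this era is likewise only segment-granular, but this is a model of the idea, not its API; `Segment` and `offsetForTime` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetForTime {

    // Hypothetical segment metadata: first offset in the segment and the time of
    // its last write (e.g. the segment file's modification time). Not Kafka's API.
    static final class Segment {
        final long baseOffset;
        final long lastWriteMillis;

        Segment(long baseOffset, long lastWriteMillis) {
            this.baseOffset = baseOffset;
            this.lastWriteMillis = lastWriteMillis;
        }
    }

    // Segments must be ordered oldest first. Returns the base offset of the first
    // segment written to at or after targetMillis, so consuming from it covers every
    // message since targetMillis (plus possibly older ones from the same segment).
    // If no segment is that recent, returns logEndOffset: there is nothing to replay.
    static long offsetForTime(List<Segment> segments, long targetMillis, long logEndOffset) {
        for (Segment s : segments) {
            if (s.lastWriteMillis >= targetMillis) {
                return s.baseOffset;
            }
        }
        return logEndOffset;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long now = 9 * day; // brokers up for 9 days
        // Seven daily segments (7-day retention); segment i was last written at day 3 + i.
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            segments.add(new Segment(2_000_000L + i * 100_000L, (3 + i) * day));
        }
        long start = offsetForTime(segments, now - 4 * day, 2_700_000L);
        System.out.println("start offset for T-4 days: " + start);
    }
}
```

With one segment per day, the lookup for 4 days ago returns the first offset of the segment covering that day (2200000 here), so the consumer replays at most about a day of older messages before reaching T.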

Any ideas? Anyone else experienced this?
--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |
4sq<http://foursquare.com/rathboma>