Consuming from X days ago & issues consuming from the beginning of time
I've come across this behavior with the hadoop-consumer, but it certainly
applies to any consumer.
We've had our brokers up and running for about 9 days with a 7-day
retention policy (3 brokers with 3 partitions each).
I've just deployed a new hadoop consumer and wanted to read from the
beginning of time (7 days ago).
Here's the behavior I'm seeing:
- I tell the consumer to start from offset 0
- It queries the partition and finds the minimum available offset is 2000000
- It starts consuming from 2000000
- It throws a kafka.common.OffsetOutOfRangeException, because that
message has already been deleted
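One way to make a consumer robust to this race is to catch the out-of-range error, re-query the current earliest offset, and retry, rather than failing the task. A minimal sketch of that retry loop (FakePartition below is a stand-in that simulates retention deleting segments between the offset query and the fetch; a real consumer would call the broker's offset API instead, and the exception type would be kafka.common.OffsetOutOfRangeException):

```java
import java.util.NoSuchElementException;

public class RetryEarliest {
    // Stand-in for a broker partition whose earliest offset advances
    // once, simulating retention kicking in mid-consume.
    static class FakePartition {
        long earliest = 2_000_000L;
        boolean racedOnce = false;

        long fetch(long offset) {
            if (!racedOnce) {          // simulate segment deletion after the
                earliest += 500_000L;  // consumer queried the earliest offset
                racedOnce = true;
            }
            if (offset < earliest) {
                // stand-in for kafka.common.OffsetOutOfRangeException
                throw new NoSuchElementException("OffsetOutOfRange");
            }
            return offset;
        }
    }

    // Retry: on an out-of-range offset, re-read the current earliest
    // offset and try again instead of failing the whole task.
    static long consumeFromEarliest(FakePartition p, int maxRetries) {
        long offset = p.earliest;
        for (int i = 0; i < maxRetries; i++) {
            try {
                return p.fetch(offset);
            } catch (NoSuchElementException e) {
                offset = p.earliest; // refresh and retry
            }
        }
        throw new IllegalStateException("gave up after " + maxRetries + " retries");
    }

    public static void main(String[] args) {
        FakePartition p = new FakePartition();
        // First fetch races with retention and fails; the retry succeeds
        // at the refreshed earliest offset.
        System.out.println(consumeFromEarliest(p, 5));
    }
}
```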
Through sheer luck, after a few task failures the job managed to beat this
race condition, but it raises the question:
- How would I tell a consumer to start consuming from T-4 days? That would
sidestep the issue entirely. I don't really need a full 7 days, but I have
no way to resolve a time -> offset
(This would also be useful for anyone tailing events, e.g. tailing from
3 days ago while grepping for something.)
Any ideas? Anyone else experienced this?
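For the time -> offset question, a possible approach: the 0.7-era SimpleConsumer exposes getOffsetsBefore(topic, partition, time, maxNumOffsets), which returns offsets of log segments created before the given timestamp (so the resolution is per-segment, not per-message). A sketch of how that might look; the broker host, topic name, and port below are made up, and the Kafka calls are shown in comments rather than compiled here:

```java
public class TimeToOffset {
    // Milliseconds since epoch for "now minus daysAgo days" -- the form of
    // value getOffsetsBefore expects for its time argument.
    public static long msForDaysAgo(long nowMs, int daysAgo) {
        return nowMs - daysAgo * 24L * 60L * 60L * 1000L;
    }

    /* Against a live broker this would look roughly like (assumes the
     * 0.7-era SimpleConsumer API; names "broker1"/"events" are illustrative):
     *
     *   SimpleConsumer consumer =
     *       new SimpleConsumer("broker1", 9092, 10000, 64 * 1024);
     *   long[] offsets = consumer.getOffsetsBefore(
     *       "events", 0, msForDaysAgo(System.currentTimeMillis(), 4), 1);
     *   long start = offsets.length > 0
     *       ? offsets[0]  // nearest segment boundary at or before T-4 days
     *       : consumer.getOffsetsBefore("events", 0,
     *             kafka.api.OffsetRequest.EarliestTime(), 1)[0];
     *
     * The special values OffsetRequest.EarliestTime() / LatestTime() resolve
     * to the oldest and newest available offsets, respectively.
     */

    public static void main(String[] args) {
        long now = 1_700_000_000_000L; // fixed timestamp for demonstration
        System.out.println(msForDaysAgo(now, 4));
    }
}
```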
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |