I don't think anything like this exists in Kafka (or contrib), but it
would be a useful addition! Personally, I have written this exact thing
at previous jobs.
As for the Hadoop consumer, since there is a FileSystem implementation
for S3 in Hadoop, it should be possible. The Hadoop consumer works by
writing out data files containing the Kafka messages alongside offset
files that contain the last offset read for each partition. If it is
re-consuming from zero each time you run it, it means it's not finding
the offset files from the previous run.
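
To make the offset-file behaviour concrete, here is a rough Python sketch
of the idea. It is not the contrib code itself: the file naming, the
directory layout, and the function names are my own assumptions, and it
uses the local filesystem rather than HDFS or S3. The point is only that
a missing offset file means the run starts again from zero.

```python
import os
import tempfile


def load_offsets(offset_dir, partitions):
    """Return the starting offset for each partition.

    Hypothetical layout: one file per partition named
    'partition-<id>.offset' holding the last offset read.
    A missing file means "start from 0", which is what
    re-consuming from zero on every run looks like.
    """
    offsets = {}
    for p in partitions:
        path = os.path.join(offset_dir, f"partition-{p}.offset")
        try:
            with open(path) as f:
                offsets[p] = int(f.read().strip())
        except FileNotFoundError:
            offsets[p] = 0
    return offsets


def save_offsets(offset_dir, offsets):
    """Write the last offset read for each partition next to the data."""
    os.makedirs(offset_dir, exist_ok=True)
    for p, off in offsets.items():
        path = os.path.join(offset_dir, f"partition-{p}.offset")
        with open(path, "w") as f:
            f.write(str(off))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(load_offsets(d, [0, 1]))   # no offset files yet
        save_offsets(d, {0: 120, 1: 87})
        print(load_offsets(d, [0, 1]))   # picks up the previous run
```

If the second run looks in a different directory than the first run
wrote to, it behaves exactly like the first call above: every partition
starts at zero.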
Having used it a bit, I can say the Hadoop consumer is certainly an area
that could use improvement.
On 12/27/12 4:41 AM, Pratyush Chandra wrote: