So here's an outline of what I think seems to have happened.
I have a consumer that uses a topic filter to consume a large number of
topics (e.g. several hundred). Each topic has only a single partition.
It normally has no trouble keeping up with all messages on all
topics. However, a couple of days ago it seemed to hang and didn't
consume anything for several hours. I restarted the consumer (and
have since updated it from 0.8-beta1 to 0.8-latest-HEAD). Data is flowing
again, but some topics seem to be taking much longer than others to catch
up. The slow ones appear to be the topics with more data than the others (a
loose theory at present).
Does that make sense? If I understand things correctly, the consumer will
fetch chunks of data from each topic/partition, in order, in a big loop?
So if it has caught up with most of the topics, will it waste time
re-polling all those (and getting nothing) before coming back to the topics
that are lagging? Perhaps having a larger fetch size would help here?
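For reference, here is a minimal sketch of how I'd bump the fetch size on the high-level consumer. I'm assuming the 0.8 config key is `fetch.message.max.bytes` (default 1 MB) based on the docs; the group and ZooKeeper values below are placeholders, not our real setup:

```java
import java.util.Properties;

public class FetchSizeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("group.id", "my-consumer-group");   // hypothetical group id
        props.put("zookeeper.connect", "zk1:2181");   // hypothetical ZK address
        // Raise the per-partition fetch size from the 1 MB default to 4 MB,
        // so each fetch against a lagging topic pulls more data per round trip.
        props.put("fetch.message.max.bytes", Integer.toString(4 * 1024 * 1024));
        System.out.println(props.getProperty("fetch.message.max.bytes"));
    }
}
```

These props would then be passed into the ConsumerConfig as usual; I'm not sure whether a bigger fetch size actually shortens the time spent polling already-caught-up topics, which is really my question above.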
On Sat, Oct 19, 2013 at 6:24 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote: