So, I think you're describing two different types of errors.  The
first is data-dependent (e.g. the message is corrupt or otherwise
malformed).  There's no reason to block consumption of other messages
that are likely to succeed, since the bad one won't fix itself no
matter how many times you retry.  So, for that case, I think it makes
sense to stash it away to be retried later (or just log it as invalid
and carry on).
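
To make the "stash it and carry on" idea concrete, here's a rough
sketch in Python.  The names (consume, parse_order, dead_letters) and
the JSON payload are made up for illustration, and a plain list stands
in for whatever you'd actually use to park bad messages:

```python
import json


def parse_order(raw):
    # Hypothetical handler: raises ValueError (bad JSON) or KeyError
    # (missing field) when the message itself is the problem.
    return json.loads(raw)["order_id"]


def consume(messages, handle, dead_letters):
    """Process every message; park data-dependent failures and move on."""
    processed = []
    for raw in messages:
        try:
            processed.append(handle(raw))
        except (ValueError, KeyError) as exc:
            # Retrying can't fix a bad payload, so record it and keep
            # going instead of blocking the messages behind it.
            dead_letters.append((raw, repr(exc)))
    return processed


dead_letters = []
results = consume(
    ['{"order_id": 1}', "not json", '{"order_id": 2}'],
    parse_order,
    dead_letters,
)
```

Here the good messages on either side of the corrupt one still get
processed, and the corrupt one ends up in dead_letters along with the
reason it failed.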

For transient failures (e.g. a downstream service is not available),
I think it's fine to just keep retrying (with exponential backoff)
until the message succeeds before moving on to the next one, which
would likely fail for the same reason anyway.
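
As a rough sketch of the retry-in-place approach (again in Python,
with made-up names): TransientError is a hypothetical exception your
handler would raise for "try again later" failures, and the delay
values are arbitrary defaults, not recommendations:

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures expected to clear on their own."""


def process_with_backoff(message, handle, base_delay=0.5, max_delay=60.0,
                         sleep=time.sleep):
    """Retry one message until it succeeds, doubling the wait each time."""
    delay = base_delay
    while True:
        try:
            return handle(message)
        except TransientError:
            # Block here on purpose: the next message would most likely
            # hit the same outage, so there's no point skipping ahead.
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff
```

The sleep parameter is only there so the wait can be swapped out in
tests.  In practice you'd probably also want some jitter on the delay
and an alert if the retries go on for too long.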


