I've been using the Hadoop consumer a whole lot this week, but I'm seeing
pretty poor throughput with one task per partition. I figured a good
solution would be to have multiple tasks per partition, so I wanted to run
my assumptions by you all first:
This should enable the broker to round-robin events between the tasks, right?
When I record the high-watermark at the end of the MapReduce job there will
be N entries for each partition (one per task), so is it correct to just
take the highest one? My assumption is that since the tasks are getting events
round-robin, everything should have been consumed up to the highest watermark
found. Does this hold up?
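
Just so I'm describing that clearly, here's a rough sketch of the "take the
highest watermark" step I have in mind (PartitionWatermark / WatermarkMerger
are placeholder names I made up for this example, not anything in the Hadoop
consumer itself):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: merge the N watermark entries each partition ends up with
// (one per task) down to the single highest offset per partition.
public class WatermarkMerger {

    // Hypothetical holder for the (partition, offset) pairs each task
    // writes out at the end of the job.
    public static class PartitionWatermark {
        public final String partition; // e.g. "mytopic-0"
        public final long offset;      // last offset this task consumed

        public PartitionWatermark(String partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Under the round-robin assumption, everything up to the highest
    // recorded offset should have been consumed, so keep only the max.
    public static Map<String, Long> highestWatermarks(List<PartitionWatermark> entries) {
        Map<String, Long> highest = new HashMap<>();
        for (PartitionWatermark w : entries) {
            highest.merge(w.partition, w.offset, Math::max);
        }
        return highest;
    }
}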
Is anyone else using the consumer like this?
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |