Kafka >> mail # user >> multiple Hadoop consumer tasks per partition

multiple Hadoop consumer tasks per partition
Hey guys,

I've been using the Hadoop consumer a whole lot this week, but I'm seeing
pretty poor throughput with one task per partition. I figured a good
solution would be to run multiple tasks per partition, so I wanted to run
my assumptions by you all first:

This should enable the broker to round-robin events between tasks, right?

When I record the high-watermark at the end of the MapReduce job, there will
be N entries for each partition (one per task), so is it correct to just
take max(watermarks)?
-- my assumption is that since the tasks are getting events round-robin,
everything should have been consumed up to the highest watermark found.
Does this hold?
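
To make the max(watermarks) idea concrete, here is a rough sketch of the
merge step I have in mind. The function name, the (partition, offset) record
shape, and the sample values are made up for illustration and are not part
of the Hadoop consumer. It only shows the aggregation; it is only safe if
the assumption above holds, i.e. that nothing below the highest offset was
left unconsumed.

```python
def merge_watermarks(task_watermarks):
    """Collapse per-task (partition, offset) records to one offset per partition.

    Hypothetical helper: keeps the highest offset reported for each partition.
    """
    merged = {}
    for partition, offset in task_watermarks:
        # Keep the largest offset seen so far for this partition.
        if partition not in merged or offset > merged[partition]:
            merged[partition] = offset
    return merged


if __name__ == "__main__":
    # Made-up watermarks: three tasks on topic-0, two tasks on topic-1.
    records = [
        ("topic-0", 1200),
        ("topic-0", 1550),
        ("topic-1", 900),
        ("topic-0", 1375),
        ("topic-1", 940),
    ]
    print(merge_watermarks(records))  # {'topic-0': 1550, 'topic-1': 940}
```

If the tasks could leave gaps below the highest offset, taking the maximum
would skip those events on the next run, which is exactly what I want to
rule out.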

Is anyone else using the consumer like this?

Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma>