Kafka >> mail # user >> multiple Hadoop consumer tasks per partition

Hey guys,

I've been using the Hadoop consumer a whole lot this week, but I'm seeing
pretty poor throughput with one task per partition. I figured a good
solution would be to have multiple tasks per partition, so I wanted to run
my assumptions by you all first:

This should enable the broker to round-robin events between tasks, right?
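To spell out what I'm picturing (a minimal sketch in plain Java; the modulo
assignment is purely my assumption about how events would be divided, not
anything I've verified the broker does):

    public class RoundRobinSketch {
        public static void main(String[] args) {
            int numTasks = 4; // hypothetical: 4 consumer tasks on one partition
            for (long offset = 0; offset < 12; offset++) {
                // assumption: the event at offset o goes to task (o % numTasks),
                // so tasks see interleaved slices and every offset is covered once
                System.out.println("offset " + offset + " -> task " + (offset % numTasks));
            }
        }
    }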

When I record the high watermark at the end of the MapReduce job, there will
be N entries for each partition (one per task), so is it correct to just
take max(watermarks)?
-- my assumption is that, since they're getting events round-robin, everything
should have been consumed up to the highest watermark found. Does this hold
true?
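Concretely, the merge I have in mind is just this (a plain Java sketch; the
{partition, offset} pairs are made-up numbers standing in for whatever each
task writes out at job end):

    import java.util.HashMap;
    import java.util.Map;

    public class WatermarkMerge {
        public static void main(String[] args) {
            // hypothetical job output: one {partition, offset} entry per task
            long[][] watermarks = { {0, 1500}, {0, 1498}, {1, 902}, {1, 911} };
            // collapse to one offset per partition by taking the max --
            // the "highest watermark found" from the question above
            Map<Long, Long> maxPerPartition = new HashMap<Long, Long>();
            for (long[] w : watermarks) {
                Long cur = maxPerPartition.get(w[0]);
                if (cur == null || w[1] > cur) {
                    maxPerPartition.put(w[0], w[1]);
                }
            }
            System.out.println(maxPerPartition); // e.g. {0=1500, 1=911}
        }
    }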

Is anyone else using the consumer like this?

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>