Kafka >> mail # user >> multiple Hadoop consumer tasks per partition


Re: multiple Hadoop consumer tasks per partition
Hey,

So I'm currently running one mapper per partition. I guess I didn't state
this, but my code is based on the hadoop-consumer in the contrib/ project.
I was really wondering whether anyone has tried multiple consumers per
partition.

On Mon, Sep 17, 2012 at 6:54 PM, Min Yu <[EMAIL PROTECTED]> wrote:

> If you want to run one Mapper job per partition,
>
> https://github.com/miniway/kafka-hadoop-consumer
>
> might help.
>
> Thanks
> Min
>
> On Sep 18, 2012, at 6:51 AM, Matthew Rathbone <[EMAIL PROTECTED]> wrote:
>
> > Hey guys,
> >
> > I've been using the hadoop consumer a whole lot this week, but I'm seeing
> > pretty poor throughput with one task per partition. I figured a good
> > solution would be to have multiple tasks per partition, so I wanted to run
> > my assumptions by you all first:
> >
> > This should enable the broker to round-robin events between tasks, right?
> >
> > When I record the high-watermark at the end of the mapreduce job there will
> > be N entries for each partition (one per task), so is it correct to just
> > take max(watermarks)?
> > -- my assumption is that as they're getting events round-robin, everything
> > should have been consumed up to the highest watermark found. Does this hold
> > true?
> >
> > Is anyone else using the consumer like this?
> >
> >
> >
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > [EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |
> > 4sq<http://foursquare.com/rathboma>
>

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |
4sq<http://foursquare.com/rathboma>
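
For reference, the max(watermarks) step asked about in the quoted message could be sketched roughly as below. This is only an illustration, not code from the contrib hadoop-consumer: the function name merge_watermarks and the (partition, offset) report format are made up for the example. Taking the max is only safe if the round-robin assumption in the question actually holds, i.e. every event below the highest watermark was consumed by some task, which is exactly what the thread leaves open.

```python
def merge_watermarks(task_reports):
    """Collapse N per-task watermarks into one offset per partition.

    task_reports: iterable of (partition, offset) pairs, one pair per task.
    Both the name and the input shape are hypothetical, chosen for this sketch.
    """
    merged = {}
    for partition, offset in task_reports:
        # Keep the highest offset reported for each partition.
        if partition not in merged or offset > merged[partition]:
            merged[partition] = offset
    return merged


# Two tasks per partition, each reporting where it stopped reading.
reports = [
    ("events-0", 1200),
    ("events-0", 1500),
    ("events-1", 900),
    ("events-1", 870),
]
print(merge_watermarks(reports))
```

If the tasks do not in fact cover every offset below the maximum between them, starting the next job from this value would skip data, so the assumption is worth confirming before relying on it.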