Kafka >> mail # user >> multiple Hadoop consumer tasks per partition


Re: multiple Hadoop consumer tasks per partition
Hey,

So I'm currently running one mapper per partition. I guess I didn't state
this, but my code is based on the hadoop-consumer in the contrib/ project.
What I was really wondering is whether anyone has tried multiple consumers
per partition.

On Mon, Sep 17, 2012 at 6:54 PM, Min Yu <[EMAIL PROTECTED]> wrote:

> If you want to run one Mapper per partition,
>
> https://github.com/miniway/kafka-hadoop-consumer
>
> might help.
>
> Thanks
> Min
>
> On Sep 18, 2012, at 6:51 AM, Matthew Rathbone <[EMAIL PROTECTED]> wrote:
>
> > Hey guys,
> >
> > I've been using the hadoop consumer a whole lot this week, but I'm seeing
> > pretty poor throughput with one task per partition. I figured a good
> > solution would be to have multiple tasks per partition, so I wanted to
> > run my assumptions by you all first:
> >
> > This should enable the broker to round-robin events between tasks, right?
> >
> > When I record the high watermark at the end of the MapReduce job, there
> > will be N entries for each partition (one per task), so is it correct to
> > just take max(watermarks)?
> > My assumption is that, as they're getting events round-robin, everything
> > should have been consumed up to the highest watermark found. Does this
> > hold true?
> >
> > Is anyone else using the consumer like this?
> >
> >
> >
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > [EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> |
> > 4sq<http://foursquare.com/rathboma>
>

--
Matthew Rathbone
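The bookkeeping proposed in the original question (one high watermark per task, then max(watermarks) per partition) can be sketched in Python under a simplified model. Everything here is hypothetical: the `Report` tuple, the function names, and the treatment of offsets as consecutive integers are assumptions for illustration, not code from the contrib hadoop-consumer. `round_robin` only models the dispatch the question assumes; whether the broker actually hands out events this way is exactly what the thread is asking.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical report emitted by each task when the MapReduce job ends:
# (partition, task_id, watermark), where watermark is the next offset the
# task would have read. Names are illustrative, not from contrib/.
Report = Tuple[int, str, int]


def round_robin(start: int, end: int, n_tasks: int) -> List[Set[int]]:
    """Idealized model of the assumed dispatch: offsets start..end-1 are dealt
    to n_tasks tasks in turn. Offsets are treated as consecutive integers."""
    consumed: List[Set[int]] = [set() for _ in range(n_tasks)]
    for i, offset in enumerate(range(start, end)):
        consumed[i % n_tasks].add(offset)
    return consumed


def watermark(consumed: Set[int], start: int) -> int:
    """Next offset after the highest one this task consumed (start if none)."""
    return max(consumed) + 1 if consumed else start


def max_watermarks(reports: Iterable[Report]) -> Dict[int, int]:
    """Collapse the N per-task watermarks into one per partition via max()."""
    result: Dict[int, int] = {}
    for partition, _task_id, wm in reports:
        result[partition] = max(wm, result.get(partition, wm))
    return result


def covered_up_to(start: int, consumed_by_task: List[Set[int]], wm: int) -> bool:
    """True only if every offset in [start, wm) was consumed by some task,
    i.e. the condition under which max(watermarks) is a safe resume point."""
    consumed = set().union(*consumed_by_task)
    return all(offset in consumed for offset in range(start, wm))


# Ideal round-robin over offsets 0..9 with 3 tasks: max() is safe here.
tasks = round_robin(0, 10, 3)
reports = [(0, f"task-{i}", watermark(c, 0)) for i, c in enumerate(tasks)]
resume = max_watermarks(reports)[0]
print(resume, covered_up_to(0, tasks, resume))  # prints "10 True"

# If one task stalls early, max() skips offsets no task consumed (4 and 7).
stalled = [{0, 3, 6, 9}, {1}, {2, 5, 8}]
print(covered_up_to(0, stalled, 10))  # prints "False"
```

The second case shows why the assumption in the question matters: taking max(watermarks) is only a safe resume point when every offset below it was consumed by some task, which the round-robin model guarantees but a stalled or unevenly fed task does not.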