Graeme Wallace 2013-10-02, 19:55
Joe Stein 2013-10-02, 19:59
Philip OToole 2013-10-02, 20:01
Graeme Wallace 2013-10-02, 20:33
Re: Strategies for improving Consumer throughput
Yes, the consumers are definitely behind - we can see that from examining the offsets.
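
For readers following along, the offset check amounts to comparing each partition's newest offset on the broker with the offset the consumer has committed. A minimal sketch, assuming made-up offsets for a 3-partition topic; the function name and all numbers are illustrative and do not come from this thread or from any Kafka API:

```python
def partition_lag(log_end_offsets, consumer_offsets):
    """Return per-partition lag: newest broker offset minus committed consumer offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a partition with no committed offset is treated as offset 0.
        committed = consumer_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Hypothetical offsets (illustrative values only).
log_end = {0: 1_500_000, 1: 1_480_000, 2: 1_510_000}
committed = {0: 1_200_000, 1: 1_480_000, 2: 900_000}

lag = partition_lag(log_end, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```

A lag that keeps growing while producers run suggests the consumers really are behind; a lag near zero during the pauses would point at an empty stream instead.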
On Wed, Oct 2, 2013 at 1:59 PM, Joe Stein <[EMAIL PROTECTED]> wrote:

> Are you sure the consumers are behind? Could the pause be because the
> stream is empty, i.e. producing messages is what is behind the consumption?
>
> What if you shut off your consumers for 5 minutes and then start them
> again: do the consumers behave the same way?
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Wed, Oct 2, 2013 at 3:54 PM, Graeme Wallace <
> [EMAIL PROTECTED]> wrote:
>
> > Hi All,
> >
> > We've got processes that produce many millions of itineraries per minute.
> > We would like to get them into HBase (so we can query for chunks of them
> > later) - so our idea was to write each itinerary as a message into Kafka -
> > so that not only can we have consumers that write to HBase, but also other
> > consumers that may provide some sort of real-time monitoring service and
> > also an archive service.
> >
> > Problem is - we don't really know enough about how best to do this
> > effectively with Kafka, so that the producers can run flat out and the
> > consumers can run flat out too. We've tried having one topic, with multiple
> > partitions to match the spindles on our broker h/w (12 on each) - and
> > setting up a thread per partition on the consumer side.
> >
> > At the moment, our particular problem is that the consumers just can't keep
> > up. We can see from logging that the consumer threads seem to run in
> > bursts, then a pause (as yet we don't know what the pause is - don't think
> > it's GC). Anyway, does what we are doing with one topic and multiple
> > partitions sound correct? Or do we need to change? Any tricks to speed up
> > consumption? (We've tried changing the fetch size - doesn't help much.) Am
> > I correct in assuming we can have one thread per partition for consumption?
> >
> > Thanks in advance,
> >
> > Graeme
> >
> > --
> > Graeme Wallace
> > CTO
> > FareCompare.com
> > O: 972 588 1414
> > M: 214 681 9018
> >
>

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
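
The thread-per-partition layout described in the original question can be sketched as follows. This is only an illustration of the threading shape, assuming in-memory queues as stand-ins for partitions; it does not use the Kafka client, and `consume_partition`, `run_thread_per_partition` and `count_message` are hypothetical names:

```python
import queue
import threading

END = object()  # sentinel marking the end of a simulated partition


def consume_partition(partition_id, stream, handle):
    """Drain a single partition on its own thread, one message at a time."""
    while True:
        message = stream.get()
        if message is END:
            break
        handle(partition_id, message)


def run_thread_per_partition(streams, handle):
    """Start exactly one consumer thread per partition and wait for all of them."""
    threads = [
        threading.Thread(target=consume_partition, args=(pid, stream, handle))
        for pid, stream in streams.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Simulate 4 partitions holding 250 messages each (illustrative sizes).
streams = {}
for pid in range(4):
    q = queue.Queue()
    for n in range(250):
        q.put(f"itinerary-{pid}-{n}")
    q.put(END)
    streams[pid] = q

counts = {pid: 0 for pid in streams}


def count_message(partition_id, message):
    # Each key is only ever updated by that partition's own thread.
    counts[partition_id] += 1


run_thread_per_partition(streams, count_message)
print(counts)
```

Because each partition is drained by a single thread, ordering within a partition is preserved, and adding threads beyond the partition count would leave the extra threads idle.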

Graeme Wallace 2013-10-02, 23:19