Kafka, mail # user - Strategies for improving Consumer throughput


Graeme Wallace 2013-10-02, 19:55
Joe Stein 2013-10-02, 19:59
Philip OToole 2013-10-02, 20:01
Re: Strategies for improving Consumer throughput
Graeme Wallace 2013-10-02, 20:33
This is with 0.8
On Wed, Oct 2, 2013 at 2:00 PM, Philip O'Toole <[EMAIL PROTECTED]> wrote:

> Is this with 0.7 or 0.8?
>
>
> On Wed, Oct 2, 2013 at 12:59 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
>
> > Are you sure the consumers are behind? Could the pause be because the
> > stream is empty, and producing messages is what is behind the
> > consumption?
> >
> > What if you shut off your consumers for 5 minutes and then start them
> > again - do the consumers behave the same way?
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> >
> > On Wed, Oct 2, 2013 at 3:54 PM, Graeme Wallace <[EMAIL PROTECTED]> wrote:
> >
> > > Hi All,
> > >
> > > We've got processes that produce many millions of itineraries per
> > > minute. We would like to get them into HBase (so we can query for
> > > chunks of them later) - so our idea was to write each itinerary as
> > > a message into Kafka - so that not only can we have consumers that
> > > write to HBase, but also other consumers that may provide some sort
> > > of real-time monitoring service and also an archive service.
> > >
> > > Problem is - we don't really know enough about how best to do this
> > > effectively with Kafka, so that the producers can run flat out and
> > > the consumers can run flat out too. We've tried having one topic,
> > > with multiple partitions to match the spindles on our broker h/w
> > > (12 on each) - and setting up a thread per partition on the
> > > consumer side.
> > >
> > > At the moment, our particular problem is that the consumers just
> > > can't keep up. We can see from logging that the consumer threads
> > > seem to run in bursts, then a pause (as yet we don't know what the
> > > pause is - don't think it's GC). Anyways, does what we are doing
> > > with one topic and multiple partitions sound correct? Or do we need
> > > to change? Any tricks to speed up consumption? (we've tried
> > > changing the fetch size - doesn't help much). Am I correct in
> > > assuming we can have one thread per partition for consumption?
> > >
> > > Thanks in advance,
> > >
> > > Graeme
> > >
> > > --
> > > Graeme Wallace
> > > CTO
> > > FareCompare.com
> > > O: 972 588 1414
> > > M: 214 681 9018
> > >
> >
>

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
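
A minimal sketch of the threading model described in the quoted question - one topic, one stream per partition, one consumer thread per stream - using the Kafka 0.8 high-level consumer API. The topic name, group id, ZooKeeper address and tuning values below are illustrative placeholders, not values taken from the thread.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class ItineraryConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");       // placeholder ZooKeeper quorum
        props.put("group.id", "hbase-writer");            // placeholder consumer group
        props.put("fetch.message.max.bytes", "2097152");  // the fetch-size knob mentioned in the thread
        props.put("queued.max.message.chunks", "10");     // buffer more fetched chunks per stream

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for as many streams as there are partitions (12 per broker in the
        // question), then dedicate one thread to each stream.
        int numStreams = 12;
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("itineraries", numStreams);     // placeholder topic name

        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(topicCountMap);

        ExecutorService pool = Executors.newFixedThreadPool(numStreams);
        for (final KafkaStream<byte[], byte[]> stream : streams.get("itineraries")) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        byte[] itinerary = it.next().message();
                        // hand the itinerary off to the HBase writer (or other sink) here
                    }
                }
            });
        }
    }
}

Each partition is consumed by exactly one stream at a time, so one stream (and one thread) per partition is the natural upper bound on parallelism within a single consumer group; requesting more streams than partitions leaves the extra ones idle. Larger fetch.message.max.bytes / queued.max.message.chunks values trade memory for fewer, larger fetches - whether that closes the gap depends on message size and on how fast the per-message work (the HBase write) is.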

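The fan-out described in the question - an HBase writer, a real-time monitoring service and an archive service all reading the same itineraries - falls out of giving each downstream service its own consumer group; each group independently receives the full topic. A short sketch with the same placeholder names:

import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class ItineraryConsumerGroups {
    // Each downstream service gets its own group.id, so each one sees
    // every message on the topic, independently of the others.
    static ConsumerConnector connect(String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");   // placeholder ZooKeeper quorum
        props.put("group.id", groupId);
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        ConsumerConnector hbaseWriter = connect("hbase-writer");  // bulk writes into HBase
        ConsumerConnector monitoring  = connect("monitoring");    // real-time monitoring
        ConsumerConnector archiver    = connect("archive");       // long-term archive
        // createMessageStreams(...) on each connector as in the sketch above
    }
}

Each group commits its own offsets to ZooKeeper in 0.8, so the per-partition lag of any one group can be inspected with the kafka.tools.ConsumerOffsetChecker tool that ships with the broker - one concrete way to answer the "are you sure the consumers are behind?" question raised earlier in the thread.
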
Graeme Wallace 2013-10-02, 20:36
Graeme Wallace 2013-10-02, 23:19