Kafka >> mail # user >> Strategies for improving Consumer throughput


Re: Strategies for improving Consumer throughput
Is this with 0.7 or 0.8?
On Wed, Oct 2, 2013 at 12:59 PM, Joe Stein <[EMAIL PROTECTED]> wrote:

> Are you sure the consumers are behind? Could the pause be because the
> stream is empty, and it's actually message production that is lagging
> behind consumption?
>
> If you shut off your consumers for 5 minutes and then start them again,
> do the consumers behave the same way?
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Wed, Oct 2, 2013 at 3:54 PM, Graeme Wallace <
> [EMAIL PROTECTED]> wrote:
>
> > Hi All,
> >
> > We've got processes that produce many millions of itineraries per minute.
> > We would like to get them into HBase (so we can query for chunks of them
> > later), so our idea was to write each itinerary as a message into Kafka.
> > That way, we can not only have consumers that write to HBase, but also
> > other consumers that provide some sort of real-time monitoring service
> > and an archive service.
> >
> > The problem is that we don't really know enough about how best to do this
> > effectively with Kafka, so that the producers can run flat out and the
> > consumers can run flat out too. We've tried having one topic with
> > multiple partitions to match the spindles on our broker h/w (12 on each),
> > and setting up a thread per partition on the consumer side.
> >
> > At the moment, our particular problem is that the consumers just can't
> > keep up. We can see from logging that the consumer threads seem to run in
> > bursts, then pause (as yet we don't know what the pause is; we don't
> > think it's GC). Anyway, does what we are doing with one topic and
> > multiple partitions sound correct, or do we need to change? Any tricks to
> > speed up consumption? (We've tried changing the fetch size; it doesn't
> > help much.) Am I correct in assuming we can have one thread per partition
> > for consumption?
> >
> > Thanks in advance,
> >
> > Graeme
> >
> > --
> > Graeme Wallace
> > CTO
> > FareCompare.com
> > O: 972 588 1414
> > M: 214 681 9018
> >
>
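For reference, the thread-per-partition pattern Graeme describes can be sketched as a self-contained simulation. BlockingQueues stand in for the per-partition Kafka streams here, and the counts and message names are illustrative, not taken from the thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadPerPartitionDemo {
    public static void main(String[] args) throws InterruptedException {
        final int partitions = 12;            // matches the 12 spindles per broker
        final int messagesPerPartition = 1000;

        // One queue per partition stands in for one consumer stream per partition.
        List<BlockingQueue<String>> streams = new ArrayList<BlockingQueue<String>>();
        for (int p = 0; p < partitions; p++) {
            BlockingQueue<String> q = new LinkedBlockingQueue<String>();
            for (int i = 0; i < messagesPerPartition; i++) {
                q.add("itinerary-" + p + "-" + i);
            }
            streams.add(q);
        }

        final AtomicLong consumed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        for (final BlockingQueue<String> stream : streams) {
            pool.submit(new Runnable() {
                public void run() {
                    // Drain this partition; real code would batch writes to HBase here.
                    while (stream.poll() != null) {
                        consumed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("consumed=" + consumed.get());
    }
}
```

One thread per partition is the natural upper bound here: a Kafka partition is consumed by at most one thread in a consumer group, so adding more threads than partitions buys nothing.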
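On the fetch-size question: the relevant property names differ between versions (in 0.7 the setting was `fetch.size`), which is part of why the 0.7-vs-0.8 question above matters. A hedged sketch of the 0.8 high-level consumer knobs, with illustrative values only:

```properties
# 0.8 high-level consumer tuning (illustrative values, not from the thread)
fetch.message.max.bytes=2097152    # max bytes fetched per partition per request
queued.max.message.chunks=10       # fetched chunks buffered ahead per stream
```

Larger fetches amortize request overhead, but only help if the consumer threads can drain the buffered chunks fast enough.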

 