Kafka, mail # user - Strategies for improving Consumer throughput


Re: Strategies for improving Consumer throughput
Graeme Wallace 2013-10-02, 23:19
OK, so we figured out what the problem was with the consumers lagging
behind.

We were pushing 800 Mbit/s+ to the consumer interface, so the 1 Gb
network interface was maxed out.

Graeme
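The saturation point is easy to sanity-check with back-of-the-envelope
arithmetic. A minimal sketch: the 800 Mbit/s figure comes from the message
above; the ~5% framing/protocol overhead is a rough assumption, not a
measured number.

```python
# Rough NIC-saturation check for the numbers reported above.
# Assumption: 1 Gbit/s link with ~5% framing/protocol overhead (rough guess).

def link_utilization(observed_mbit_s: float, link_mbit_s: float = 1000.0) -> float:
    """Fraction of raw link capacity consumed by the observed traffic."""
    return observed_mbit_s / link_mbit_s

observed = 800.0                   # Mbit/s pushed to the consumer interface
util = link_utilization(observed)  # 0.8 -> 80% of raw capacity
usable = 1000.0 * 0.95             # ~950 Mbit/s after assumed overhead
headroom = usable - observed       # ~150 Mbit/s left before the NIC stalls

print(f"utilization: {util:.0%}, headroom: {headroom:.0f} Mbit/s")
```

At 80%+ of raw line rate, traffic bursts alone are enough to queue packets,
which would show up as exactly the burst-then-pause consume pattern described
earlier in the thread.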

On Wed, Oct 2, 2013 at 2:35 PM, Graeme Wallace <
[EMAIL PROTECTED]> wrote:

> Yes, definitely consumers are behind - we can see from examining the
> offsets
>
>
> On Wed, Oct 2, 2013 at 1:59 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
>
>> Are you sure the consumers are behind? Could the pause be because the
>> stream is empty, and production is what is lagging behind consumption?
>>
>> What if you shut off your consumers for 5 minutes and then start them
>> again? Do the consumers behave the same way?
>>
>> /*******************************************
>>  Joe Stein
>>  Founder, Principal Consultant
>>  Big Data Open Source Security LLC
>>  http://www.stealth.ly
>>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> ********************************************/
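The offset check Graeme mentions can be made concrete: per-partition lag is
the broker's log-end offset minus the consumer's last committed offset, and a
lag that keeps growing means the consumers really are behind. A minimal
sketch in plain Python (no broker needed; every offset value below is made up
for illustration):

```python
# Consumer lag per partition: log-end offset minus committed offset.
# All offset values below are hypothetical, for illustration only.

def partition_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Return {partition: lag} for every partition in the topic."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

log_end = {0: 120_000, 1: 118_500, 2: 121_300}    # latest offsets on the broker
committed = {0: 119_000, 1: 110_000, 2: 121_300}  # consumer's committed offsets

lags = partition_lag(log_end, committed)
total = sum(lags.values())
print(lags, "total lag:", total)
```

Sampling this twice, a few minutes apart, answers Joe's question: if the
total shrinks between samples, production is the bottleneck; if it grows,
consumption is.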
>>
>>
>> On Wed, Oct 2, 2013 at 3:54 PM, Graeme Wallace <
>> [EMAIL PROTECTED]> wrote:
>>
>> > Hi All,
>> >
>> > We've got processes that produce many millions of itineraries per
>> minute.
>> > We would like to get them into HBase (so we can query for chunks of them
>> > later) - so our idea was to write each itinerary as a message into
>> Kafka -
>> > so that not only can we have consumers that write to HBase, but also
>> other
>> > consumers that may provide some sort of real-time monitoring service and
>> > also an archive service.
>> >
>> > Problem is - we don't really know enough about how best to do this
>> > effectively with Kafka, so that the producers can run flat out and the
>> > consumers can run flat out too. We've tried having one topic, with
>> multiple
>> > partitions to match the spindles on our broker h/w (12 on each) - and
>> > setting up a thread per partition on the consumer side.
>> >
>> > At the moment, our particular problem is that the consumers just can't
>> > keep up. We can see from logging that the consumer threads seem to run
>> > in bursts, then a pause (as yet we don't know what the pause is; we
>> > don't think it's GC). Anyway, does what we are doing with one topic and
>> > multiple partitions sound correct? Or do we need to change? Any tricks
>> > to speed up consumption? (We've tried changing the fetch size; it
>> > doesn't help much.) Am I correct in assuming we can have one thread per
>> > partition for consumption?
>> >
>> > Thanks in advance,
>> >
>> > Graeme
>> >
>> > --
>> > Graeme Wallace
>> > CTO
>> > FareCompare.com
>> > O: 972 588 1414
>> > M: 214 681 9018
>> >
>>
>
>
>
> --
> Graeme Wallace
> CTO
> FareCompare.com
> O: 972 588 1414
> M: 214 681 9018
>
>
--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
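On the one-thread-per-partition question in the original post: that is the
usual pattern with the high-level consumer of that era, which hands back one
stream per partition for a single consumer. A minimal sketch of the pattern,
with in-memory queues standing in for Kafka partition streams (no broker
involved; all names and message contents are illustrative):

```python
import queue
import threading

# In-memory stand-ins for partition streams; a real consumer would get one
# stream per partition from the Kafka high-level consumer instead.
NUM_PARTITIONS = 4
partitions = [queue.Queue() for _ in range(NUM_PARTITIONS)]
consumed = [[] for _ in range(NUM_PARTITIONS)]

def consume(partition_id: int) -> None:
    """Drain one partition's stream; one thread owns exactly one partition."""
    q = partitions[partition_id]
    while True:
        msg = q.get()
        if msg is None:  # sentinel: stream closed
            break
        consumed[partition_id].append(msg)

# Fill the fake partitions round-robin, as a keyless producer would.
for i in range(100):
    partitions[i % NUM_PARTITIONS].put(f"itinerary-{i}")
for q in partitions:
    q.put(None)

threads = [threading.Thread(target=consume, args=(p,))
           for p in range(NUM_PARTITIONS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(len(c) for c in consumed))  # 100: every message consumed, in order per partition
```

The design point the sketch illustrates: partition count caps consumer
parallelism, so with 12 partitions per broker, more than 12 threads per
topic-consumer buys nothing; ordering is only guaranteed within a partition.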