Re: Very low volume topic
Philip O'Toole 2013-08-14, 11:37
OK, I think I follow you.
Well, if the message volume is very low, then I don't think you need
the performance of Kafka. Perhaps a different design, where your
workers pull from a shared queue in memory somewhere might be better
(perhaps *that* queue could be filled by a Kafka consumer reading from
an actual Kafka topic). Yes, the queue may need synchronization to
ensure each job only gets pulled off the queue once, but you said it's
low volume so performance shouldn't be a concern.
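
A minimal sketch of that shared-queue idea, in Java. Everything named here
(SharedQueueSketch, runJobs, the STOP sentinel) is hypothetical and only for
illustration. The feeder loop stands in for a Kafka consumer reading the
actual topic, and a real worker would run the long job where the comment
says so. BlockingQueue.take() provides the synchronization, so each job is
handed to exactly one worker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: all names are hypothetical, and the feeder loop below
// stands in for a Kafka consumer draining the real topic.
class SharedQueueSketch {
    private static final String STOP = "__STOP__"; // sentinel: no more jobs

    // Runs `jobs` through `workerCount` workers; returns "worker:job" records.
    static List<String> runJobs(List<String> jobs, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> handled = Collections.synchronizedList(new ArrayList<>());

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            final String name = "worker-" + i;
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        // Blocks until a job arrives; a job goes to one worker only.
                        String job = queue.take();
                        if (job.equals(STOP)) {
                            return;
                        }
                        // The long-running task would go here.
                        handled.add(name + ":" + job);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }

        // Feeder: in the real design this would be the Kafka consumer loop.
        for (String job : jobs) {
            queue.add(job); // unbounded queue, so add() never blocks
        }
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP); // one sentinel per worker
        }

        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return handled;
    }
}
```

Calling SharedQueueSketch.runJobs(Arrays.asList("a", "b", "c"), 2) returns
three records, one per job. Which worker picks up which job depends on
which one is free at the time, which is the point of the design.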
On Tue, Aug 13, 2013 at 7:13 PM, Eric Sites <[EMAIL PROTECTED]> wrote:
> Responses inline
> On 8/13/13 9:57 PM, "Philip O'Toole" <[EMAIL PROTECTED]> wrote:
>>My experience is solely with 0.7.2. More inline.
> I am currently using 0.8.
>>On Tue, Aug 13, 2013 at 6:47 PM, Eric Sites <[EMAIL PROTECTED]> wrote:
>>> Hello everyone,
>>> I have a very low volume topic that has 2 consumers in the same group.
>>>How do I get each consumer to only consume 1 message at a time and if
>>>the first consumer is busy get the other consumer to consume the
>>You can't, not if you only have one partition. Each consumer is
>>dedicated to a single partition. Unless you deliberately tear down the
>>consumer and let another take over that partition (if you are using
>>the high-level consumer).
> I am using multiple partitions, currently 4 partitions.
>>> Currently what I am doing is:
>>> First consumer connects to Kafka waits for 300 milliseconds then
>>>disconnects, waits for 10 seconds, then reconnects to see if there is a
>>I don't think you need to do this. The high-level consumer has an API that
>>allows you to set this timeout (I think).
> I am using that timeout on the high-level consumer, that is the 300
> millisecond wait period. Then I do a consumer.shutdown(), wait 10 seconds
> and reconnect.
>>> The messages kick off a long task on each server, each server can
>>>handle multiple tasks up to a limit so first I am trying to balance the
>>>tasks across multiple servers and if they are maxed out don't consume
>>> This will give the other server or servers a chance to pick up a
>>>message and do the task.
>>> I would not disconnect if I can ensure I don't have messages waiting in
>>>the queue for a server to consume them without the other servers being
>>>able to see them.
>>I think a better design would be to have a basic consumer that drains
>>the topic and hands jobs to the set of available workers. *Those*
>>workers perform the long-running job. Only if there are no available
>>workers does the consumer block. You may be trying to do too much in
> The available workers are entire servers that can produce lots of network
> IO and generate 100k+ Kafka messages to other Kafka topics that get
> consumed by Hadoop and other systems.
> I used Kafka for these start job messages because I already was using
> Kafka for other messages, and I will most likely add more servers to
> consume these start job messages.
> I don't know how long the job will take until I consume the start job
> message. Sometimes it may only take seconds or could take hours.
> I have a managed thread pool that only allows x number of task types to
> run at one time from each job, so that one job does not overwhelm a single
> server. This allows a server to handle multiple things while waiting on
> the network IO.
> My only issue is balancing the job start messages across multiple servers
> depending on the servers load/available threads in the thread pool.
> The only real issue I am currently having is that I think this frequent
> connect/disconnect is causing issues on the Kafka servers with rebalancing
> the 4 partitions back and forth between the worker servers.
>>> Thanks for the help...
>>> Eric Sites
> - Eric Sites