Kafka, mail # user - Re: Consumer throughput imbalance - 2013-08-25, 17:17
Re: Consumer throughput imbalance
When I said "some messages take longer than others", that may have been misleading. What I meant is that the performance of the entire application is inconsistent, mostly due to pressure from other applications (MapReduce) on our HBase and MySQL backends. On top of that, some messages just contain more data. I suppose what you're suggesting is that I segment my messages by the average or expected time it takes to process their payloads. I suspect that if I do that, I will have several consumers doing nothing most of the time and the rest backlogged inconsistently, the same way they are now. The problem isn't so much the size of the payloads as the fact that some messages, which I suspect are in partitions with lots of longer-running processing tasks, sit around for hours without getting consumed. That's what I'm trying to solve.

Is there any way to "add more consumers" without actually adding more consumer JVM processes? We've hit something of a saturation point for our MySQL database. Is this maybe where having multiple consumer threads would help? If so, given that I have a single shared processing queue in each consumer, how would I leverage that to solve this problem?
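
As a rough illustration of the multi-thread idea being asked about (not the actual application code), the sketch below has several reader threads in one JVM feed a single bounded queue that a fixed pool of workers drains. Everything here is an assumption made for the example: the class name SharedQueueSketch, the drain method, the in-memory lists standing in for partition streams, and the counter standing in for the HBase/MySQL work. No Kafka client calls are used.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedQueueSketch {
    // Unique sentinel object, compared by identity, that tells a worker to stop.
    private static final String POISON = new String("stop");

    // Hypothetical helper: each inner list stands in for one partition's stream.
    public static int drain(List<List<String>> partitions, int workerCount)
            throws InterruptedException {
        // Bounded queue: readers block when workers fall behind (back-pressure).
        BlockingQueue<String> shared = new ArrayBlockingQueue<>(16);
        AtomicInteger processed = new AtomicInteger();

        // One reader thread per partition stream, all feeding the same queue.
        ExecutorService readers = Executors.newFixedThreadPool(partitions.size());
        for (List<String> partition : partitions) {
            readers.submit(() -> {
                try {
                    for (String message : partition) {
                        shared.put(message);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Fixed worker pool: workerCount caps concurrent backend calls.
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        String message = shared.take();
                        if (message == POISON) {
                            return;
                        }
                        processed.incrementAndGet(); // placeholder for the real processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Wait for readers to finish, then send one stop signal per worker.
        readers.shutdown();
        readers.awaitTermination(1, TimeUnit.MINUTES);
        for (int i = 0; i < workerCount; i++) {
            shared.put(POISON);
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> partitions = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1"));
        System.out.println(drain(partitions, 2));
    }
}
```

In this arrangement the number of reader threads and the number of workers are independent, so a slow partition does not tie up a whole worker; whether that helps when the database itself is saturated is a separate question.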

Ian Friedman
On Sunday, August 25, 2013 at 12:13 PM, Mark wrote:
 