Re: Consumer throughput imbalance
On Sunday, August 25, 2013 at 3:11 PM, Jay Kreps wrote:
The problem is that some consumers are slower than others, due to factors such as resource contention on the box itself and on our HBase cluster, and the processing each consumer is actually doing. We are sending very small messages that are actually HDFS paths, which then get opened and read on the consumers. Each of these files takes between 1 and 15 minutes to process, and can sometimes take up to 30 minutes when the load on our HBase cluster is very high from certain MR jobs. We were hoping to get some experience with Kafka and flush out any issues with our use of the project before implementing a solution that actually queues all the data from those HDFS files in Kafka itself, and this seemed like a good intermediate step.
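To make the imbalance concrete, here is a rough sketch in plain Python (no Kafka client involved). It assumes messages are split evenly and statically between two consumers. The paths, durations, and consumer names are invented for illustration and are not taken from our actual setup.

```python
# Hypothetical illustration: each message is just an HDFS path, and each
# consumer is statically handed an equal share of the messages.

def busy_minutes(assignments, minutes_per_file):
    """Return the total processing minutes per consumer for a static assignment."""
    return {
        consumer: sum(minutes_per_file[path] for path in paths)
        for consumer, paths in assignments.items()
    }

# Made-up paths and per-file durations in the 1-15 minute range,
# plus one 30-minute outlier for a file hit by heavy HBase load.
minutes_per_file = {
    "/data/a": 1, "/data/b": 15, "/data/c": 2,
    "/data/d": 30, "/data/e": 3, "/data/f": 14,
}
paths = list(minutes_per_file)
consumers = ["consumer-0", "consumer-1"]

# Round-robin split: both consumers receive the same number of messages.
assignments = {c: paths[i::len(consumers)] for i, c in enumerate(consumers)}

busy = busy_minutes(assignments, minutes_per_file)
for consumer, minutes in busy.items():
    print(consumer, len(assignments[consumer]), "messages,", minutes, "minutes")
```

Both consumers get three messages, yet one is busy far longer than the other, which is roughly the behaviour we see.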