We have processes that produce many millions of itineraries per minute.
We would like to get them into HBase (so we can query for chunks of them
later). Our idea was to write each itinerary as a message into Kafka, so
that we can have not only consumers that write to HBase, but also other
consumers that provide some sort of real-time monitoring service and an
archive service.
The problem is that we don't know enough about how best to do this with
Kafka so that both the producers and the consumers can run flat out.
We've tried one topic with multiple partitions, matching the number of
spindles on our broker hardware (12 on each), with one thread per
partition on the consumer side.
At the moment, our particular problem is that the consumers just can't keep
up. We can see from logging that the consumer threads seem to run in
bursts, followed by a pause (we don't yet know what causes the pause; we
don't think it's GC). Does our approach of one topic with multiple
partitions sound correct, or do we need to change it? Are there any tricks
to speed up consumption? (We've tried changing the fetch size, which
doesn't help much.) Am I correct in assuming we can have one thread per
partition for consumption?
Thanks in advance,