I have a two-tier Flume setup. Tier 1 consists of agents that accept
incoming requests (HTTP source) and put them on (large) file channels.
Tier 2 does a lot of processing on these events with custom interceptors
and uses a custom sink to store the results in a custom data store. These
tier 2 agents use a (small) memory channel.
The tier 2 interceptors and the data store are mostly I/O bound.
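
For concreteness, a tier 2 agent of this shape could be configured roughly
as in the sketch below. This is only an illustration: the agent and
component names, the port, the capacities, and the com.example classes
are placeholders assumed for the example, not the real configuration.

```properties
# Hypothetical tier 2 agent; all names, ports, and classes are placeholders.
tier2.sources = avro-in
tier2.channels = mem-ch
tier2.sinks = store-out

# Avro source that receives events from the tier 1 Avro sinks.
tier2.sources.avro-in.type = avro
tier2.sources.avro-in.bind = 0.0.0.0
tier2.sources.avro-in.port = 4545
tier2.sources.avro-in.channels = mem-ch

# Custom, I/O-bound interceptor (class name is an assumption).
tier2.sources.avro-in.interceptors = enrich
tier2.sources.avro-in.interceptors.enrich.type = com.example.EnrichInterceptor$Builder

# Small memory channel between the source and the custom sink.
tier2.channels.mem-ch.type = memory
tier2.channels.mem-ch.capacity = 1000
tier2.channels.mem-ch.transactionCapacity = 100

# Custom sink that writes to the custom data store (class name is an assumption).
tier2.sinks.store-out.type = com.example.CustomStoreSink
tier2.sinks.store-out.channel = mem-ch
```
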
I am struggling to saturate the tier 2 agents. They are slower than
they should be, mostly for reasons unrelated to Flume.
Still, suppose I want my tier 2 agents to process more events in
parallel. What would be the appropriate way to do this?
Do I need multiple Avro sinks on the tier 1 agents that all point to the
same tier 2 Avro source? I tried this, and it does seem to increase the
number of threads on the tier 2 agent that are actually processing events.
Is this the right way to do it, or not?
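
To make the question concrete, the multi-sink variant I am describing
would look roughly like the sketch below. The agent and component names,
the hostname, the ports, and the directories are placeholders assumed for
the example. As far as I understand, sinks that are not part of a sink
group each get their own runner thread, so two sinks on one channel means
two concurrent senders, and each event is taken by only one of them.

```properties
# Hypothetical tier 1 agent; all names, hosts, ports, and paths are placeholders.
tier1.sources = http-in
tier1.channels = file-ch
tier1.sinks = avro-out-1 avro-out-2

# HTTP source that accepts the incoming requests.
tier1.sources.http-in.type = http
tier1.sources.http-in.port = 8080
tier1.sources.http-in.channels = file-ch

# Large file channel that buffers events for tier 2.
tier1.channels.file-ch.type = file
tier1.channels.file-ch.checkpointDir = /var/flume/checkpoint
tier1.channels.file-ch.dataDirs = /var/flume/data

# Two Avro sinks drain the same channel and target the same tier 2 Avro source.
tier1.sinks.avro-out-1.type = avro
tier1.sinks.avro-out-1.channel = file-ch
tier1.sinks.avro-out-1.hostname = tier2-host.example.com
tier1.sinks.avro-out-1.port = 4545

tier1.sinks.avro-out-2.type = avro
tier1.sinks.avro-out-2.channel = file-ch
tier1.sinks.avro-out-2.hostname = tier2-host.example.com
tier1.sinks.avro-out-2.port = 4545
```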