Flume >> mail # user >> increase load on tier2 flume agents

increase load on tier2 flume agents

I have a two-tier Flume setup. Tier 1 consists of agents that accept incoming
requests (HTTP source) and put them on (large) file channels. Tier 2
does a lot of processing on these events (with custom interceptors) and
uses a custom sink to store the result in a custom data store. These tier 2
agents use a (small) memory channel.

The tier 2 interceptors and the data store are mostly I/O bound.

I am struggling to saturate the tier 2 agents. They are slower than
they should be, mostly for reasons unrelated to Flume.

However, assume that I would like my tier 2 agents to process more
events in parallel. What would be the appropriate way to do this?

Do I need multiple Avro sinks on the tier 1 agents that all point to the same
tier 2 Avro source? I tried this, and it does seem to increase the number
of threads on the tier 2 agent that are actually processing events.

Is this the way to do it, or not?
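
For illustration, here is a minimal sketch of the kind of configuration
described above: several Avro sinks on a tier 1 agent draining one file
channel and all pointing at the same tier 2 Avro source. The agent names
(tier1, tier2), component names, hostname, ports, and sizes are assumptions
made up for this example, not the actual setup, and the custom interceptors
and custom sink on tier 2 are left out.

```properties
# --- Tier 1 agent (hypothetical name "tier1") ---
tier1.sources = http-in
tier1.channels = file-ch
tier1.sinks = avro-out-1 avro-out-2 avro-out-3

# HTTP source feeding the file channel (port is an assumption)
tier1.sources.http-in.type = http
tier1.sources.http-in.port = 8080
tier1.sources.http-in.channels = file-ch

tier1.channels.file-ch.type = file

# Three Avro sinks, not placed in a sink group, all reading from the same
# channel and all sending to the same tier 2 host and port (assumed values).
tier1.sinks.avro-out-1.type = avro
tier1.sinks.avro-out-1.channel = file-ch
tier1.sinks.avro-out-1.hostname = tier2.example.com
tier1.sinks.avro-out-1.port = 4545
tier1.sinks.avro-out-1.batch-size = 100

tier1.sinks.avro-out-2.type = avro
tier1.sinks.avro-out-2.channel = file-ch
tier1.sinks.avro-out-2.hostname = tier2.example.com
tier1.sinks.avro-out-2.port = 4545
tier1.sinks.avro-out-2.batch-size = 100

tier1.sinks.avro-out-3.type = avro
tier1.sinks.avro-out-3.channel = file-ch
tier1.sinks.avro-out-3.hostname = tier2.example.com
tier1.sinks.avro-out-3.port = 4545
tier1.sinks.avro-out-3.batch-size = 100

# --- Tier 2 agent (hypothetical name "tier2") ---
tier2.sources = avro-in
tier2.channels = mem-ch

# Single Avro source receiving from all tier 1 sinks.
# "threads" caps the source's worker threads; 8 is an arbitrary example value.
tier2.sources.avro-in.type = avro
tier2.sources.avro-in.bind = 0.0.0.0
tier2.sources.avro-in.port = 4545
tier2.sources.avro-in.threads = 8
tier2.sources.avro-in.channels = mem-ch

# Small memory channel (sizes are assumptions)
tier2.channels.mem-ch.type = memory
tier2.channels.mem-ch.capacity = 1000
tier2.channels.mem-ch.transactionCapacity = 100
```

In a layout like this, each tier 1 sink opens its own connection to the tier 2
source, which would be consistent with seeing more threads processing events
on the tier 2 agent.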