Throughput issue/load balancing
Flume 0.9.4-cdh3u3

We have a couple dozen agents connecting to two collectors, and the majority of events appear to be flowing into one collector. I am trying to load archived data from two of the agents (on two different hosts), but they hit a throughput limit. On investigation, it appears that all of the events are flowing to only one collector. Shouldn't Flume be using the second collector for increased throughput?

My config:
All agents: thriftSource(12345) | autoE2EChain;
Both collectors:  autoCollectorSource | [ackChecker cassandraBasin("analyticsks", ""), < lazyOpen stubbornAppend thriftSink("",30313) ? diskFailover insistentOpen lazyOpen stubbornAppend thriftSink("",30313) >];

Agent logs show them opening a connection to both collectors upon startup.