
Flume user mailing list


Throughput issue/load balancing
Flume 0.9.4-cdh3u3

We have a couple dozen agents connecting to two collectors, and most events appear to be flowing into one collector. I am trying to load archived data from two of the agents (on two different hosts), but they hit a throughput limit. On investigation, it appears that all of their events are flowing to only one collector. Shouldn't Flume use the second collector to increase throughput?

My config:
All agents: thriftSource(12345) | autoE2EChain;
Both collectors:  autoCollectorSource | [ackChecker cassandraBasin("analyticsks", "127.0.0.1:9160"), < lazyOpen stubbornAppend thriftSink("127.0.0.1",30313) ? diskFailover insistentOpen lazyOpen stubbornAppend thriftSink("127.0.0.1",30313) >];

Agent logs show them opening a connection to both collectors upon startup.
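A minimal sketch of why this can happen. It assumes that autoE2EChain behaves as an ordered failover chain rather than a per-event load balancer: each agent appends to the first reachable collector in its chain and moves to the next collector only when that one fails. Under that assumption, two agents whose chains list the same collector first will send everything there, even though both collectors are up. The agent names, collector names, and per-agent chain orders below are hypothetical. This models the routing behavior only and is not Flume code.

```python
from collections import Counter
from typing import Dict, List, Set


def failover_pick(chain: List[str], alive: Set[str]) -> str:
    """Failover semantics (assumed): use the first live collector in the chain."""
    for collector in chain:
        if collector in alive:
            return collector
    raise RuntimeError("no live collector in chain")


def round_robin_pick(collectors: List[str], i: int) -> str:
    """Per-event round-robin, for contrast: event i goes to collector i mod n."""
    return collectors[i % len(collectors)]


def simulate(agent_chains: Dict[str, List[str]], events_per_agent: int,
             alive: Set[str]) -> Counter:
    """Count how many events each collector receives under failover routing."""
    counts: Counter = Counter()
    for chain in agent_chains.values():
        for _ in range(events_per_agent):
            counts[failover_pick(chain, alive)] += 1
    return counts


collectors = ["collectorA", "collectorB"]
alive = set(collectors)

# Hypothetical: both archive-loading agents were handed the same chain order.
same_primary = {"agent1": ["collectorA", "collectorB"],
                "agent2": ["collectorA", "collectorB"]}

# Hypothetical: chain order differs per agent, so their primaries differ.
rotated = {"agent1": ["collectorA", "collectorB"],
           "agent2": ["collectorB", "collectorA"]}

# All 2000 events land on collectorA; collectorB only matters if A goes down.
print(simulate(same_primary, 1000, alive))
# Load splits 1000/1000, but only because the agents' primaries differ.
print(simulate(rotated, 1000, alive))
# Per-event round-robin from a single source would alternate A, B, A, B, ...
print(Counter(round_robin_pick(collectors, i) for i in range(2000)))
```

If the assumption holds, spreading load is a matter of which collector each agent gets as its primary, not something a single agent does per event. An agent holding connections to both collectors is also consistent with this, since the backup is kept available for failover.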

Thanks,

Roy