Is there any documentation on the circumstances under which Flume NG will either drop events or send events twice, resulting in duplicates?
I seem to be able to run into both situations with a test setup under high contention, using agent1[syslog source --> file channel --> avro sink] --> agent2[avro source --> file channel --> hdfs sink]. I drop events when I use the default values for the timeouts on the file channels and let agent1 become unavailable for some period of time (causing rsyslog to build up a queue). The same situation with higher timeouts leads to a number of duplicate events (about 500 after 2.5M events).
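For reference, a minimal sketch of how such a two-agent setup might be configured. The component names, host, ports, and paths are placeholders, and `keep-alive` is only an assumption about which file channel timeout is meant; this is not the exact configuration used in the test.

```properties
# agent1: syslog source --> file channel --> avro sink
agent1.sources = syslogSrc
agent1.channels = fileCh
agent1.sinks = avroSink

agent1.sources.syslogSrc.type = syslogtcp
agent1.sources.syslogSrc.host = 0.0.0.0
agent1.sources.syslogSrc.port = 5140
agent1.sources.syslogSrc.channels = fileCh

agent1.channels.fileCh.type = file
agent1.channels.fileCh.checkpointDir = /var/flume/agent1/checkpoint
agent1.channels.fileCh.dataDirs = /var/flume/agent1/data
# Seconds a put/take waits on the channel (assumed to be the timeout in question)
agent1.channels.fileCh.keep-alive = 3

agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.hostname = agent2.example.com
agent1.sinks.avroSink.port = 4545
agent1.sinks.avroSink.channel = fileCh

# agent2: avro source --> file channel --> hdfs sink
agent2.sources = avroSrc
agent2.channels = fileCh
agent2.sinks = hdfsSink

agent2.sources.avroSrc.type = avro
agent2.sources.avroSrc.bind = 0.0.0.0
agent2.sources.avroSrc.port = 4545
agent2.sources.avroSrc.channels = fileCh

agent2.channels.fileCh.type = file
agent2.channels.fileCh.checkpointDir = /var/flume/agent2/checkpoint
agent2.channels.fileCh.dataDirs = /var/flume/agent2/data
agent2.channels.fileCh.keep-alive = 3

agent2.sinks.hdfsSink.type = hdfs
agent2.sinks.hdfsSink.hdfs.path = /flume/events
agent2.sinks.hdfsSink.channel = fileCh
```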
(BTW: is there an official ASCII art notation for Flume setups?)
Thanks for any pointers,