Flume user mailing list

Flume NG docs on duplicate or dropped log events
Hi All,

Is there any documentation on the circumstances under which Flume NG will either drop events or send them twice, resulting in duplicates?

I seem to be able to trigger both situations with a test setup under high contention, using agent1[syslog source --> file channel --> avro sink] --> agent2[avro source --> file channel --> hdfs sink]. With the default timeout values on the file channels, I lose events when agent1 is made unavailable for some period of time (causing rsyslog to build up a queue). The same scenario with higher timeouts instead produces duplicate events (about 500 in 2.5M events).
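
For reference, the topology above could be written as a single Flume NG properties file along these lines. This is a minimal sketch: the component names, host names, ports and paths are hypothetical placeholders, and the timeout values shown (the file channel's `keep-alive`, the Avro sink's `request-timeout`, the HDFS sink's `hdfs.callTimeout`) are believed to be the documented defaults. The post does not say which of these timeouts were raised in the run that produced duplicates.

```properties
# Hypothetical sketch of the two-agent topology; names, hosts,
# ports and paths are placeholders, not values from the post.

# ---- agent1: syslog source -> file channel -> avro sink ----
agent1.sources = syslog
agent1.channels = fc
agent1.sinks = avro

# Receives syslog over TCP (e.g. forwarded by rsyslog)
agent1.sources.syslog.type = syslogtcp
agent1.sources.syslog.host = 0.0.0.0
agent1.sources.syslog.port = 5140
agent1.sources.syslog.channels = fc

agent1.channels.fc.type = file
agent1.channels.fc.checkpointDir = /var/lib/flume/agent1/checkpoint
agent1.channels.fc.dataDirs = /var/lib/flume/agent1/data
# Seconds to wait for a put operation (assumed default: 3)
agent1.channels.fc.keep-alive = 3

agent1.sinks.avro.type = avro
agent1.sinks.avro.channel = fc
agent1.sinks.avro.hostname = agent2.example.com
agent1.sinks.avro.port = 4545
# Milliseconds to wait for agent2 to answer a batch (assumed default: 20000)
agent1.sinks.avro.request-timeout = 20000

# ---- agent2: avro source -> file channel -> hdfs sink ----
agent2.sources = avro
agent2.channels = fc
agent2.sinks = hdfs

agent2.sources.avro.type = avro
agent2.sources.avro.bind = 0.0.0.0
agent2.sources.avro.port = 4545
agent2.sources.avro.channels = fc

agent2.channels.fc.type = file
agent2.channels.fc.checkpointDir = /var/lib/flume/agent2/checkpoint
agent2.channels.fc.dataDirs = /var/lib/flume/agent2/data
agent2.channels.fc.keep-alive = 3

agent2.sinks.hdfs.type = hdfs
agent2.sinks.hdfs.channel = fc
agent2.sinks.hdfs.hdfs.path = hdfs://namenode/flume/events
# Milliseconds allowed for each HDFS operation (assumed default: 10000)
agent2.sinks.hdfs.hdfs.callTimeout = 10000
```

Both agents can live in one file because `flume-ng agent --name agent1` (or `agent2`) only loads the properties prefixed with that agent's name.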

(BTW: is there an official ASCII art notation for Flume setups?)
Thanks for any pointers,