Flume, mail # user - Flume NG docs on duplicate or dropped log events

Flume NG docs on duplicate or dropped log events
Friso van Vollenhoven 2013-01-28, 21:54
Hi All,

Is there any documentation on the circumstances under which Flume NG will either drop events or send events twice, resulting in duplicates?

I seem to be able to run into both situations with a test setup under high contention: agent1[syslog source --> file channel --> avro sink] --> agent2[avro source --> file channel --> hdfs sink]. With the default timeout values on the file channels, events are dropped when agent1 becomes unavailable for some period of time (which causes rsyslog to build up a queue). The same scenario with higher timeouts leads to a number of duplicate events (about 500 after 2.5M events).
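
For reference, the two-agent topology described above could be written roughly as the following Flume properties sketch. This is an illustration only, not the configuration from the test setup: the component names, hostnames, ports, and paths are all hypothetical, and the message does not say which timeouts were changed. The file channel's `keep-alive` setting is shown merely as one candidate, at what the Flume User Guide lists as its default of 3 seconds.

```properties
# agent1: syslog source --> file channel --> avro sink
# (all names, hosts, ports, and paths below are hypothetical)
agent1.sources = syslog-src
agent1.channels = file-ch
agent1.sinks = avro-out

agent1.sources.syslog-src.type = syslogtcp
agent1.sources.syslog-src.host = 0.0.0.0
agent1.sources.syslog-src.port = 5140
agent1.sources.syslog-src.channels = file-ch

agent1.channels.file-ch.type = file
agent1.channels.file-ch.checkpointDir = /var/flume/agent1/checkpoint
agent1.channels.file-ch.dataDirs = /var/flume/agent1/data
# Seconds to wait for a put or take to succeed (assumed to be one of
# the "timeouts" in question; the message does not name them)
agent1.channels.file-ch.keep-alive = 3

agent1.sinks.avro-out.type = avro
agent1.sinks.avro-out.hostname = agent2.example.com
agent1.sinks.avro-out.port = 4545
agent1.sinks.avro-out.channel = file-ch

# agent2: avro source --> file channel --> hdfs sink
agent2.sources = avro-in
agent2.channels = file-ch
agent2.sinks = hdfs-out

agent2.sources.avro-in.type = avro
agent2.sources.avro-in.bind = 0.0.0.0
agent2.sources.avro-in.port = 4545
agent2.sources.avro-in.channels = file-ch

agent2.channels.file-ch.type = file
agent2.channels.file-ch.checkpointDir = /var/flume/agent2/checkpoint
agent2.channels.file-ch.dataDirs = /var/flume/agent2/data
agent2.channels.file-ch.keep-alive = 3

agent2.sinks.hdfs-out.type = hdfs
agent2.sinks.hdfs-out.hdfs.path = hdfs://namenode.example.com/flume/events
agent2.sinks.hdfs-out.channel = file-ch
```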

(BTW: is there an official ASCII art notation for Flume setups?)
Thanks for any pointers,