What version of Flume were you using, Mark?
Based on the "end-to-end configuration", I would say that you're using the old Flume (version 0.9.x). If that is true, then the duplication is unfortunately a known flaw. We've significantly redesigned Flume in 1.x (known as flume-ng) to avoid such issues.
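For illustration only, here is a minimal Python sketch of how a retried batch can yield exactly one batch worth of duplicates. It is not Flume code. It assumes a sender that re-sends a whole batch when the acknowledgement is lost (for example because the receiving agent was killed after storing the events but before acknowledging them). The function name deliver and its parameters are hypothetical.

```python
def deliver(events, batch_size, lose_ack_on_batch):
    """Simulate a sender that re-sends a batch whenever its ack is lost.

    Hypothetical model, not Flume's implementation. Returns the events
    in the order the receiver stored them.
    """
    received = []
    for batch_no, start in enumerate(range(0, len(events), batch_size)):
        batch = events[start:start + batch_size]
        attempt = 0
        acked = False
        while not acked:
            # The receiver stores the batch durably on every attempt.
            received.extend(batch)
            # Assumption: the ack is lost only on the first attempt of
            # the chosen batch, so the sender retries that batch once.
            acked = not (batch_no == lose_ack_on_batch and attempt == 0)
            attempt += 1
    return received


events = list(range(1000))
received = deliver(events, batch_size=100, lose_ack_on_batch=3)
duplicates = len(received) - len(set(received))
print(duplicates)  # 100: one full batch was delivered twice
```

Under these assumptions no event is lost, and the number of duplicates equals the batch size, which matches the symptom described below.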
On Jul 26, 2012, at 7:51 AM, Stern, Mark wrote:
> I was testing flume in an end-to-end configuration where A can send to D
> via B or C. A, B, C and D are all flume agents with file channels. In
> the course of the test, I killed and restarted B and C. At the end of
> the test, I found that all the events reached D, but 100
> events (that is my batch size on the avro sinks) were duplicated.
> Is this expected (or at least accepted) behaviour?
> Mark Stern