Flume >> mail # user >> flume non-duplication guarantees?

Stern, Mark 2012-07-26, 05:51
Jarek Jarcec Cecho 2012-07-26, 15:15
Stern, Mark 2012-07-26, 15:53
Re: flume non-duplication guarantees?
Could you please create a JIRA for this issue? Please describe your case, attach all configuration files and commands that you've used.

I'm sure that someone will check it out sooner or later.


On Jul 26, 2012, at 5:53 PM, Stern, Mark wrote:

> A, B and C are all using 1.2.0.
> D is using 1.1.0 (because D has an HDFS sink and I am using an old version of hadoop).
> ________________________________________
> From: Jarek Jarcec Cecho [[EMAIL PROTECTED]]
> Sent: Thursday, July 26, 2012 6:15 PM
> Subject: Re: flume non-duplication guarantees?
> What version of flume were you using, Mark?
> Based on the "end-to-end configuration", I would say that you're using old flume (version 0.9.x). If that is true, then the duplication is unfortunately a known flaw. We've significantly redesigned flume in 1.x (known as flume-ng) to avoid such issues.
> Jarcec
> On Jul 26, 2012, at 7:51 AM, Stern, Mark wrote:
>> I was testing flume in an end-to-end configuration where A can send to D
>> via B or C. A, B, C and D are all flume agents with file channels. In
>> the course of the test, I killed and restarted B and C. At the end of
>> the test, I found that all the events reached D, but 100
>> events (that is my batch size on the avro sinks) were duplicated.
>> Is this expected (or at least accepted) behaviour?
>> Thanks,
>> Mark Stern
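
One possible explanation for exactly one batch of duplicates, not confirmed anywhere in this thread, is batch-level retry: if an intermediate agent is killed after it has durably stored a batch but before the sender receives the acknowledgement, the sender resends the whole batch. The toy model below is only a sketch of that at-least-once pattern. It is not Flume code, and `Receiver`, `append_batch`, `send_with_retry` and `crash_on_batch` are hypothetical names invented for the illustration.

```python
class Receiver:
    """Stands in for a downstream agent whose channel survives a restart."""

    def __init__(self):
        self.channel = []  # durable store, loosely analogous to a file channel

    def append_batch(self, batch, crash_before_ack=False):
        self.channel.extend(batch)  # the batch is stored durably first
        if crash_before_ack:
            # Assumption: the agent is killed after storing, before acking.
            raise ConnectionError("agent killed before the ack reached the sender")


def send_with_retry(events, receiver, batch_size, crash_on_batch):
    """Send events in batches; resend a whole batch when no ack arrives."""
    crashed = False
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        index = start // batch_size
        while True:
            crash = index == crash_on_batch and not crashed
            crashed = crashed or crash
            try:
                receiver.append_batch(batch, crash_before_ack=crash)
                break  # ack received: the sender can drop the batch
            except ConnectionError:
                continue  # no ack: the sender keeps the batch and resends it


events = list(range(1000))
d = Receiver()
send_with_retry(events, d, batch_size=100, crash_on_batch=3)

duplicates = len(d.channel) - len(set(d.channel))
unique = sorted(set(d.channel))
print(len(d.channel), duplicates)
```

In this model no event is lost, and the number of duplicates equals the batch size, which matches the symptom described above. Removing them would require de-duplicating downstream on some unique event key, shown here with `set`.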