Flume user mailing list: flume non-duplication guarantees?

Re: flume non-duplication guarantees?
Jarek Jarcec Cecho 2012-07-26, 16:34
Could you please create a JIRA for this issue? Please describe your case and attach all configuration files and the commands you used.

I'm sure that someone will check it out sooner or later.

Jarcec

On Jul 26, 2012, at 5:53 PM, Stern, Mark wrote:

> A, B and C are all using 1.2.0.
> D is using 1.1.0 (because D has an HDFS sink and I am using an old version of Hadoop).
> ________________________________________
> From: Jarek Jarcec Cecho [[EMAIL PROTECTED]]
> Sent: Thursday, July 26, 2012 6:15 PM
> To: [EMAIL PROTECTED]
> Subject: Re: flume non-duplication guarantees?
>
> What version of flume were you using Mark?
>
> Based on the "end-to-end configuration", I would say that you're using the old Flume (version 0.9.x). If that is true, then the duplication is, unfortunately, a known flaw. We've significantly redesigned Flume in 1.x (known as flume-ng) to avoid such issues.
>
> Jarcec
>
> On Jul 26, 2012, at 7:51 AM, Stern, Mark wrote:
>
>> I was testing flume in an end-to-end configuration where A can send to D
>> via B or C. A, B, C and D are all flume agents with file channels. In
>> the course of the test, I killed and restarted B and C. At the end of
>> the test, I found that all the events reached D, but 100
>> events (that is my batch size on the avro sinks) were duplicated.
>>
>> Is this expected (or at least accepted) behaviour?
>>
>> Thanks,
>>
>> Mark Stern
>
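The observation that exactly one batch (100 events) was duplicated fits one plausible mechanism. The toy model below sketches it under stated assumptions and is not Flume's code: the sender keeps a batch in its durable channel until the downstream hop acknowledges it, and resends the whole batch if it is killed after the receiver committed but before the acknowledgement arrived. `transfer` and `crash_after_commit_batch` are hypothetical names used only for illustration.

```python
"""Toy model (NOT Flume code) of an at-least-once batch hand-off between two hops.

Assumption: the sender removes a batch from its (file) channel only after the
receiver acknowledges it; if the sender dies before seeing the ack, it resends
the whole batch after restart.
"""

BATCH_SIZE = 100  # matches the avro sink batch size mentioned in the thread


def transfer(events, crash_after_commit_batch=None):
    """Deliver events in batches; optionally lose the ack for one batch.

    Returns the events as stored by the downstream agent, in arrival order.
    """
    delivered = []
    pending = list(events)  # stands in for the sender's durable channel
    batch_no = 0
    while pending:
        batch = pending[:BATCH_SIZE]
        delivered.extend(batch)  # receiver commits the batch
        if batch_no == crash_after_commit_batch:
            # Sender is killed before it sees the ack: the batch stays in its
            # channel and is sent again after restart (simulated only once).
            crash_after_commit_batch = None
            continue
        del pending[:BATCH_SIZE]  # ack received: sender commits its take
        batch_no += 1
    return delivered


if __name__ == "__main__":
    out = transfer(list(range(1000)), crash_after_commit_batch=3)
    print(len(out), len(set(out)))  # every event arrives; one batch appears twice
```

In this model no event is lost, but a single lost acknowledgement duplicates exactly one batch, which matches the pattern Mark reports. Whether Flume 1.x is supposed to behave this way is exactly what the JIRA would settle.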