-
flume non-duplication guarantees?
Stern, Mark 2012-07-26, 05:51
I was testing flume in an end-to-end configuration where A can send to D via B or C. A, B, C and D are all flume agents with file channels. In the course of the test, I killed and restarted B and C. At the end of the test, I found that all the events reached D, but 100 events (that is my batch size on the avro sinks) were duplicated.
Is this expected (or at least accepted) behaviour?
Thanks,
Mark Stern
-
Re: flume non-duplication guarantees?
Jarek Jarcec Cecho 2012-07-26, 15:15
What version of flume were you using, Mark?
Based on the "end-to-end configuration", I would say that you're using the old flume (version 0.9.x). If that is true, then the duplication is unfortunately a known flaw. We've significantly redesigned flume in 1.x (known as flume-ng) to avoid such issues.
Jarcec
-
RE: flume non-duplication guarantees?
Stern, Mark 2012-07-26, 15:53
A, B and C are all using 1.2.0. D is using 1.1.0 (because D has an HDFS sink and I am using an old version of hadoop).
-
Re: flume non-duplication guarantees?
Jarek Jarcec Cecho 2012-07-26, 16:34
Could you please create a JIRA for this issue? Please describe your case, attach all configuration files and commands that you've used.
I'm sure that someone will check it out sooner or later.
Jarcec
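Editor's note: the thread does not establish the cause of the duplicates. One plausible explanation, offered here only as an assumption, is at-least-once batch delivery: if an agent is killed after the downstream side has committed a batch but before the sender has seen the acknowledgement, the sender rolls back and re-sends the whole batch. That would produce duplicates in multiples of the batch size with no lost events, which matches the 100 duplicated events reported above. The sketch below is a toy model in Python, not Flume code; `Receiver`, `deliver` and `lose_ack_for_batch` are hypothetical names invented for illustration.

```python
"""Toy model (not Flume code): batch hand-off with retry on a lost acknowledgement."""


class Receiver:
    """Stands in for a downstream agent; stores every batch it commits."""

    def __init__(self):
        self.events = []

    def append_batch(self, batch):
        # The downstream commit always succeeds in this model.
        self.events.extend(batch)


def deliver(events, receiver, batch_size, lose_ack_for_batch=None):
    """Send events in batches; re-send a batch whose acknowledgement is lost.

    lose_ack_for_batch is the index of one batch whose acknowledgement is
    dropped once (an assumed stand-in for killing an agent mid-transaction).
    """
    batch_index = 0
    start = 0
    ack_already_lost = False
    while start < len(events):
        batch = events[start:start + batch_size]
        receiver.append_batch(batch)
        if batch_index == lose_ack_for_batch and not ack_already_lost:
            # The sender never sees the ack, so it rolls back and retries
            # the same batch even though the receiver already stored it.
            ack_already_lost = True
            continue
        start += batch_size
        batch_index += 1


events = list(range(1000))
d = Receiver()
deliver(events, d, batch_size=100, lose_ack_for_batch=3)

missing = set(events) - set(d.events)
duplicates = len(d.events) - len(set(d.events))
print(len(missing), duplicates)
```

In this model no event is lost and exactly one batch is stored twice. Whether the real test followed this path would have to be confirmed from the agents' logs and configuration, as requested for the JIRA.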