
Flume, mail # user - Guarantees of the memory channel for delivering to sink


Rahul Ravindran 2012-11-06, 21:32
Brock Noland 2012-11-06, 21:38
Rahul Ravindran 2012-11-06, 21:43
Brock Noland 2012-11-06, 21:44
Rahul Ravindran 2012-11-06, 22:53
Brock Noland 2012-11-06, 23:05
Rahul Ravindran 2012-11-06, 23:40
Rahul Ravindran 2012-11-07, 19:29
Re: Guarantees of the memory channel for delivering to sink
Brock Noland 2012-11-07, 19:48
Hi,

Yes, if you use the memory channel, you can lose data. To avoid losing data,
the file channel needs to write to disk...

Brock
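The trade-off in this thread (a memory channel fed from legacy log files whose read position is checkpointed) can be illustrated with a small simulation. This is a hypothetical Java sketch, not Flume's API: the class `CheckpointSketch`, the method `run`, and the integer "checkpoint" offset are illustrative assumptions. A `Deque` stands in for the memory channel, and the agent "crashes" after a given number of deliveries, losing the in-memory queue while the checkpoint survives. It contrasts advancing the checkpoint on insertion into the channel with advancing it only after the sink has delivered the event.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation, not Flume's API: a source reads a legacy log from a
// checkpointed offset into an in-memory queue (standing in for the memory
// channel), and a sink drains it. The agent crashes after a given number of
// deliveries, losing the queue; only the checkpoint survives the restart.
class CheckpointSketch {

    // Returns every event the downstream receives across one crash and one restart.
    static List<String> run(List<String> log, boolean checkpointOnAck, int crashAfterDeliveries) {
        List<String> delivered = new ArrayList<>();
        int checkpoint = 0; // offset of the next log line to read; persists across the crash

        // First run: the source enqueues every remaining log line into the channel.
        Deque<String> memoryChannel = new ArrayDeque<>();
        for (int i = checkpoint; i < log.size(); i++) {
            memoryChannel.addLast(log.get(i));
            if (!checkpointOnAck) {
                checkpoint = i + 1; // advanced as soon as the event is in the channel
            }
        }

        // The sink drains the channel in FIFO order until the simulated crash.
        int deliveries = 0;
        while (!memoryChannel.isEmpty() && deliveries < crashAfterDeliveries) {
            delivered.add(memoryChannel.pollFirst());
            deliveries++;
            if (checkpointOnAck) {
                checkpoint++; // advanced only after the sink has delivered the event
            }
        }

        // Crash: memoryChannel is discarded. On restart, replay from the checkpoint.
        for (int i = checkpoint; i < log.size(); i++) {
            delivered.add(log.get(i));
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> log = List.of("e1", "e2", "e3", "e4");
        System.out.println(run(log, false, 2)); // prints "[e1, e2]": e3 and e4 are lost
        System.out.println(run(log, true, 2));  // prints "[e1, e2, e3, e4]"
    }
}
```

In this sketch, delivery and the checkpoint update happen together, so nothing is duplicated. In a real system a crash between delivering an event and persisting the checkpoint would cause that event to be replayed, giving at-least-once rather than exactly-once delivery.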

On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:

> Ping on the questions below about the new Spool Directory source:
>
> If we choose to use the memory channel with this source, to an Avro sink
> on a remote box, do we risk data loss in the event of a network
> partition/slow network or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, this will result in
> double writes to disk, correct? (One for the legacy log files, which will be
> ingested by the Spool Directory source, and the other for the WAL.)
>
>
>   ------------------------------
> *From:* Rahul Ravindran <[EMAIL PROTECTED]>
>  *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> *Sent:* Tuesday, November 6, 2012 3:40 PM
>
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This is awesome.
> This may be perfect for our use case :)
>
> When is the 1.3 release expected?
>
> A couple of questions about the choice of channel for the new source:
>
> If we choose to use the memory channel with this source, to an Avro sink
> on a remote box, do we risk data loss in the event of a network
> partition/slow network or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, this will result in
> double writes to disk, correct? (One for the legacy log files, which will be
> ingested by the Spool Directory source, and the other for the WAL.)
>
> Thanks,
> ~Rahul.
>
>   ------------------------------
> *From:* Brock Noland <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
> *Sent:* Tuesday, November 6, 2012 3:05 PM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This use case sounds like a perfect use of the Spool Directory source,
> which will be in the upcoming 1.3 release.
>
> Brock
>
> On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
> > We will update the checkpoint each time (we may tune this to be periodic),
> > but the contents of the memory channel will be in the legacy logs, which are
> > currently being generated.
> >
> > Additionally, the sink for the memory channel will send to an Avro source on
> > another machine.
> >
> > Does that clear things up?
> >
> > ________________________________
> > From: Brock Noland <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
> > Sent: Tuesday, November 6, 2012 1:44 PM
> >
> > Subject: Re: Guarantees of the memory channel for delivering to sink
> >
> > But in your architecture you are going to write the contents of the
> > memory channel out? Or did I miss something?
> >
> > "The checkpoint will be updated each time we perform a successive
> > insertion into the memory channel."
> >
> > On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
> >> We have a legacy system which writes events to a file (an existing log file).
> >> This will continue. If I used a file channel, I would double the number of
> >> IO operations (writes to the legacy log file, and writes to the WAL).
> >>
> >> ________________________________
> >> From: Brock Noland <[EMAIL PROTECTED]>
> >> To: [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
> >> Sent: Tuesday, November 6, 2012 1:38 PM
> >> Subject: Re: Guarantees of the memory channel for delivering to sink
> >>
> >> You're still going to be writing out all events, no? So how would the file
> >> channel do more IO than that?
> >>
> >> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
> >>> Hi,
> >>>    I am very new to Flume and we are hoping to use it for our log
> >>> aggregation into HDFS. I have a few questions below:
> >>>
> >>> FileChannel will double our disk IO, which will affect IO performance on
> >>> certain performance-sensitive machines. Hence, I was hoping to write a
Rahul Ravindran 2012-11-07, 19:52
Brock Noland 2012-11-07, 20:14
Rahul Ravindran 2012-11-07, 21:18
Roshan Naik 2012-11-07, 22:57