Flume >> mail # user >> Guarantees of the memory channel for delivering to sink


Re: Guarantees of the memory channel for delivering to sink
Hi,

Yes, if you use the memory channel, you can lose data. To avoid losing data,
the file channel needs to write to disk.
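To make the trade-off concrete, here is a toy Java model. Everything in it (`ToyChannel`, `survivorsAfterCrash`, `walAppendsFor`, the simulated crash) is hypothetical and is not Flume's Channel API. It assumes a sink removes an event only once that event has been delivered downstream. It shows that events still buffered when the agent dies are gone with a memory-only channel, but can be replayed from a write-ahead log, at the cost of one extra append per event (the "double write" discussed in the quoted messages).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model (hypothetical; NOT Flume's real Channel API) contrasting a purely
 * in-memory channel with one that appends every event to a write-ahead log.
 */
public class ChannelDurabilityDemo {

    static class ToyChannel {
        private final boolean durable;                        // true = WAL-backed, false = memory-only
        private final Deque<String> wal = new ArrayDeque<>(); // stands in for the on-disk log
        private Deque<String> buffer = new ArrayDeque<>();    // in-process queue, lost on crash
        private int walAppends = 0;                           // extra disk writes caused by the WAL

        ToyChannel(boolean durable) {
            this.durable = durable;
        }

        /** Source side: buffer an event, logging it first if the channel is durable. */
        void put(String event) {
            if (durable) {
                // In the thread's setup the event is already in the legacy log,
                // so this append is the second disk write for the same event.
                wal.addLast(event);
                walAppends++;
            }
            buffer.addLast(event);
        }

        /** Sink side: hand the oldest event downstream; returns null if empty. */
        String take() {
            String event = buffer.pollFirst();
            if (event != null && durable) {
                wal.pollFirst(); // toy commit: a delivered event no longer needs replay
            }
            return event;
        }

        /** The agent dies and restarts: only what is in the WAL comes back. */
        void crashAndRestart() {
            buffer = new ArrayDeque<>(wal); // always empty for the memory-only channel
        }

        int size() {
            return buffer.size();
        }

        int walAppends() {
            return walAppends;
        }
    }

    /** Puts n events, lets the sink deliver some of them, crashes, and counts survivors. */
    static int survivorsAfterCrash(boolean durable, int n, int delivered) {
        ToyChannel ch = new ToyChannel(durable);
        for (int i = 0; i < n; i++) {
            ch.put("event-" + i);
        }
        for (int i = 0; i < delivered; i++) {
            ch.take(); // e.g. the hop to the remote box succeeded for these events
        }
        ch.crashAndRestart();
        return ch.size();
    }

    /** Number of WAL appends needed just to accept n events. */
    static int walAppendsFor(boolean durable, int n) {
        ToyChannel ch = new ToyChannel(durable);
        for (int i = 0; i < n; i++) {
            ch.put("event-" + i);
        }
        return ch.walAppends();
    }

    public static void main(String[] args) {
        // 10 events accepted, only 4 delivered before the agent dies (slow network, say).
        System.out.println("memory-only survivors: " + survivorsAfterCrash(false, 10, 4));
        System.out.println("WAL-backed survivors:  " + survivorsAfterCrash(true, 10, 4));
        System.out.println("WAL appends for 10 events: " + walAppendsFor(true, 10));
    }
}
```

Running `main` prints 0 survivors for the memory-only channel and 6 for the WAL-backed one. A real file channel likely does more IO than this toy counts (for example, periodic checkpoints), so treat the append count as a lower bound.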

Brock

On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:

> Ping on the questions below about the new Spool Directory source:
>
> If we choose to use the memory channel with this source, to an Avro sink
> on a remote box, do we risk data loss in the event of a network
> partition/slow network or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, we will incur
> double writes to disk, correct? (one for the legacy log files which will be
> ingested by the Spool Directory source, and the other for the WAL)
>
>
>   ------------------------------
> *From:* Rahul Ravindran <[EMAIL PROTECTED]>
>  *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> *Sent:* Tuesday, November 6, 2012 3:40 PM
>
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This is awesome.
> This may be perfect for our use case :)
>
> When is the 1.3 release expected?
>
> A couple of questions about the choice of channel for the new source:
>
> If we choose to use the memory channel with this source, to an Avro sink
> on a remote box, do we risk data loss in the event of a network
> partition/slow network or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, we will incur
> double writes to disk, correct? (one for the legacy log files which will be
> ingested by the Spool Directory source, and the other for the WAL)
>
> Thanks,
> ~Rahul.
>
>   ------------------------------
> *From:* Brock Noland <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
> *Sent:* Tuesday, November 6, 2012 3:05 PM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This use case sounds like a perfect use of the Spool Directory source,
> which will be in the upcoming 1.3 release.
>
> Brock
>
> On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
> > We will update the checkpoint each time (we may tune this to be periodic),
> > but the contents of the memory channel will be in the legacy logs that are
> > currently being generated.
> >
> > Additionally, the sink for the memory channel will send to an Avro source on
> > another machine.
> >
> > Does that clear things up?
> >
> > ________________________________
> > From: Brock Noland <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
> > Sent: Tuesday, November 6, 2012 1:44 PM
> >
> > Subject: Re: Guarantees of the memory channel for delivering to sink
> >
> > But in your architecture you are going to write the contents of the
> > memory channel out? Or did I miss something?
> >
> > "The checkpoint will be updated each time we perform a successive
> > insertion into the memory channel."
> >
> > On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
> >> We have a legacy system which writes events to a file (an existing log
> >> file). This will continue. If I used a file channel, I would double the
> >> number of IO operations (writes to the legacy log file, and writes to the
> >> WAL).
> >>
> >> ________________________________
> >> From: Brock Noland <[EMAIL PROTECTED]>
> >> To: [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
> >> Sent: Tuesday, November 6, 2012 1:38 PM
> >> Subject: Re: Guarantees of the memory channel for delivering to sink
> >>
> >> You're still going to be writing out all events, no? So how would the file
> >> channel do more IO than that?
> >>
> >> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
> >>> Hi,
> >>>    I am very new to Flume and we are hoping to use it for our log
> >>> aggregation into HDFS. I have a few questions below:
> >>>
> >>> FileChannel will double our disk IO, which will affect IO performance on
> >>> certain performance-sensitive machines. Hence, I was hoping to write a