Flume >> mail # user >> Guarantees of the memory channel for delivering to sink


Re: Guarantees of the memory channel for delivering to sink
This use case sounds like a perfect use of the Spool Directory source,
which will be in the upcoming 1.3 release.
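For reference, a minimal spooling-directory source configuration might look like the sketch below (Flume 1.3+). The agent name, channel name, and directory path are illustrative assumptions, not values from this thread:

```properties
# Illustrative agent "agent1"; names and the spool path are assumptions.
agent1.sources = spool-src
agent1.channels = mem-ch

# Spooling Directory source: ingests completed files dropped into spoolDir.
agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /var/log/flume-spool
agent1.sources.spool-src.channels = mem-ch

agent1.channels.mem-ch.type = memory
```

Note that the spooling-directory source expects files to be immutable once placed in the directory, which is why the legacy log file discussed below would need to be rotated into the spool directory rather than written there directly.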

Brock

On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
> We will update the checkpoint each time (we may tune this to be periodic),
> but the contents of the memory channel will be in the legacy logs which
> are currently being generated.
>
> Additionally, the sink for the memory channel will be an Avro source in
> another machine.
>
> Does that clear things up?
>
> ________________________________
> From: Brock Noland <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
> Sent: Tuesday, November 6, 2012 1:44 PM
>
> Subject: Re: Guarantees of the memory channel for delivering to sink
>
> But in your architecture you are going to write the contents of the
> memory channel out? Or did I miss something?
>
> "The checkpoint will be updated each time we perform a successive
> insertion into the memory channel."
>
> On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>> We have a legacy system which writes events to a file (existing log file).
>> This will continue. If I used a FileChannel, I would double the number
>> of IO operations (writes to the legacy log file, and writes to the WAL).
>>
>> ________________________________
>> From: Brock Noland <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; Rahul Ravindran <[EMAIL PROTECTED]>
>> Sent: Tuesday, November 6, 2012 1:38 PM
>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>
You're still going to be writing out all events, no? So how would the
file channel do more IO than that?
>>
>> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>    I am very new to Flume and we are hoping to use it for our log
>>> aggregation into HDFS. I have a few questions below:
>>>
>>> FileChannel will double our disk IO, which will affect IO performance
>>> on certain performance-sensitive machines. Hence, I was hoping to write
>>> a custom Flume source which will use a memory channel, and which will
>>> perform checkpointing. The checkpoint will be updated each time we
>>> perform a successive insertion into the memory channel. (I realize that
>>> this results in a risk of data loss, the maximum size of which is the
>>> capacity of the memory channel.)
>>>
>>>    As long as there is capacity in the memory channel buffers, does the
>>> memory channel guarantee delivery to a sink (does it wait for
>>> acknowledgements, and retry failed packets)? This would mean that we
>>> need to ensure that we do not exceed the channel capacity.
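The channel capacity being discussed here is configurable on the memory channel. A sketch (the `capacity` and `transactionCapacity` properties are Flume's memory channel settings; the agent/channel names and values are illustrative):

```properties
agent1.channels.mem-ch.type = memory
# Maximum number of events buffered in the channel.
agent1.channels.mem-ch.capacity = 100000
# Maximum number of events per put/take transaction.
agent1.channels.mem-ch.transactionCapacity = 1000
```

Sizing `capacity` to cover the expected sink lag bounds the data-loss window described above.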
>>>
>>> I am writing a custom source which will use the memory channel, and
>>> which will catch a ChannelException to identify any channel capacity
>>> issues (i.e., the buffer used in the memory channel is full because of
>>> lagging sinks, network issues, etc.). Is that a reasonable assumption
>>> to make?
>>>
>>> Thanks,
>>> ~Rahul.
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/
>>
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>
>

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/