Flume, mail # user - Design problem while monitoring Flume


Re: Design problem while monitoring Flume
Israel Ekpo 2013-08-28, 14:00
Anat,

Once an event has been delivered to the sink, you can no longer intercept
it, unless that sink feeds the source of another agent in a multi-agent
flow.

In the Flume architecture, the interceptors sit between the Source and the
Channel, so their role is to modify or eliminate any event coming from the
source and going to the channel.
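
To make the consequence for your counters concrete, here is a minimal sketch
in plain Python. It is a toy model, not Flume code: BoundedChannel,
CountingInterceptor and deliver are hypothetical names, and the retry loop
only assumes that a source re-runs its interceptor when a put to a full
channel fails. Under that assumption, a counter updated in the interceptor
sees retried events more than once.

```python
from collections import deque


class ChannelFullError(Exception):
    """Raised when the toy channel has no free capacity."""


class BoundedChannel:
    """Toy stand-in for a channel with a fixed capacity (not the Flume API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            raise ChannelFullError("channel is full")
        self.events.append(event)


class CountingInterceptor:
    """Counts every event it sees, including events that are retried."""

    def __init__(self):
        self.seen = 0

    def intercept(self, event):
        self.seen += 1
        return event


def deliver(event, interceptor, channel):
    """Source-side delivery: intercept, then put, retrying on failure.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        processed = interceptor.intercept(event)  # runs on every attempt
        try:
            channel.put(processed)
            return attempts
        except ChannelFullError:
            # Simulate a sink draining one event before the source retries.
            channel.events.popleft()


interceptor = CountingInterceptor()
channel = BoundedChannel(capacity=1)

first_attempts = deliver("event-1", interceptor, channel)   # fits at once
second_attempts = deliver("event-2", interceptor, channel)  # full, then retried

print("distinct events delivered: 2")
print("interceptor count:", interceptor.seen)
```

Two distinct events pass through, but the interceptor is invoked once per
attempt, so its count ends up higher than the number of events that reached
the channel.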

As suggested in the previous post, right now, I think the simplest way to
gather these statistics will be to set up a custom sink or use the
ElasticSearchSink to store the events for analysis.
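
As a rough illustration of the custom sink idea, the sketch below counts
events only after a batch has been committed. It is plain Python and a toy
model, not the Flume Sink API: ToyChannel, StatsSink and flaky_store are
hypothetical names, and the take/commit/rollback methods only loosely mimic
how a channel transaction behaves.

```python
from collections import deque


class ToyChannel:
    """Toy channel whose takes can be undone (not the Flume API)."""

    def __init__(self, events):
        self.events = deque(events)
        self._taken = []

    def take(self):
        if not self.events:
            return None
        event = self.events.popleft()
        self._taken.append(event)
        return event

    def commit(self):
        # Taken events are gone for good.
        self._taken.clear()

    def rollback(self):
        # Put taken events back at the head, preserving their order.
        while self._taken:
            self.events.appendleft(self._taken.pop())


class StatsSink:
    """Counts events only once the batch has been stored and committed."""

    def __init__(self, channel, store):
        self.channel = channel
        self.store = store  # hypothetical callable; may raise OSError
        self.committed = 0

    def process(self, batch_size):
        batch = []
        try:
            for _ in range(batch_size):
                event = self.channel.take()
                if event is None:
                    break
                batch.append(event)
            self.store(batch)
            self.channel.commit()
            self.committed += len(batch)  # counted only after commit
            return True
        except OSError:
            self.channel.rollback()  # events return to the channel, uncounted
            return False


calls = {"n": 0}
stored = []


def flaky_store(batch):
    """Fails on the first call to simulate a write error, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise OSError("simulated write failure")
    stored.extend(batch)


channel = ToyChannel(["a", "b", "c"])
sink = StatsSink(channel, flaky_store)
first = sink.process(batch_size=3)   # store fails, batch is rolled back
second = sink.process(batch_size=3)  # same events are retried and committed

print("committed events:", sink.committed)
```

Because the counter moves only after the commit, the rolled-back attempt
does not inflate the statistics even though the same events are taken twice.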

ElasticSearch has a variety of tools that will simplify the stats portion
of your work.

http://www.elasticsearch.org/overview/

*Author and Instructor for the Upcoming Book and Lecture Series*
*Massive Log Data Aggregation, Processing, Searching and Visualization with
Open Source Software*
*http://massivelogdata.com*
On 28 August 2013 08:59, Anat Rozenzon <[EMAIL PROTECTED]> wrote:

> Thank you for the quick answer.
>
> How can I process events after they have been written? Is there any
> post-write interceptor I can code?
>
>
> On Wed, Aug 28, 2013 at 11:45 AM, Juhani Connolly <
> [EMAIL PROTECTED]> wrote:
>
>> The most common cause of resending events from the source would be
>> failure to write to the channel. Most of the time this would be because the
>> channel is full.
>>
>> The approach to collecting statistics will depend on what exactly you want
>> to do, but perhaps you could write metadata to headers in the interceptor
>> and then batch process the serialized headers after events have actually
>> been written. Or, if you need to be realtime, you can replicate events to an
>> additional path which leads to a custom sink that collects statistics. So
>> long as the sink doesn't "bounce" events (roll back transactions), it
>> shouldn't get any events resent.
>>
>> One thing to keep in mind, though, is that Flume in general only guarantees
>> delivery; it doesn't guarantee that events will be delivered only once
>> (though many components do deliver only once).
>>
>>
>> On 08/28/2013 04:09 PM, Anat Rozenzon wrote:
>>
>>> Hi,
>>>
>>> I want to get some statistics out of Flume (For example, how many
>>> records were collected, How many files etc.).
>>> I've written my own interceptor that updates an MBean whenever records
>>> arrive.
>>>
>>> I've also written a MonitorServices that collects the data from the
>>> MBean every X minutes and sends it to a database.
>>>
>>> My problem is that sometimes events are resent from the source; I
>>> saw that while debugging.
>>> Not sure why... maybe because of a timeout while sending to the sink?
>>>
>>> Anyway, if this happens in production it will corrupt my statistics.
>>>
>>> Is there any way I can know that an event has failed to reach the sink
>>> even though it passed the interceptor?
>>> Is there a better place to collect such statistics than an interceptor?
>>>
>>> Thanks
>>> Anat
>>>
>>
>>
>