Re: Problem Events
Just some more thoughts. It could be even easier:

The whole setup might be even easier: with the failover sink processor you
have the "max attempts" and "time between attempts" settings, so it will
just try an event X times before it gives up and sends it to the next sink.
The time between attempts could even back off if needed.
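
For reference, a minimal failover sink group as it is actually configured in
Flume 1.x (the agent and sink names below are assumptions); note that the
stock processor exposes per-sink priorities and a maxpenalty backoff rather
than explicit per-event attempt counts:

    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    # higher priority is tried first
    a1.sinkgroups.g1.processor.priority.k1 = 10
    a1.sinkgroups.g1.processor.priority.k2 = 5
    # maxpenalty caps the backoff (in ms) applied to a failed sink before retry
    a1.sinkgroups.g1.processor.maxpenalty = 10000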

The advantage of this is that it preserves the ordering of events,
something which gets completely broken in the previous scenario.

- Connor
On Fri, Aug 2, 2013 at 6:27 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> As another option to solve the problem of having a bad event in a channel:
> using a failover sink processor, log all bad events to a local file. And
> to be extra cautious, add a third failover of a null sink. This will mean
> that events will always flow through your channel. The file sink should
> almost never fail, so you shouldn't be losing events in the process. And
> then you can re-process everything in the file if you still want those
> events for something.
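
A minimal sketch of that failover chain, assuming an agent named a1, a channel
c1, an existing primary sink, a file_roll sink for bad events, and a null sink
as the last resort (all names and paths here are made up):

    a1.sinks = primary badfile blackhole
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = primary badfile blackhole
    a1.sinkgroups.g1.processor.type = failover
    # tried in priority order: primary sink, then local file, then discard
    a1.sinkgroups.g1.processor.priority.primary = 10
    a1.sinkgroups.g1.processor.priority.badfile = 5
    a1.sinkgroups.g1.processor.priority.blackhole = 1
    # primary sink definition (e.g. hdfs or avro) omitted
    a1.sinks.badfile.type = file_roll
    a1.sinks.badfile.sink.directory = /var/log/flume/bad-events
    a1.sinks.badfile.channel = c1
    a1.sinks.blackhole.type = null
    a1.sinks.blackhole.channel = c1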
>
> For the system of having Flume detect bad events, I think implementing
> something like the above is better than discarding events that fail X times.
> For instance, if you have an Avro sink -> Avro source and you're restarting
> your source, Flume would end up discarding events unnecessarily. Instead,
> how about implementing the above system and then going a step further: Flume
> will attempt to re-send the bad events itself, and if a bad event can't be
> sent after X attempts, it can be discarded.
>
> I envision this system as an extension to the current File Channel: when an
> event fails, it is written to a secondary File Channel from which events can
> be pulled when the main channel isn't in use. It would add headers like
> "lastAttempt" and "numberOfAttempts" to events, and it could be configurable
> with a "min time between attempts" and a "maximum attempts." When an event
> fails again, those headers are updated and it goes back into the
> fail-channel. If it comes out of the fail-channel but the lastAttempt is too
> recent, it goes back in. If it fails more times than the maximum, it is
> written to a final location (perhaps it's just sent to another sink; maybe
> this would have to be in a sink processor). Assuming all of those steps are
> error-free, all messages are preserved and the badly-formatted ones
> eventually get stored somewhere else. (This system could be hacked together
> with the current code - failover sink processor -> Avro sink -> Avro source
> on the same instance - but that's a little too hacky.)
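
Below is a purely illustrative Java sketch of the retry-gating logic proposed
in that paragraph. The fail-channel, the header names, and the thresholds are
all hypothetical; only org.apache.flume.Event and its String-to-String header
map are existing Flume API.

    import java.util.Map;
    import org.apache.flume.Event;

    // Hypothetical helper: decides what to do with an event pulled from the
    // proposed "fail-channel" based on the suggested retry headers.
    public class RetryGate {

        enum Decision { RETRY_NOW, REQUEUE, DISCARD }

        private final long minMillisBetweenAttempts;
        private final int maxAttempts;

        public RetryGate(long minMillisBetweenAttempts, int maxAttempts) {
            this.minMillisBetweenAttempts = minMillisBetweenAttempts;
            this.maxAttempts = maxAttempts;
        }

        public Decision decide(Event event) {
            Map<String, String> h = event.getHeaders();
            int attempts = Integer.parseInt(h.getOrDefault("numberOfAttempts", "0"));
            long lastAttempt = Long.parseLong(h.getOrDefault("lastAttempt", "0"));

            if (attempts >= maxAttempts) {
                return Decision.DISCARD;   // hand off to a final "bad event" destination
            }
            if (System.currentTimeMillis() - lastAttempt < minMillisBetweenAttempts) {
                return Decision.REQUEUE;   // too soon; put it back into the fail-channel
            }
            return Decision.RETRY_NOW;
        }

        // Called when a retry fails again: bump the counters before re-queueing.
        public void recordFailure(Event event) {
            Map<String, String> h = event.getHeaders();
            int attempts = Integer.parseInt(h.getOrDefault("numberOfAttempts", "0"));
            h.put("numberOfAttempts", String.valueOf(attempts + 1));
            h.put("lastAttempt", String.valueOf(System.currentTimeMillis()));
        }
    }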
>
> Just some thoughts.
>
> - Connor
>
>
>
>
> On Thu, Aug 1, 2013 at 3:25 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:
>
>> This sounds like a critical problem that can cause pipelines to block
>> permanently. If you find yourself in this situation, a possible workaround
>> would be to decommission the channel, remove its data, and route the flow
>> through a new, empty channel. If you have the ability to identify which
>> component is causing the problem, see if you can remove it temporarily to
>> let the problem events pass through another peer component.
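
As a rough sketch of that workaround (agent, component names, and paths below
are assumptions): point the source and sink at a fresh file channel with new
checkpoint and data directories, then restart the agent:

    # replace the stuck channel c1 with a fresh file channel c2
    a1.channels = c2
    a1.channels.c2.type = file
    a1.channels.c2.checkpointDir = /var/lib/flume/c2/checkpoint
    a1.channels.c2.dataDirs = /var/lib/flume/c2/data
    a1.sources.r1.channels = c2
    a1.sinks.k1.channel = c2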
>>
>> I have also created FLUME-2140 [1], which will eventually allow pipelines
>> to identify and divert such bad events. If you have any logs, data, or
>> configurations that can be shared and would help provide more detail on
>> this problem, it would be great if you could attach them to this JIRA and
>> provide your comments.
>>
>> [1] https://issues.apache.org/jira/browse/FLUME-2140
>>
>> Regards,
>> Arvind Prabhakar
>>
>> On Thu, Aug 1, 2013 at 10:33 AM, Paul Chavez <[EMAIL PROTECTED]> wrote:
>>
>>> There's no way to deal with a bad event once it's in the channel, but
>>> you can mitigate future issues by having a timestamp interceptor bound to
>>> the source feeding the channel. There is a parameter 'preserve existing'
>>> that will only add the header if it doesn't exist. If you don't want to
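
For reference, a minimal sketch of the interceptor binding Paul describes
(agent and source names are assumptions); setting 'preserveExisting' keeps a
timestamp header that is already present on the event:

    a1.sources.r1.interceptors = ts
    a1.sources.r1.interceptors.ts.type = timestamp
    # only add the timestamp header if the event does not already carry one
    a1.sources.r1.interceptors.ts.preserveExisting = true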