Flume, mail # user - Problem Events


Re: Problem Events
Arvind Prabhakar 2013-08-01, 22:25
This sounds like a critical problem that can cause pipelines to block
permanently. If you find yourself in this situation, a possible workaround
would be to decommission the channel, remove its data, and route the flow
through a new, empty channel. If you are able to identify which component is
causing the problem, see if you can remove it temporarily to let the problem
events pass through another peer component.
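
As a rough illustration of the channel swap described above, here is a
minimal sketch of an agent configuration, assuming a file channel. The agent
name (a1), source name (src1), sink name (hdfsSink), channel name (ch2), and
the directory paths are all hypothetical and would need to match your own
setup.

```properties
# Hypothetical names throughout: agent a1, source src1, sink hdfsSink.
# The old channel is dropped from the list and a new one (ch2) takes its place.
a1.channels = ch2
a1.channels.ch2.type = file
# Point the new channel at fresh, empty directories so that none of the
# old channel's events (including the problem event) are replayed.
a1.channels.ch2.checkpointDir = /var/flume/ch2/checkpoint
a1.channels.ch2.dataDirs = /var/flume/ch2/data
# Rewire the source and the sink to the new channel.
a1.sources.src1.channels = ch2
a1.sinks.hdfsSink.channel = ch2
```

Any events still sitting in the old channel's directories are not delivered
by this change, so they would have to be recovered separately if they matter.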

I have also created FLUME-2140 [1] which will eventually allow the
pipelines to identify and divert such bad events. If you have any logs,
data, or configurations that can be shared and would help provide more
details for this problem, it would be great if you could attach them to this
jira and provide your comments.

[1] https://issues.apache.org/jira/browse/FLUME-2140

Regards,
Arvind Prabhakar

On Thu, Aug 1, 2013 at 10:33 AM, Paul Chavez <
[EMAIL PROTECTED]> wrote:

> There's no way to deal with a bad event once it's in the channel, but you
> can mitigate future issues by having a timestamp interceptor bound to the
> source feeding the channel. There is a parameter 'preserveExisting' that,
> when set to true, will only add the header if it doesn't exist. If you
> don't want to have 'bad' time data in there you could try a static
> interceptor with a specific past date so that corrupt events fall into a
> deterministic path in HDFS.
>
> I use this technique to prevent stuck events for both timestamp headers as
> well as some of our own custom headers we use for tokenized paths. The
> static interceptor will insert an arbitrary header if it doesn't exist so I
> have a couple that put in the value 'Unknown' so that I can still send the
> events through the HDFS sink but I can also find them later if need be.
>
> hope that helps,
> Paul Chavez
>
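
The interceptor setup Paul describes could look roughly like the sketch
below. The agent name (a1), source name (src1), interceptor names (ts,
unknownCustomer), and the custom header key (customer) are hypothetical. The
interceptor types and the preserveExisting, key, and value parameters are the
ones documented for Flume's timestamp and static interceptors.

```properties
# Hypothetical names: agent a1, source src1, custom header "customer".
a1.sources.src1.interceptors = ts unknownCustomer

# Timestamp interceptor: adds a "timestamp" header, but only when the event
# does not already carry one (preserveExisting = true).
a1.sources.src1.interceptors.ts.type = timestamp
a1.sources.src1.interceptors.ts.preserveExisting = true

# Static interceptor: fills in a placeholder for a custom header used in
# tokenized HDFS paths, so events without it can still be written and
# found later.
a1.sources.src1.interceptors.unknownCustomer.type = static
a1.sources.src1.interceptors.unknownCustomer.preserveExisting = true
a1.sources.src1.interceptors.unknownCustomer.key = customer
a1.sources.src1.interceptors.unknownCustomer.value = Unknown
```

To send events without a timestamp to a fixed past date instead of the
current time, the timestamp interceptor could be replaced by a static
interceptor whose key is timestamp and whose value is a fixed time in epoch
milliseconds.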
>  ------------------------------
> *From:* Roshan Naik [mailto:[EMAIL PROTECTED]]
> *Sent:* Thursday, August 01, 2013 10:27 AM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Problem Events
>
>  Some questions:
> - Why is the sink unable to consume the event?
> - How would you like to identify such an event? By examining its content,
> or by the fact that it's ping-ponging between channel and sink?
> - What would you prefer to do with such an event? Merely drop it?
>
>
> On Thu, Aug 1, 2013 at 9:26 AM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:
>
>>  To my knowledge (which is admittedly limited), there is no way to deal
>> with these in a way that will make your day.  I'm happy if someone can say
>> otherwise.
>>
>> This is very similar to a problem I had a week or two ago.  I fixed it by
>> restarting Flume with debugging on, connecting to it with the debugger, and
>> finding the message in the sink.  I discovered a bug in the sink, downloaded
>> Flume, fixed the bug, recompiled, installed the custom version, etc.
>>
>> I agree that this is not a practical solution, and I still believe that
>> Flume needs some sort of "sink of last resort" option or something, like
>> JMS implementations.
>>
>> -- Jeremy
>>
>>
>>
>> On Thu, Aug 1, 2013 at 2:42 AM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>
>>>  The message is already in the channel.
>>> Is there a way to write an interceptor to work after the channel? or
>>> before the sink?
>>>
>>> The only thing I found is to stop everything and delete the channel
>>> files, but I won't be able to use this approach in production :-(
>>>
>>>
>>> On Thu, Aug 1, 2013 at 11:13 AM, Ashish <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>>
>>>>
>>>>  On Thu, Aug 1, 2013 at 1:29 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>>>
>>>>>   Hi,
>>>>>
>>>>> I'm having the same problem with HDFS sink.
>>>>>
>>>>> There is a 'poison' message that doesn't have the timestamp header
>>>>> the sink expects.
>>>>> This causes an NPE, which ends in the message being returned to the
>>>>> channel, over and over again.
>>>>>
>>>>> Is my only option to re-write the HDFS sink?
>>>>> Isn't there any way to intercept the sink's work?
>>>>>
>>>>
>>>> You can write a custom interceptor and remove/modify the poison message.