Flume, mail # user - Problem Events


Re: Problem Events
Jeremy Karlson 2013-07-25, 16:50
This was using the provided ElasticSearch sink. The logs were not helpful.
I ran it through with the debugger and found the source of the problem.

ContentBuilderUtil uses a very "aggressive" method to determine if the
content is JSON; if it contains a "{" anywhere in it, it's considered JSON.
My event body contained a "{" but wasn't JSON, so the JSON parser threw a
CharConversionException from addComplexField(...) rather than the expected
JSONException. We've changed addComplexField(...) to catch different
types of exceptions and fall back to treating the body as a simple field.
We'll probably submit a patch for this soon.
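
As a rough illustration of that fallback (a sketch only, not the actual patch or the real ContentBuilderUtil code): the class name, the ComplexParser interface, and toField(...) below are hypothetical stand-ins, and the "{" check simply mirrors the heuristic described above.

```java
import java.io.CharConversionException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class FieldFallbackSketch {

    /** Hypothetical stand-in for the sink's JSON parsing step. */
    interface ComplexParser {
        String parse(byte[] body) throws IOException;
    }

    /** The "aggressive" check described above: any '{' counts as JSON. */
    static boolean looksLikeJson(byte[] body) {
        for (byte b : body) {
            if (b == '{') {
                return true;
            }
        }
        return false;
    }

    /** Try the complex (JSON) path first; on any failure, treat the body as plain text. */
    static String toField(byte[] body, ComplexParser parser) {
        if (looksLikeJson(body)) {
            try {
                return "complex:" + parser.parse(body);
            } catch (IOException | RuntimeException e) {
                // Catch more than one exception type (CharConversionException is an
                // IOException) and fall through to the simple-field path below.
            }
        }
        return "simple:" + new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = "user={bob".getBytes(StandardCharsets.UTF_8);
        // Assumed failure mode: the parser rejects a body that merely contains '{'.
        ComplexParser failing = b -> {
            throw new CharConversionException("not JSON");
        };
        System.out.println(toField(body, failing)); // prints simple:user={bob
    }
}
```

The point of the design is that a misdetected body degrades to a plain string field instead of raising an exception the sink does not expect.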

I'm reasonably happy with this, but I still think that in the bigger
picture there should be some sort of mechanism to automatically detect and
toss / skip / flag problematic events without them plugging up the flow.
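
To make the idea concrete, here is a minimal sketch of "flag and skip after n attempts". It is not an existing Flume feature or API as far as this thread establishes; ProblemEventGuard, process(...), and the in-memory quarantine list are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ProblemEventGuard {
    private final int maxAttempts;
    private final Map<String, Integer> attempts = new HashMap<>();
    private final List<String> quarantined = new ArrayList<>();

    ProblemEventGuard(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /**
     * Attempts delivery once. Returns true if the event can leave the queue
     * (delivered, or quarantined after too many failures), false if it
     * should be retried later.
     */
    boolean process(String eventId, Predicate<String> deliver) {
        boolean delivered;
        try {
            delivered = deliver.test(eventId);
        } catch (RuntimeException e) {
            delivered = false; // an exception counts as a failed attempt
        }
        if (delivered) {
            attempts.remove(eventId);
            return true;
        }
        int failures = attempts.merge(eventId, 1, Integer::sum);
        if (failures >= maxAttempts) {
            attempts.remove(eventId);
            quarantined.add(eventId); // set aside for someone to inspect later
            return true;
        }
        return false;
    }

    List<String> quarantined() {
        return quarantined;
    }
}
```

With maxAttempts set to 3, an event that always fails is retried twice and then moved to the quarantine list on the third failure, so it no longer blocks the events behind it.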

-- Jeremy
On Wed, Jul 24, 2013 at 7:51 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:

> Jeremy, would it be possible for you to show us logs for the part where
> the sink fails to remove an event from the channel? I am assuming this is a
> standard sink that Flume provides and not a custom one.
>
> The reason I ask is because sinks do not introspect the event, and hence
> there is no reason why it will fail during the event's removal. It is more
> likely that there is a problem within the channel in that it cannot
> dereference the event correctly. Looking at the logs will help us identify
> the root cause for what you are experiencing.
>
> Regards,
> Arvind Prabhakar
>
>
> On Wed, Jul 24, 2013 at 3:56 PM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:
>
>> Both reasonable suggestions.  What would a custom sink look like in this
>> case, and how would I only eliminate the problem events since I don't know
>> what they are until they are attempted by the "real" sink?
>>
>> My philosophical concern (in general) is that we're taking the approach
>> of exhaustively finding and eliminating possible failure cases.  It's not
>> possible to eliminate every single failure case, so shouldn't there be a
>> method of last resort to eliminate problem events from the channel?
>>
>> -- Jeremy
>>
>>
>>
>> On Wed, Jul 24, 2013 at 3:45 PM, Hari Shreedharan <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Or you could write a custom sink that removes this event (more work of
>>> course)
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Wednesday, July 24, 2013 at 3:36 PM, Roshan Naik wrote:
>>>
>>> If you have a way to identify such events, you may be able to use the
>>> Regex interceptor to toss them out before they get into the channel.
>>>
>>>
>>>  On Wed, Jul 24, 2013 at 2:52 PM, Jeremy Karlson <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>> Hi everyone.  My Flume adventures continue.
>>>
>>> I'm in a situation now where I have a channel that's filling because a
>>> stubborn message is stuck.  The sink won't accept it (for whatever reason;
>>> I can go into detail but that's not my point here).  This just blocks up
>>> the channel entirely, because it goes back into the channel when the sink
>>> refuses.  Obviously, this isn't ideal.
>>>
>>> I'm wondering what mechanisms, if any, Flume has to deal with these
>>> situations.  Things that come to mind might be:
>>>
>>> 1. Ditch the event after n attempts.
>>> 2. After n attempts, send the event to a "problem area" (maybe a
>>> different source / sink / channel?)  that someone can look at later.
>>> 3. Some sort of mechanism that allows operators to manually kill these
>>> messages.
>>>
>>> I'm open to suggestions on alternatives as well.
>>>
>>> Thanks.
>>>
>>> -- Jeremy
>>>
>>>
>>>
>>>
>>
>