Flume, mail # user - Problem Events


Re: Problem Events
Anat Rozenzon 2013-08-01, 09:42
The message is already in the channel.
Is there a way to write an interceptor that works after the channel, or
before the sink?

The only thing I found is to stop everything and delete the channel files,
but I won't be able to use this approach in production :-(
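
One way to picture a guard "before the sink" is a drain loop that gives up on an event after a bounded number of failed deliveries and sets it aside, instead of returning it to the channel forever. The following is a minimal Java sketch of that idea only. It does not use the Flume API: `SimpleEvent`, `Delivery`, `drain`, and `maxAttempts` are hypothetical names invented for illustration, and the in-memory `Deque` merely stands in for a channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

final class PoisonEventGuard {

    /** Hypothetical stand-in for an event: string headers plus a body. */
    static final class SimpleEvent {
        final Map<String, String> headers;
        final String body;

        SimpleEvent(Map<String, String> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /** Hypothetical delivery step, e.g. a write that needs a timestamp header. */
    interface Delivery {
        void deliver(SimpleEvent event) throws Exception;
    }

    /**
     * Drains the queue in order. Each event gets at most maxAttempts delivery
     * attempts; an event that still fails is removed and returned as rejected
     * so that it cannot block the events queued behind it.
     */
    static List<SimpleEvent> drain(Deque<SimpleEvent> channel, Delivery delivery, int maxAttempts) {
        List<SimpleEvent> rejected = new ArrayList<>();
        while (!channel.isEmpty()) {
            SimpleEvent event = channel.peekFirst();
            boolean delivered = false;
            for (int attempt = 1; attempt <= maxAttempts && !delivered; attempt++) {
                try {
                    delivery.deliver(event);
                    delivered = true;
                } catch (Exception e) {
                    // Delivery failed; the loop retries until maxAttempts is reached.
                }
            }
            channel.pollFirst(); // remove the event whether it was delivered or not
            if (!delivered) {
                rejected.add(event);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        Deque<SimpleEvent> channel = new ArrayDeque<>();
        channel.add(new SimpleEvent(Collections.singletonMap("timestamp", "1375350120000"), "good event 1"));
        channel.add(new SimpleEvent(Collections.<String, String>emptyMap(), "poison event"));
        channel.add(new SimpleEvent(Collections.singletonMap("timestamp", "1375350180000"), "good event 2"));

        List<String> written = new ArrayList<>();
        // Assumed failure mode: a missing timestamp header raises an NPE.
        Delivery delivery = event -> {
            String ts = event.headers.get("timestamp");
            if (ts == null) {
                throw new NullPointerException("missing timestamp header");
            }
            written.add(ts + " " + event.body);
        };

        List<SimpleEvent> rejected = drain(channel, delivery, 3);
        System.out.println("written=" + written.size() + " rejected=" + rejected.size());
    }
}
```

In a real agent the equivalent logic would have to live in a custom sink or a wrapper around one, and the rejected events would more usefully be written to a side location for inspection than held in memory.
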
On Thu, Aug 1, 2013 at 11:13 AM, Ashish <[EMAIL PROTECTED]> wrote:

>
>
>
> On Thu, Aug 1, 2013 at 1:29 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I'm having the same problem with HDFS sink.
>>
>> A 'poison' message doesn't have the timestamp header that the sink
>> expects.
>> This causes an NPE, which results in the message being returned to the
>> channel over and over again.
>>
>> Is my only option to re-write the HDFS sink?
>> Isn't there any way to intercept events inside the sink?
>>
>
> You can write a custom interceptor and remove/modify the poison message.
>
> Interceptors are called before a message makes its way into the channel.
>
> http://flume.apache.org/FlumeUserGuide.html#flume-interceptors
>
> I wrote a blog post about it a while back:
> http://www.ashishpaliwal.com/blog/2013/06/flume-cookbook-implementing-custom-interceptors/
>
>
>
>>
>> Thanks
>> Anat
>>
>>
>> On Fri, Jul 26, 2013 at 3:35 AM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:
>>
>>> Sounds like a bug in the ElasticSearch sink to me. Do you mind filing a
>>> Jira to track this? Sample data that reproduces it would be even better.
>>>
>>> Regards,
>>> Arvind Prabhakar
>>>
>>>
>>> On Thu, Jul 25, 2013 at 9:50 AM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:
>>>
>>>> This was using the provided ElasticSearch sink.  The logs were not
>>>> helpful.  I ran it through with the debugger and found the source of the
>>>> problem.
>>>>
>>>> ContentBuilderUtil uses a very "aggressive" method to determine if the
>>>> content is JSON; if it contains a "{" anywhere in it, it's considered JSON.
>>>> My body contained that but wasn't JSON, causing the JSON parser to throw a
>>>> CharConversionException from addComplexField(...) (but not the expected
>>>> JSONException).  We've changed addComplexField(...) to catch different
>>>> types of exceptions and fall back to treating it as a simple field.  We'll
>>>> probably submit a patch for this soon.
>>>>
>>>> I'm reasonably happy with this, but I still think that in the bigger
>>>> picture there should be some sort of mechanism to automatically detect and
>>>> toss / skip / flag problematic events without them plugging up the flow.
>>>>
>>>> -- Jeremy
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 7:51 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Jeremy, would it be possible for you to show us logs for the part
>>>>> where the sink fails to remove an event from the channel? I am assuming
>>>>> this is a standard sink that Flume provides and not a custom one.
>>>>>
>>>>> The reason I ask is that sinks do not introspect the event, and
>>>>> hence there is no reason why they would fail during the event's removal. It is
>>>>> more likely that there is a problem within the channel in that it cannot
>>>>> dereference the event correctly. Looking at the logs will help us identify
>>>>> the root cause for what you are experiencing.
>>>>>
>>>>> Regards,
>>>>> Arvind Prabhakar
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 3:56 PM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Both reasonable suggestions.  What would a custom sink look like in
>>>>>> this case, and how would I only eliminate the problem events since I don't
>>>>>> know what they are until they are attempted by the "real" sink?
>>>>>>
>>>>>> My philosophical concern (in general) is that we're taking the
>>>>>> approach of exhaustively finding and eliminating possible failure cases.
>>>>>> It's not possible to eliminate every single failure case, so shouldn't
>>>>>> there be a method of last resort to eliminate problem events from the
>>>>>> channel?
>>>>>>
>>>>>> -- Jeremy
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 3:45 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote: