Re: Problem Events
That's what I'll do: add a static timestamp of 1 and let all the 'bad'
messages flow into one directory.
Thanks
On Wed, Aug 7, 2013 at 5:14 PM, Jonathan Cooper-Ellis <[EMAIL PROTECTED]> wrote:
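A minimal sketch of that interceptor chain (the agent/source names a1/r1 and the regex are hypothetical): the Static Interceptor stamps every event with timestamp 1 up front, and the Regex Extractor overwrites it whenever the row actually matches, so only the 'bad' rows keep the near-epoch timestamp:

  a1.sources.r1.interceptors = stamp extract
  # Static Interceptor: seed a default 'timestamp' header of 1 on every event
  a1.sources.r1.interceptors.stamp.type = static
  a1.sources.r1.interceptors.stamp.preserveExisting = false
  a1.sources.r1.interceptors.stamp.key = timestamp
  a1.sources.r1.interceptors.stamp.value = 1
  # Regex Extractor: overwrite with the real timestamp when the row matches
  a1.sources.r1.interceptors.extract.type = regex_extractor
  a1.sources.r1.interceptors.extract.regex = ^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})
  a1.sources.r1.interceptors.extract.serializers = ts
  a1.sources.r1.interceptors.extract.serializers.ts.name = timestamp
  a1.sources.r1.interceptors.extract.serializers.ts.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
  a1.sources.r1.interceptors.extract.serializers.ts.pattern = yyyy-MM-dd HH:mm:ss

Rows that never match keep timestamp 1 and all land in the 1970-01-01 bucket, which becomes the single 'bad' directory mentioned above.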

> You can use a Static Interceptor before the RegexExtractor to add a
> timestamp of zero to the header, which can then be overwritten by the
> proper timestamp (if it exists). It should also sink misses into an obvious
> 'miss' directory.
>
>
> On Tue, Aug 6, 2013 at 10:40 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>
>> After some reading in the docs, I think the existing fail-over behavior
>> can't be used to solve the 'poison' message problem, as it puts the
>> 'failed' sink in a 'cooldown' period.
>> As the problem is in the message and not the sink, after a poison message
>> has arrived the HDFS sink will 'fail', and the next X messages will go to
>> the failover sink.
>> My only solution for now is to avoid my current problem and hope that I
>> won't have any other problematic messages; I'd be glad to have a less
>> fragile solution.
>>
>> Many thanks!
>> Other than that, Flume looks like a great tool :-)
>>
>> Anat
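For reference, the failover processor being discussed is configured along these lines (a sketch; the group and sink names g1/hdfsSink/missSink are hypothetical). maxpenalty is the 'cooldown' in question, a backoff applied to the whole sink after any failure, which is why a single poison event diverts the events that follow it too:

  a1.sinkgroups = g1
  a1.sinkgroups.g1.sinks = hdfsSink missSink
  a1.sinkgroups.g1.processor.type = failover
  # Higher number = higher priority; events go to the highest-priority live sink
  a1.sinkgroups.g1.processor.priority.hdfsSink = 10
  a1.sinkgroups.g1.processor.priority.missSink = 5
  # Maximum backoff (in millis) applied to a failed sink (the 'cooldown')
  a1.sinkgroups.g1.processor.maxpenalty = 10000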
>>
>>
>> On Sun, Aug 4, 2013 at 8:45 AM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>
>>> Using a fail-over processor is a very good idea; I think I'll use it as
>>> an immediate solution.
>>> For the long run, I would like to see a general solution (not specific
>>> to the file channel; in my case it is an HDFS sink), so the suggestion to
>>> add a 'Poison Message' sink to the sink processor sounds good.
>>>
>>> Just FYI, my problem is that a log file going through my source did not
>>> have the structure I expected in all rows.
>>>
>>> Since I used the regex extractor to set the timestamp, the 'bad' row
>>> didn't match the regex, the timestamp was not set, and the HDFS sink
>>> threw an NPE:
>>> 01 Aug 2013 09:36:24,259 ERROR
>>> [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:422)  - process failed
>>> java.lang.NullPointerException: Expected timestamp in the Flume event
>>> headers, but it was null
>>>         at
>>> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>>>         at
>>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
>>>         at
>>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
>>>         at
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)
>>>         at
>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>         at
>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:722)
>>> 01 Aug 2013 09:36:24,262 ERROR
>>> [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>> (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver
>>> event. Exception follows.
>>> org.apache.flume.EventDeliveryException: java.lang.NullPointerException:
>>> Expected timestamp in the Flume event headers, but it was null
>>>         at
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
>>>         at
>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>         at
>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:722)
>>> Caused by: java.lang.NullPointerException: Expected timestamp in the
>>> Flume event headers, but it was null
>>>         at
>>> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>>>         at
>>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
>>>         at
>>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
>>>         at
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)
>>>         ... 3 more
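The trace shows BucketPath failing while expanding time escape sequences in the sink's path. A sketch of the kind of configuration that triggers it (the path and sink name are hypothetical):

  a1.sinks.hdfsSink.type = hdfs
  # %Y/%m/%d are resolved from the event's 'timestamp' header; if the header
  # is missing, Preconditions.checkNotNull throws the NPE seen above
  a1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d
  # Alternative mitigation, if your Flume version supports it: take the
  # time from the sink's local clock instead of the event header
  # a1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true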
>>>
>>>
>>> I fixed my regexp now; still, I can never be sure all the log files the