Flume, mail # user - Problem Events


Thread:
- Jeremy Karlson 2013-07-24, 21:52
- Roshan Naik 2013-07-24, 22:36
- Hari Shreedharan 2013-07-24, 22:45
- Jeremy Karlson 2013-07-24, 22:56
- Arvind Prabhakar 2013-07-25, 02:51
- Jeremy Karlson 2013-07-25, 16:50
- Arvind Prabhakar 2013-07-26, 00:35
- Anat Rozenzon 2013-08-01, 07:59
- Ashish 2013-08-01, 08:13
- Anat Rozenzon 2013-08-01, 09:42
- Jeremy Karlson 2013-08-01, 16:26
- Roshan Naik 2013-08-01, 17:26
- Paul Chavez 2013-08-01, 17:33
- Arvind Prabhakar 2013-08-01, 22:25
- Connor Woodson 2013-08-03, 01:27
- Connor Woodson 2013-08-03, 06:56
- Anat Rozenzon 2013-08-04, 05:45
- Anat Rozenzon 2013-08-07, 05:40
- Jonathan Cooper-Ellis 2013-08-07, 14:14
- Anat Rozenzon 2013-08-08, 05:26
Re: Problem Events
Connor Woodson 2013-08-10, 01:08
To avoid the cooldown period, set maxBackoff to 0. The failover sink
processor could use smarter logic here, but what happens instead just costs
a little more processing time:

an event fails -> that sink is put on the failover list -> the event goes
to the next sink and succeeds.
next event -> the failover processor checks the failover list and removes
the first sink from it -> the event goes to the first sink.

I am fairly confident that's how it works.

- Connor
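As a sketch, the setup Connor describes might look like the agent config below (agent and sink names are hypothetical; note that in the Flume user guide the failover sink processor's cooldown cap is spelled processor.maxpenalty):

```properties
# Failover sink processor: events go to the highest-priority live sink;
# a failed sink is normally penalized with an increasing cooldown.
# Setting maxpenalty to 0 effectively disables that cooldown.
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = hdfsSink backupSink
agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.hdfsSink = 10
agent.sinkgroups.g1.processor.priority.backupSink = 5
agent.sinkgroups.g1.processor.maxpenalty = 0
```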
On Wed, Aug 7, 2013 at 10:26 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:

> That's what I'll do: add a static timestamp of 1 and let all the 'bad'
> messages flow into one directory.
> Thanks
>
>
> On Wed, Aug 7, 2013 at 5:14 PM, Jonathan Cooper-Ellis <[EMAIL PROTECTED]>wrote:
>
>> You can use a Static Interceptor before the RegexExtractor to add a
>> timestamp of zero to the header, which is then overwritten by the
>> proper timestamp when one exists. That also sends the misses into an
>> obvious 'miss' directory.
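A rough sketch of that interceptor chain (the agent/source names and the regex itself are made up for illustration):

```properties
# i1 (static) stamps every event with timestamp=0; i2 (regex_extractor)
# then overwrites the header whenever the row actually matches.
agent.sources.r1.interceptors = i1 i2
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = timestamp
agent.sources.r1.interceptors.i1.value = 0
agent.sources.r1.interceptors.i2.type = regex_extractor
agent.sources.r1.interceptors.i2.regex = ^(\\d{13})
agent.sources.r1.interceptors.i2.serializers = s1
agent.sources.r1.interceptors.i2.serializers.s1.name = timestamp
```

With timestamp=0, non-matching rows bucket under the Unix epoch (1970) path, which serves as the obvious 'miss' directory.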
>>
>>
>> On Tue, Aug 6, 2013 at 10:40 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>
>>> After some reading in the docs, I think the existing fail-over behavior
>>> can't be used to solve the 'poison' message problem, as it puts the 'failed'
>>> sink into a 'cooldown' period.
>>> Since the problem is in the message and not the sink, once a
>>> poison message arrives the HDFS sink will 'fail', and the next X
>>> messages will go to the failover sink.
>>> My only solution for now is to work around my current problem and hope that
>>> I won't get any other problematic messages; I'd be glad to have a less
>>> fragile solution.
>>>
>>> Many thanks!
>>> Other than that, Flume looks like a great tool :-)
>>>
>>> Anat
>>>
>>>
>>> On Sun, Aug 4, 2013 at 8:45 AM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>>
>>>> I think using a fail-over processor is a very good idea; I'll
>>>> use it as an immediate solution.
>>>> For the long run, I would like to see a general solution (not specific
>>>> to the file channel; in my case it is an HDFS sink), so the suggestion to
>>>> add a 'Poison Message' sink to the sink processor sounds good.
>>>>
>>>> Just FYI, my problem is that a log file going through my source did not
>>>> have the structure I expected in all rows.
>>>>
>>>> Since I used a regex extractor to set the timestamp, the 'bad' row didn't
>>>> match the regexp and the timestamp was never set, so the HDFS sink threw
>>>> an NPE on it:
>>>> 01 Aug 2013 09:36:24,259 ERROR
>>>> [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:422)  - process failed
>>>> java.lang.NullPointerException: Expected timestamp in the Flume event
>>>> headers, but it was null
>>>>         at
>>>> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>>>>         at
>>>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
>>>>         at
>>>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
>>>>         at
>>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)
>>>>         at
>>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>>         at
>>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>>         at java.lang.Thread.run(Thread.java:722)
>>>> 01 Aug 2013 09:36:24,262 ERROR
>>>> [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>>> (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver
>>>> event. Exception follows.
>>>> org.apache.flume.EventDeliveryException:
>>>> java.lang.NullPointerException: Expected timestamp in the Flume event
>>>> headers, but it was null
>>>>         at
>>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
>>>>         at
>>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>>         at
>>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>>         at java.lang.Thread.run(Thread.java:722)
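The NPE above comes from time-based escape sequences (e.g. %Y/%m/%d) in hdfs.path, which require a timestamp header on every event. If the Flume version in use supports it, another way to sidestep the header entirely is the sink-side hdfs.useLocalTimeStamp option (sink name and path below are hypothetical):

```properties
# With useLocalTimeStamp the HDFS sink uses its own clock to resolve
# %Y/%m/%d, so events without a timestamp header no longer throw an NPE.
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d
agent.sinks.k1.hdfs.useLocalTimeStamp = true
```

The trade-off is that bucketing then reflects delivery time rather than the time extracted from the event itself.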
- Hari Shreedharan 2013-08-07, 16:50