Flume >> mail # user >> Problem Events


Re: Problem Events
To avoid the cooldown period, set the maxBackoff to 0; the failover sink
processor should arguably have smarter logic here, but this is what will
happen, at the cost of a little extra processing time:

an event fails -> that sink is put on the failover list -> the event goes
to the next sink and succeeds.
next event -> the failover processor checks the failover list, removes the
first sink from that list -> the event goes to the first sink
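
For reference, a sketch of such a failover sink group (the agent and sink
names are hypothetical; note that in the Flume docs the failover sink
processor's cooldown cap is called processor.maxpenalty, in milliseconds):

```
# Hypothetical agent 'a1' with primary sink k1 and backup sink k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# Higher priority is tried first
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Cap the cooldown so a 'failed' sink is retried almost immediately
a1.sinkgroups.g1.processor.maxpenalty = 0
```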

I am fairly confident that's how it works.

- Connor
On Wed, Aug 7, 2013 at 10:26 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:

> That's what I'll do, add a static timestamp of 1 and let all the 'bad'
> messages flow into one directory.
> Thanks
>
>
> On Wed, Aug 7, 2013 at 5:14 PM, Jonathan Cooper-Ellis <[EMAIL PROTECTED]> wrote:
>
>> You can use a Static Interceptor before the RegexExtractor to add a
>> timestamp of zero to the header, which can then be overwritten by the
>> proper timestamp (if it exists). It also should sink misses into an obvious
>> 'miss' directory.
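>>
>> A sketch of that interceptor chain (agent/source names and the regex are
>> hypothetical; the static interceptor seeds the header, and the regex
>> extractor overwrites it on rows that match):
>>
>> ```
>> # Hypothetical agent 'a1', source 'r1'
>> a1.sources.r1.interceptors = i1 i2
>> # i1: seed every event with timestamp = 0 so the header is never null
>> a1.sources.r1.interceptors.i1.type = static
>> a1.sources.r1.interceptors.i1.key = timestamp
>> a1.sources.r1.interceptors.i1.value = 0
>> # i2: overwrite with the real timestamp when the row matches
>> a1.sources.r1.interceptors.i2.type = regex_extractor
>> a1.sources.r1.interceptors.i2.regex = ^(\\d{13})
>> a1.sources.r1.interceptors.i2.serializers = s1
>> a1.sources.r1.interceptors.i2.serializers.s1.name = timestamp
>> ```
>>
>> Events left with timestamp 0 then bucket into the 1970-01-01 path, which
>> acts as the obvious 'miss' directory.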
>>
>>
>> On Tue, Aug 6, 2013 at 10:40 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>
>>> After some reading in the docs, I think the existing failover behavior
>>> can't be used to solve the 'poison' message problem, since it puts the
>>> 'failed' sink into a 'cooldown' period.
>>> Because the problem is in the message and not the sink, once a poison
>>> message arrives the HDFS sink will 'fail', and the next X messages will
>>> go to the failover sink.
>>> My only solution for now is to work around my current problem and hope
>>> that I won't hit any other problematic messages; I'd be glad to have a
>>> less fragile solution.
>>>
>>> Many thanks!
>>> Other than that, Flume looks like a great tool :-)
>>>
>>> Anat
>>>
>>>
>>> On Sun, Aug 4, 2013 at 8:45 AM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>>
>>>> I think using a failover processor is a very good idea, and I'll use it
>>>> as an immediate solution.
>>>> For the long run, I would like to see a general solution (not specific
>>>> to the file channel; in my case it is an HDFS sink), so the suggestion to
>>>> add a 'Poison Message' sink to the sink processor sounds good.
>>>>
>>>> Just FYI, my problem is that a log file going through my source did not
>>>> have the structure I expected in all rows.
>>>>
>>>> Since I used the regex extractor to set the timestamp header, the 'bad'
>>>> row didn't match the regexp, the timestamp was not set, and the HDFS
>>>> sink threw an NPE on that:
>>>> 01 Aug 2013 09:36:24,259 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:422)  - process failed
>>>> java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
>>>>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>>>>         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
>>>>         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
>>>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)
>>>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>>         at java.lang.Thread.run(Thread.java:722)
>>>> 01 Aug 2013 09:36:24,262 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
>>>> org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
>>>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
>>>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>>         at java.lang.Thread.run(Thread.java:722)
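>>>>
>>>> For context, the NPE comes from BucketPath expanding time escapes in the
>>>> sink's path, which requires a 'timestamp' header. A path like the
>>>> following (hypothetical names) triggers it on events without that header:
>>>>
>>>> ```
>>>> # Hypothetical agent 'a1', sink 'k1': %Y/%m/%d are resolved from the
>>>> # event's 'timestamp' header, hence the NPE when it is missing
>>>> a1.sinks.k1.type = hdfs
>>>> a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d
>>>> # One blunt workaround, if your Flume version supports it: stamp events
>>>> # with the agent's local time instead
>>>> # a1.sinks.k1.hdfs.useLocalTimeStamp = true
>>>> ```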