Flume, mail # user - Problem Events


Re: Problem Events
Hari Shreedharan 2013-08-07, 16:50
Can you try setting this config param for your HDFS Sink:  
hdfs.useLocalTimeStamp = true

This should insert the timestamp at the sink into the event (this may not be what you want - but this will get rid of the event from the channel).
Thanks,
Hari
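
For context, a minimal sketch of where that parameter sits in an agent configuration. The agent, channel, and sink names (a1, c1, k1) and the HDFS path are placeholders assumed for illustration, not taken from this thread.

```properties
# Hypothetical names: agent a1, channel c1, sink k1.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# The date escapes in the path are what require a timestamp for each event.
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
# Use the sink host's clock instead of the event's timestamp header, so an
# event with no timestamp header can still be bucketed and written.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

With this set, events are bucketed by the time they reach the sink rather than the time recorded in the log line, which is the trade-off mentioned above.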
On Wednesday, August 7, 2013 at 7:14 AM, Jonathan Cooper-Ellis wrote:

> You can use a Static Interceptor before the RegexExtractor to add a timestamp of zero to the header, which can then be overwritten by the proper timestamp (if it exists). It also should sink misses into an obvious 'miss' directory.
>
>
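
A rough sketch of the interceptor chain described above, assuming a source named r1 on agent a1 and log lines that start with a timestamp such as 2013-08-01 09:36:24. The regex and date pattern are illustrative guesses and would need to match the real log format.

```properties
# Hypothetical names: agent a1, source r1. Interceptors run in the order listed.
a1.sources.r1.interceptors = i1 i2

# i1: give every event a default timestamp header of 0 (the Unix epoch).
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = timestamp
a1.sources.r1.interceptors.i1.value = 0

# i2: when the line matches, replace the default with the parsed timestamp.
a1.sources.r1.interceptors.i2.type = regex_extractor
# Assumed line format; backslashes are doubled because this is a properties file.
a1.sources.r1.interceptors.i2.regex = ^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})
a1.sources.r1.interceptors.i2.serializers = s1
a1.sources.r1.interceptors.i2.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i2.serializers.s1.name = timestamp
a1.sources.r1.interceptors.i2.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss
```

Rows that do not match keep the timestamp of 0, so with a date-based hdfs.path they should land in a directory dated at the epoch (around 1970-01-01, depending on the time zone in use), which serves as the 'miss' directory.
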
> On Tue, Aug 6, 2013 at 10:40 PM, Anat Rozenzon <[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])> wrote:
> > After some reading in the docs, I think the existing fail-over behavior can't be used to solve the 'poison' message problem, because it puts the 'failed' sink into a 'cooldown' period.
> > Since the problem is in the message and not in the sink, once a poison message arrives the HDFS sink will 'fail', and the next X messages will go to the failover sink.
> > My only solution for now is to work around my current problem and hope that I won't get any other problematic messages. I'd be glad to have a less fragile solution.
> >
> > Many thanks!
> > Other than that, Flume looks like a great tool :-)
> >
> > Anat
> >
> >
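
For reference, a sketch of the failover sink processor setup being discussed, with hypothetical sink names k1 (the HDFS sink) and k2 (a fallback sink). The priorities and penalty value are arbitrary examples.

```properties
# Hypothetical names: agent a1, sink group g1, sinks k1 (HDFS) and k2 (fallback).
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# The higher-priority sink is tried first.
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper bound, in milliseconds, on the cooldown applied to a failed sink.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

The cooldown is the behavior objected to above: after k1 fails on one bad event, later healthy events go to k2 until the penalty expires.
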
> > On Sun, Aug 4, 2013 at 8:45 AM, Anat Rozenzon <[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])> wrote:
> > > I think using a fail-over processor is a very good idea, and I'll use it as an immediate solution.
> > > For the long run, I would like to see a general solution (not specific to the file channel; in my case it is an HDFS sink), so the suggestion to add a 'Poison Message' sink to the sink processor sounds good.
> > >
> > > Just FYI, my problem is that a log file going through my source did not have (in all rows) the structure I expected.
> > >
> > > Since I used a regex extractor to set the timestamp, the 'bad' row didn't match the regex and the timestamp was not set, so the HDFS sink throws an NPE on it:
> > > 01 Aug 2013 09:36:24,259 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:422)  - process failed
> > > java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
> > >         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > >         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
> > >         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
> > >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)
> > >         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > >         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > >         at java.lang.Thread.run(Thread.java:722)
> > > 01 Aug 2013 09:36:24,262 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
> > > org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
> > >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
> > >         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > >         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > >         at java.lang.Thread.run(Thread.java:722)
> > > Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
> > >         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > >         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
> > >         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
> > >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)