Flume >> mail # user >> Problem Events


Thread participants:
Jeremy Karlson 2013-07-24, 21:52
Roshan Naik 2013-07-24, 22:36
Hari Shreedharan 2013-07-24, 22:45
Jeremy Karlson 2013-07-24, 22:56
Arvind Prabhakar 2013-07-25, 02:51
Jeremy Karlson 2013-07-25, 16:50
Arvind Prabhakar 2013-07-26, 00:35
Anat Rozenzon 2013-08-01, 07:59
Ashish 2013-08-01, 08:13
Anat Rozenzon 2013-08-01, 09:42
Jeremy Karlson 2013-08-01, 16:26
Roshan Naik 2013-08-01, 17:26
Paul Chavez 2013-08-01, 17:33
Arvind Prabhakar 2013-08-01, 22:25
Connor Woodson 2013-08-03, 01:27
Connor Woodson 2013-08-03, 06:56
Anat Rozenzon 2013-08-04, 05:45
Anat Rozenzon 2013-08-07, 05:40
Jonathan Cooper-Ellis 2013-08-07, 14:14
Anat Rozenzon 2013-08-08, 05:26
Connor Woodson 2013-08-10, 01:08
Re: Problem Events
Can you try setting this config param for your HDFS Sink:  
hdfs.useLocalTimeStamp = true

This should insert the timestamp at the sink into the event (this may not be what you want, but it will get rid of the event from the channel).
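As a sketch, assuming an agent named a1 with an HDFS sink named k1 (both names are placeholders), the setting sits alongside the other sink properties:

```properties
# Hypothetical agent/sink names (a1, k1); adjust to your configuration.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
# Use the sink host's clock for the %Y-%m-%d escapes instead of the
# event's timestamp header, so events that arrive without a timestamp
# header no longer fail bucketing.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```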
Thanks,
Hari
On Wednesday, August 7, 2013 at 7:14 AM, Jonathan Cooper-Ellis wrote:

> You can use a Static Interceptor before the RegexExtractor to add a timestamp of zero to the header, which can then be overwritten by the proper timestamp (if it exists). This also makes misses land in an obvious 'miss' directory (the zero-timestamp bucket).
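A minimal sketch of that interceptor chain, assuming an agent a1 with a source r1 (placeholder names) and an illustrative regex for a leading epoch-millis field:

```properties
# The static interceptor runs first and seeds timestamp = 0; the
# regex extractor then overwrites the header when the row matches.
a1.sources.r1.interceptors = seed ts
a1.sources.r1.interceptors.seed.type = static
a1.sources.r1.interceptors.seed.key = timestamp
a1.sources.r1.interceptors.seed.value = 0
a1.sources.r1.interceptors.ts.type = regex_extractor
a1.sources.r1.interceptors.ts.regex = ^(\\d{13})
a1.sources.r1.interceptors.ts.serializers = s1
a1.sources.r1.interceptors.ts.serializers.s1.name = timestamp
# Rows that don't match keep timestamp 0 and bucket into 1970-01-01,
# which becomes the obvious 'miss' directory.
```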
>
>
> > On Tue, Aug 6, 2013 at 10:40 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
> > After some reading in the docs, I think the existing fail-over behavior can't be used to solve the 'poison' message problem, because it puts the 'failed' sink into a 'cooldown' period.
> > Since the problem is in the message and not the sink, once a poison message arrives the HDFS sink will 'fail' and the next X messages will go to the failover sink.
> > My only solution for now is to avoid my current problem and hope that I won't have any other problematic messages; I'd be glad to have a less fragile solution.
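For reference, the fail-over behavior under discussion comes from a failover sink group along these lines (all names are placeholders); maxpenalty is the 'cooldown' that sidelines the whole sink rather than just the bad event:

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = hdfsSink fallbackSink
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.hdfsSink = 10
a1.sinkgroups.g1.processor.priority.fallbackSink = 5
# After a failure the higher-priority sink is penalized for up to
# maxpenalty milliseconds, so subsequent good events also go to the
# fallback during that window.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```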
> >
> > Many thanks!
> > Other than that, Flume looks like a great tool :-)
> >
> > Anat
> >
> >
> > On Sun, Aug 4, 2013 at 8:45 AM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
> > > I think using a fail-over processor is a very good idea, and I'll use it as an immediate solution.
> > > For the long run, I would like to see a general solution (not specific to the file channel; in my case it is an HDFS sink), so the suggestion to add a 'poison message' sink to the sink processor sounds good.
> > >
> > > Just FYI, my problem is that a log file going through my source did not have the structure I expected in all rows.
> > >
> > > Since I used the regex extractor to set the timestamp, the 'bad' row didn't match the regexp, the timestamp was never set, and the HDFS sink then threw an NPE on it:
> > > 01 Aug 2013 09:36:24,259 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:422)  - process failed
> > > java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
> > >         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > >         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
> > >         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
> > >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)
> > >         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > >         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > >         at java.lang.Thread.run(Thread.java:722)
> > > 01 Aug 2013 09:36:24,262 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
> > > org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
> > >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
> > >         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > >         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > >         at java.lang.Thread.run(Thread.java:722)
> > > Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
> > >         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > >         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
> > >         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
> > >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)