Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume >> mail # dev >> spooldir source reading Flume itself and thinking the file has changed (1.3.1)

Copy link to this message
Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Hi Mike,
I haven't tried log4j2 in my environments but my review of the log4j2
change is that it should work.

What would I change?
Phil Scala may have some thoughts.

It would be nice if we thought through the file locking. I want to be able
to put a file in the spooldir and know that Flume isn't going to get
started until I'm ready. This certainly involves thinking about what the
file-putting process is doing but it's not clear to me how to ensure this
whole part is safe.

The thing that is currently annoying is handling stack traces. All logging
systems I've seen (except recent log4j2) output the stack trace with each
frame on a new line. This means that each frame gets its own log event and
the timestamp has to be added by Flume (instead of taken from the original
event). That Flume timestamp might be delayed by up to 1 minute (because of
log rolling so its pretty crap). Logstash has a multiline filter that
somewhat solves this.

My current approach is to try and get the Log4j2 FlumeAppender and Flume
1.3.1 reliable and trustworthy.


"Hi Edward,
Did the fixes in LOG4J2-254 fix your file rolling issue?

What are your thoughts on how to improve spooling directory source's error
handling when it detects a change in the file? Just bail and retry later? I
suppose that's a pretty reasonable approach.

On Tue, May 14, 2013 at 4:50 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Unless I'm mistaken (and concurrent code is easy to be mistaken about)
> is a race condition in apache-log4j-extras RollingFileAppender. I live in
> hope that when log4j2 becomes GA we can move to it and then be able to use
> it to log Flume itself.
> Evidence:
> File: castellan-reader.
> 2013-05-14 20:57:05,330  INFO ...
> File: castellan-reader.20130514T2058.log
> 2013-05-14 21:23:05,709 DEBUG ...
> Why would an event from 2123 be written into a file from 2058?
> My understanding of log4j shows that the RollingFileAppenders end up
> calling this:
> FileAppender:
> public  synchronized  void setFile(String fileName, boolean append,
> bufferedIO, int bufferSize)
> Which shortly calls:
> this.qw = new QuietWriter(writer, errorHandler);
> However, the code to actually write to the writer is this:
> protected
>   void subAppend(LoggingEvent event) {
>     this.qw.write(this.layout.format(event));
> Unless I'm mistaken there's no happens-before edge between setting the qw
> and calling subappend. The code path to get to subAppend appears not to go
> through any method synchronized on FileAppender's monitor. this.qw is not
> volatile.
> Oh, and based on my cursory inspection of the log4j2 code this exists in
> log4j2 as well. I've just raised log4j2-254 to cover it. We'll see if I'm
> actually right...
> Cheers,
> Edward
> On Mon, May 13, 2013 at 8:45 AM, Edward Sargisson <[EMAIL PROTECTED]>
> wrote:
> > Hi Mike,
> > Based on my reading of the various logging frameworks' source code and
> the
> > Java documentation I come to the conclusion that relying on an atomic
> move
> > is not wise. (Next time I see this I might try and prove that the
> > file is incomplete).
> >
> > So I suggest two things:
> > 1) A breach of that check should not cause the entire Flume instance to
> > stop passing traffic.
> > 2) A configurable wait time might work. If you're using the spooling
> > source then you've already decided to have some latency so a little more
> is
> > fine. However, there is still a risk of a race condition because there
> > no signal that the copy is finished.
> >
> > Cheers,
> > Edward
> >
> > "Hi Edward,
> > Thanks for investigating. I'm definitely open to suggestions for
> > improvement with this. Maybe dying is a bit extreme… the goal was to
> ensure
> > that people could not possibly try to use it to tail a file, which will
> > definitely not work correctly! :)
> >
> > Mike
> >
> >
> >
> > On Fri, May 10, 2013 at 5:02 PM, Edward Sargisson <[EMAIL PROTECTED]>