Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume >> mail # dev >> spooldir source reading Flume itself and thinking the file has changed (1.3.1)


Copy link to this message
-
Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Hi Edward,
Did the fixes in LOG4J2-254 fix your file rolling issue?

What are your thoughts on how to improve spooling directory source's error
handling when it detects a change in the file? Just bail and retry later? I
suppose that's a pretty reasonable approach.

Regards,
Mike
On Tue, May 14, 2013 at 4:50 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Unless I'm mistaken (and concurrent code is easy to be mistaken about) this
> is a race condition in apache-log4j-extras RollingFileAppender. I live in
> hope that when log4j2 becomes GA we can move to it and then be able to use
> it to log Flume itself.
>
> Evidence:
> File: castellan-reader.20130514T2058.log.COMPLETED
> 2013-05-14 20:57:05,330  INFO ...
>
> File: castellan-reader.20130514T2058.log
> 2013-05-14 21:23:05,709 DEBUG ...
>
> Why would an event from 2123 be written into a file from 2058?
>
> My understanding of log4j shows that the RollingFileAppenders end up
> calling this:
> FileAppender:
> public  synchronized  void setFile(String fileName, boolean append, boolean
> bufferedIO, int bufferSize)
>
> Which shortly calls:
> this.qw = new QuietWriter(writer, errorHandler);
>
> However, the code to actually write to the writer is this:
> protected
>   void subAppend(LoggingEvent event) {
>     this.qw.write(this.layout.format(event));
>
> Unless I'm mistaken there's no happens-before edge between setting the qw
> and calling subappend. The code path to get to subAppend appears not to go
> through any method synchronized on FileAppender's monitor. this.qw is not
> volatile.
>
> Oh, and based on my cursory inspection of the log4j2 code this exists in
> log4j2 as well. I've just raised log4j2-254 to cover it. We'll see if I'm
> actually right...
>
> Cheers,
> Edward
>
>
>
>
> On Mon, May 13, 2013 at 8:45 AM, Edward Sargisson <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Mike,
> > Based on my reading of the various logging frameworks' source code and
> the
> > Java documentation I come to the conclusion that relying on an atomic
> move
> > is not wise. (Next time I see this I might try and prove that the spooled
> > file is incomplete).
> >
> > So I suggest two things:
> > 1) A breach of that check should not cause the entire Flume instance to
> > stop passing traffic.
> > 2) A configurable wait time might work. If you're using the spooling
> > source then you've already decided to have some latency so a little more
> is
> > fine. However, there is still a risk of a race condition because there is
> > no signal that the copy is finished.
> >
> > Cheers,
> > Edward
> >
> > "Hi Edward,
> > Thanks for investigating. I'm definitely open to suggestions for
> > improvement with this. Maybe dying is a bit extreme… the goal was to
> ensure
> > that people could not possibly try to use it to tail a file, which will
> > definitely not work correctly! :)
> >
> > Mike
> >
> >
> >
> > On Fri, May 10, 2013 at 5:02 PM, Edward Sargisson <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi Mike,
> > > I was curious so I went on a bit of a hunt through logger source code.
> > > The result is that loggers can't be relied on to atomically roll the
> > > file so a feature to allow a delay before checking the file would be
> > > of great utility.
> > >
> > > For that matter, having Flume not die completely in this scenario
> > > would also be good.
> > >
> > > apache-log4j-extras does this [1]:
> > > return source.renameTo(destination);
> > >
> > > logback does this [2]:
> > > boolean result = srcFile.renameTo(targetFile);
> > >
> > > log4j2 does this [3]:
> > >  srcStream = new FileInputStream(source);
> > >             destStream = new FileOutputStream(destination);
> > >             srcChannel = srcStream.getChannel();
> > >             destChannel = destStream.getChannel();
> > >             destChannel.transferFrom(
> > srcChannel, 0, srcChannel.size());
> > >
> > > The JavaDoc for File.renameTo says:
> > >  Many aspects of the behavior of this method are inherently
> > >  platform-dependent: The rename operation might not be able to move a