Flume >> mail # dev >> spooldir source reading Flume itself and thinking the file has changed (1.3.1)


Edward Sargisson 2013-05-08, 16:04
Mike Percy 2013-05-08, 17:23
Edward Sargisson 2013-05-11, 00:02
Mike Percy 2013-05-11, 01:08
Edward Sargisson 2013-05-13, 15:45
Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Unless I'm mistaken (and it is easy to be mistaken about concurrent code),
this is a race condition in the apache-log4j-extras RollingFileAppender. I
live in hope that when log4j2 goes GA we can move to it and then be able to
use it to log Flume itself.

Evidence:
File: castellan-reader.20130514T2058.log.COMPLETED
2013-05-14 20:57:05,330  INFO ...

File: castellan-reader.20130514T2058.log
2013-05-14 21:23:05,709 DEBUG ...

Why would an event from 21:23 be written into a file rolled at 20:58?

My understanding of log4j is that the RollingFileAppenders end up calling
this method in FileAppender:

public synchronized void setFile(String fileName, boolean append,
    boolean bufferedIO, int bufferSize)

which shortly afterwards executes:

this.qw = new QuietWriter(writer, errorHandler);

However, the code that actually writes to the writer is this:

protected void subAppend(LoggingEvent event) {
    this.qw.write(this.layout.format(event));
    ...
}

Unless I'm mistaken, there's no happens-before edge between setting qw and
calling subAppend: the code path into subAppend does not appear to pass
through any method synchronized on FileAppender's monitor, and this.qw is
not volatile.
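The missing happens-before edge can be illustrated with a minimal, hypothetical sketch. The class and field names below only mimic the log4j ones described above; this is not the actual log4j source:

```java
import java.io.StringWriter;

// Stand-in for log4j's QuietWriter: just accumulates what is written.
class QuietWriterSketch {
    private final StringWriter out = new StringWriter();
    void write(String s) { out.write(s); }
    String contents() { return out.toString(); }
}

// Stand-in for FileAppender showing the unsafe publication pattern.
class FileAppenderSketch {
    // Not volatile -- mirrors the field in question.
    private QuietWriterSketch qw;

    // Roll/reopen path: runs synchronized on the appender's monitor.
    public synchronized void setFile(String fileName) {
        // Published without any fence visible to unsynchronized readers.
        this.qw = new QuietWriterSketch();
    }

    // Append path: reads qw with no synchronization, so a concurrent
    // thread may observe a stale reference after a roll.
    public void subAppend(String event) {
        this.qw.write(event);
    }

    public QuietWriterSketch current() { return qw; }
}

public class Main {
    public static void main(String[] args) {
        FileAppenderSketch app = new FileAppenderSketch();
        app.setFile("castellan-reader.log");
        app.subAppend("first event");
        // Single-threaded this is fine; the hazard only manifests when a
        // second thread calls subAppend concurrently with setFile, in which
        // case the event can land in the writer for the already-rolled file.
        System.out.println(app.current().contents());
    }
}
```

Making qw volatile, or synchronizing the append path on the same monitor, would restore the happens-before edge.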

Oh, and based on my cursory inspection of the log4j2 code, the same problem
exists in log4j2 as well. I've just raised LOG4J2-254 to cover it. We'll
see if I'm actually right...

Cheers,
Edward
On Mon, May 13, 2013 at 8:45 AM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Hi Mike,
> Based on my reading of the various logging frameworks' source code and the
> Java documentation, I have concluded that relying on an atomic move is not
> wise. (Next time I see this I might try to prove that the spooled file is
> incomplete.)
>
> So I suggest two things:
> 1) A failure of that check should not cause the entire Flume instance to
> stop passing traffic.
> 2) A configurable wait time might work. If you're using the spooling
> source then you've already accepted some latency, so a little more is
> fine. However, there is still a risk of a race condition because there is
> no signal that the copy is finished.
>
> Cheers,
> Edward
>
> "Hi Edward,
> Thanks for investigating. I'm definitely open to suggestions for
> improvement with this. Maybe dying is a bit extreme… the goal was to ensure
> that people could not possibly try to use it to tail a file, which will
> definitely not work correctly! :)
>
> Mike
>
>
>
> On Fri, May 10, 2013 at 5:02 PM, Edward Sargisson <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Mike,
> > I was curious so I went on a bit of a hunt through logger source code.
> > The result is that loggers can't be relied on to atomically roll the
> > file so a feature to allow a delay before checking the file would be
> > of great utility.
> >
> > For that matter, having Flume not die completely in this scenario
> > would also be good.
> >
> > apache-log4j-extras does this [1]:
> > return source.renameTo(destination);
> >
> > logback does this [2]:
> > boolean result = srcFile.renameTo(targetFile);
> >
> > log4j2 does this [3]:
> > srcStream = new FileInputStream(source);
> > destStream = new FileOutputStream(destination);
> > srcChannel = srcStream.getChannel();
> > destChannel = destStream.getChannel();
> > destChannel.transferFrom(srcChannel, 0, srcChannel.size());
> >
> > The JavaDoc for File.renameTo says:
> >  Many aspects of the behavior of this method are inherently
> >  platform-dependent: The rename operation might not be able to move a
> >  file from one filesystem to another, it might not be atomic, and it
> >  might not succeed if a file with the destination abstract pathname
> >  already exists.  The return value should always be checked to make sure
> >  that the rename operation was successful.
> >
> > My conclusion is that the loggers (except possibly log4j2) can't be
> > relied on to atomically roll the file.
> >
> > Cheers,
> > Edward
> >
> >
> > Links:
> > [1]
> > http://svn.apache.org/viewvc/logging/log4j/companions/extras/trunk/src/main/java/org/apache/log4j/rolling/helper/FileRenameAction.java?view=markup
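As a footnote to the rename discussion quoted above: since Java 7, java.nio.file.Files.move can request atomicity explicitly and fails loudly (with AtomicMoveNotSupportedException) when the filesystem cannot provide it, rather than returning a boolean like File.renameTo. A minimal sketch; the roll helper below is hypothetical and not taken from any of the loggers discussed:

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicRollSketch {
    // Try an atomic rename first; fall back to a plain move when the
    // filesystem cannot do it atomically (e.g. across mount points).
    static void roll(Path source, Path destination) throws IOException {
        try {
            Files.move(source, destination, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Non-atomic fallback: a reader such as Flume's spooldir source
            // could observe a partially written destination here.
            Files.move(source, destination,
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate on a temp directory (same filesystem, so the
        // atomic path is taken).
        Path dir = Files.createTempDirectory("roll-demo");
        Path src = dir.resolve("app.log");
        Files.write(src, "2013-05-14 20:57:05,330 INFO ...\n".getBytes());
        Path dst = dir.resolve("app.20130514T2058.log");
        roll(src, dst);
        // Prints true when the move succeeded: destination exists,
        // source is gone.
        System.out.println(Files.exists(dst) && !Files.exists(src));
    }
}
```

The explicit exception makes the "might not be atomic" case from the File.renameTo JavaDoc detectable instead of silent.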
Mike Percy 2013-05-22, 07:18
Edward Sargisson 2013-05-22, 16:22
Phil Scala 2013-05-23, 00:24
Mike Percy 2013-05-23, 01:13