Flume dev mailing list: spooldir source reading Flume itself and thinking the file has changed (1.3.1)


Thread:
Edward Sargisson 2013-05-08, 16:04
Mike Percy 2013-05-08, 17:23
Edward Sargisson 2013-05-11, 00:02
Mike Percy 2013-05-11, 01:08
Edward Sargisson 2013-05-13, 15:45
Edward Sargisson 2013-05-14, 23:50
Mike Percy 2013-05-22, 07:18
Edward Sargisson 2013-05-22, 16:22
RE: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Hey Ed / Mike

Aside from a comment I made on the users mailing list, noting that the spool source starts reading a file again from the beginning if there was a failure, the code documents that behaviour well (in retireCurrentFile) [if you don't retire the file then you run the risk of duplicates... which is fine for my use :)]
As Ed mentioned, we have been chatting about making sure no invariants get muddled up during file spool processing. I see this as two or three pieces... I think the code is pretty solid, with one area I want to look into.

I would like to give this more thought...

When the spool source decides a file is the "next file", is that file still in use, i.e. has the "upload" into the spool directory actually completed?

Earlier discussion mentioned adding some "time" delay -> that would be artificial and still would never solve the problem. I need to do some learning here: coming from Windows, file locking was pretty exclusive. I want to look at FileChannel locks in NIO and how Linux manages files. This could be an area worth exploring, since right now no lock is obtained on the file being processed.
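To make that concrete, here is a rough sketch of what I have in mind: probing a spooled file with an NIO FileChannel lock before processing it. This is not existing Flume code; the class name and path are made up for illustration, and note that on Linux these locks are advisory, so a writer that never takes a lock is not blocked by them.

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SpoolFileLockProbe {

    /**
     * Attempt an exclusive lock on a spooled file before processing it.
     * Returns true if the lock was obtained, i.e. no other cooperating
     * process currently holds a lock on the file.
     */
    static boolean canLockExclusively(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) {
            return lock != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical spooled file; substitute a real path to try it out.
        Path spooled = Paths.get("/opt/castellan/log/spool/example.log");
        System.out.println("exclusive lock available: " + canLockExclusively(spooled));
    }
}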

I will come back with something a little better formulated soon...

Thanks
Phil Scala
Software Developer / Architect
Global Relay

[EMAIL PROTECTED]

866.484.6630  |  [EMAIL PROTECTED]  |  globalrelay.com 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Edward Sargisson
Sent: Wednesday, May 22, 2013 12:22 PM
To: Mike Percy; [EMAIL PROTECTED]
Subject: Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)

Hi Mike,
I haven't tried log4j2 in my environments but my review of the log4j2 change is that it should work.

What would I change?
Phil Scala may have some thoughts.

It would be nice if we thought through the file locking. I want to be able to put a file in the spooldir and know that Flume isn't going to start reading it until I'm ready. This certainly involves thinking about what the file-putting process is doing, but it's not clear to me how to make this whole handoff safe.
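One convention that would make the handoff safe without any locking is to write the file somewhere else on the same filesystem and only move it into the spooldir once it is complete; the rename is atomic, so the source can never see a partial file. A rough sketch (the paths here are illustrative, not from any real setup):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class SpoolDropExample {
    public static void main(String[] args) throws IOException {
        // Staging and spool directories on the same filesystem (hypothetical paths).
        Path staging = Paths.get("/opt/castellan/log/staging/app.20130522.log");
        Path spooled = Paths.get("/opt/castellan/log/spool/app.20130522.log");

        // Write the complete file outside the spool directory first...
        Files.write(staging, "event 1\nevent 2\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW);

        // ...then move it into the spooldir with a single atomic rename, so the
        // spooling directory source can never observe a half-written file.
        Files.move(staging, spooled, StandardCopyOption.ATOMIC_MOVE);
    }
}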

The thing that is currently annoying is handling stack traces. All logging systems I've seen (except recent log4j2) output the stack trace with each frame on a new line. This means that each frame gets its own log event and the timestamp has to be added by Flume (instead of taken from the original event). That Flume timestamp might be delayed by up to 1 minute because of log rolling, so it's pretty crap. Logstash has a multiline filter that somewhat solves this.
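For reference, the kind of grouping the Logstash multiline filter does can be sketched in a few lines of Java. This is only an illustration, and the timestamp pattern used to recognise the start of an event is an assumption about the log format:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class MultilineGrouper {
    // A line that starts with a timestamp begins a new event; anything else
    // (stack trace frames, "Caused by:" lines) is appended to the previous event.
    private static final Pattern EVENT_START = Pattern.compile("^\\d{4}-\\d{2}-\\d{2} ");

    static List<String> group(List<String> rawLines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            if (current == null || EVENT_START.matcher(line).find()) {
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2013-05-14 20:57:05,330 ERROR something failed",
                "java.lang.RuntimeException: boom",
                "\tat com.example.Foo.bar(Foo.java:42)",
                "2013-05-14 20:57:06,001 INFO next event");
        for (String event : group(lines)) {
            System.out.println("---- event ----");
            System.out.println(event);
        }
    }
}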

My current approach is to try and get the Log4j2 FlumeAppender and Flume 1.3.1 reliable and trustworthy.

Cheers,
Edward

"Hi Edward,
Did the fixes in LOG4J2-254 fix your file rolling issue?

What are your thoughts on how to improve the spooling directory source's error handling when it detects a change in the file? Just bail and retry later? I suppose that's a pretty reasonable approach.

Regards,
Mike
On Tue, May 14, 2013 at 4:50 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Unless I'm mistaken (and concurrent code is easy to be mistaken about) this
> is a race condition in apache-log4j-extras RollingFileAppender. I live
> in hope that when log4j2 becomes GA we can move to it and then be able
> to use it to log Flume itself.
>
> Evidence:
> File: castellan-reader.20130514T2058.log.COMPLETED
> 2013-05-14 20:57:05,330  INFO ...
>
> File: castellan-reader.20130514T2058.log
> 2013-05-14 21:23:05,709 DEBUG ...
>
> Why would an event from 2123 be written into a file from 2058?
>
> My understanding of log4j shows that the RollingFileAppenders end up
> calling this:
> FileAppender:
> public synchronized void setFile(String fileName, boolean append, boolean
> bufferedIO, int bufferSize)
>
> Which shortly calls:
> this.qw = new QuietWriter(writer, errorHandler);
>
> However, the code to actually write to the writer is this:
> protected void subAppend(LoggingEvent event) {
>     this.qw.write(this.layout.format(event));
>
> Unless I'm mistaken there's no happens-before edge between setting the
> [remainder of this quoted message is truncated in the archive; the surviving links, path, and stack trace fragments follow]
> http://svn.apache.org/viewvc/logging/log4j/companions/extras/trunk/src/main/java/org/apache/log4j/rolling/helper/FileRenameAction.java?view=markup
> https://github.com/qos-ch/logback/blob/master/logback-core/src/main/java/ch/qos/logback/core/rolling/helper/RenameUtil.java
> https://svn.apache.org/repos/asf/logging/log4j/log4j2/trunk/core/src/main/java/org/apache/logging/log4j/core/appender/rolling/helper/FileRenameAction.java
> /opt/castellan/log/spool/castellan-reader.20130430T2055.log.COMPLETED
> org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile(SpoolingFileLineRe
> java.io.BufferedReader.ensureOpen(BufferedReader.java:115)
> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:180)
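The race described in the quoted message above is the classic pattern of a field that is written under a monitor but read without one. A minimal sketch of that pattern, with simplified names and not the actual log4j code, assuming the read really does happen outside any synchronization:

// Simplified illustration: setFile() replaces the writer inside the object's
// monitor, but subAppend() reads the field with no synchronization, so there
// is no guaranteed happens-before edge between the write and the read, and
// the appending thread may see a stale writer.
public class AppenderRaceSketch {

    static class QuietWriterLike {
        private final String fileName;
        QuietWriterLike(String fileName) { this.fileName = fileName; }
        void write(String s) { /* pretend to write s to fileName */ }
    }

    private QuietWriterLike qw;   // neither volatile nor final

    public synchronized void setFile(String fileName) {
        this.qw = new QuietWriterLike(fileName);   // write happens inside the monitor
    }

    protected void subAppend(String formattedEvent) {
        this.qw.write(formattedEvent);             // unsynchronized read of qw
    }
}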
Mike Percy 2013-05-23, 01:13