Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume >> mail # dev >> Review Request 14700: FLUME-2119 -- duplicate files cause flume to enter irrecoverable state


Copy link to this message
-
Re: Review Request 14700: FLUME-2119 -- duplicate files cause flume to enter irrecoverable state

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14700/#review27103
-----------------------------------------------------------
Phil,

The patch generally looks good. One thing I do not agree with is continuing when the file was modified after we completed reading. This can lead to silent data loss and is going to end up causing problems. Also, with regards to the renamed file existing - how about just appending a number to the file name (like what a browser does when downloading several files of the same name - simple add a number to the file name, and keep incrementing till we finally succeed).
flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java
<https://reviews.apache.org/r/14700/#comment52762>

    Why change this? Since both files are the same, does this matter?
- Hari Shreedharan
On Oct. 16, 2013, 10:53 p.m., Phil Scala wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14700/
> -----------------------------------------------------------
>
> (Updated Oct. 16, 2013, 10:53 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2119
>     https://issues.apache.org/jira/browse/FLUME-2119
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> Added a new configuration setting "useStrictSpooledFilePolicies" to control when the ReliableSpoolingFileEventReader thorws and IllegalStateException.  This is so that if someone understands what they are doing a spooled file can be safely spooled more than 1 time.  This will also control cases where if the file being read is updated with a new modified date/time similarly an exception is not thrown.  
>
> the default for this setting is "true" which means the code acts in the same way today, honoring the spool directory requirements as documented.  A user will have to add the setting and set it to false in order to have this take effect.
>
> As mentioned in JIRA, Flume v1.5 does not spiral when a duplicate file is spooled, however it will "shutdown" requiring human intervention.
>
>
> Diffs
> -----
>
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java bd684ed
>   flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java 72c4059
>   flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java 7bfb0ee
>   flume-ng-core/src/test/java/org/apache/flume/client/avro/TestReliableSpoolingFileEventReader.java 9d708c1
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 98859ce
>
> Diff: https://reviews.apache.org/r/14700/diff/
>
>
> Testing
> -------
>
> 4 new unit tests added, existing unit tests in TestReliableSpoolingFileEventReader not affected.  Manually executed some tests on a windows machine spooling the same named file (different contents) to ensure no side effects.
>
>
> Thanks,
>
> Phil Scala
>
>