Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client

This is an automatically generated e-mail. To reply, visit:

(Updated Oct. 22, 2012, 8:36 p.m.)
Review request for Flume.

This patch addresses several small issues from Mike's last review, and three big ones:

1) All Throwables are caught in the spool source run thread, so the thread pool will continue to launch new attempts if there is an exception within one execution.
2) The source now logs an error and stops making progress if a line is found which exceeds the maximum line length. This means that users will not lose data if they exceed the maximum line length, but it also means flume will be unavailable until they fix the issue (it will keep repeating the error).
3) I changed the code which gets the filename associated with a particular readLines() call because it was not correct in certain cases where the readLines() call itself forces a file roll. I added a unit test for this.
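
Point 1 matters because of how `java.util.concurrent.ScheduledExecutorService` behaves: if one execution of a periodic task throws, all later executions are suppressed. The following is only a minimal sketch of that behaviour, not the code from the patch. It assumes the run thread is scheduled with `scheduleWithFixedDelay`, and `CatchAllRunner`, `guarded` and `countAttempts` are hypothetical names made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the SpoolDirectorySource implementation.
public class CatchAllRunner {

  /** Wraps a task so that no Throwable escapes to the executor. */
  static Runnable guarded(Runnable task) {
    return () -> {
      try {
        task.run();
      } catch (Throwable t) {
        // A real source would log this; the sketch just prints it.
        System.err.println("Uncaught exception in run thread: " + t);
      }
    };
  }

  /**
   * Schedules a task that fails on its first attempt only and returns
   * how many attempts were started within the given time window.
   */
  static int countAttempts(boolean guard, long millis) {
    AtomicInteger attempts = new AtomicInteger();
    Runnable flaky = () -> {
      if (attempts.incrementAndGet() == 1) {
        throw new IllegalStateException("simulated failure");
      }
    };
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    pool.scheduleWithFixedDelay(guard ? guarded(flaky) : flaky,
        0, 10, TimeUnit.MILLISECONDS);
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdownNow();
    }
    return attempts.get();
  }

  public static void main(String[] args) {
    System.out.println("unguarded attempts: " + countAttempts(false, 200));
    System.out.println("guarded attempts: " + countAttempts(true, 200));
  }
}
```

Without the guard the task is attempted once and then never rescheduled; with the guard the executor keeps launching new attempts after the failure.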

This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code.
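
The rename-after-ingest idea can be sketched roughly as below. This is a simplified illustration rather than the patch's code: it reads each file whole instead of line by line, and the class name `SpoolDirSketch`, the method `drain` and the `.COMPLETED` suffix are assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not the SpoolingFileLineReader implementation.
public class SpoolDirSketch {
  // Assumed marker for files that have been fully ingested.
  static final String DONE_SUFFIX = ".COMPLETED";

  /**
   * Delivers the lines of every file that is not yet marked as done, then
   * renames the file. Returns the number of files completed in this pass.
   */
  static int drain(Path spoolDir, Consumer<List<String>> deliver) {
    try {
      List<Path> pending;
      try (Stream<Path> files = Files.list(spoolDir)) {
        pending = files
            .filter(Files::isRegularFile)
            .filter(p -> !p.getFileName().toString().endsWith(DONE_SUFFIX))
            .sorted()
            .collect(Collectors.toList());
      }
      int completed = 0;
      for (Path file : pending) {
        deliver.accept(Files.readAllLines(file, StandardCharsets.UTF_8));
        // Rename only after delivery returned normally. A crash before this
        // point leaves the file unmarked, so it is read again on restart:
        // duplicates are possible, but data is not lost (at-least-once).
        Files.move(file, file.resolveSibling(file.getFileName() + DONE_SUFFIX));
        completed++;
      }
      return completed;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("spool");
    Files.write(dir.resolve("app.log"), Arrays.asList("first line", "second line"));
    int done = drain(dir, lines -> lines.forEach(System.out::println));
    System.out.println("files completed: " + done);
  }
}
```

Because the rename happens last, a second pass over the same directory skips files that were already delivered, while a file whose delivery was interrupted is picked up again.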

This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).

Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner.
This addresses bug FLUME-1425.
Diffs (updated)

  flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7
  flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c
  flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae
  flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION
  flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670

Diff: https://reviews.apache.org/r/6377/diff/

I added extensive unit tests, and I also built and played with this using a stub flume agent. If you look at the JIRA, I have a configuration file for an agent that will print out Avro events to the command line, which is helpful when testing this.

Patrick Wendell