|
Patrick Wendell
2012-08-04, 06:47
Patrick Wendell
2012-08-04, 06:59
Jarek Cecho
2012-08-07, 07:09
Patrick Wendell
2012-08-07, 17:46
Hari Shreedharan
2012-08-07, 18:40
Patrick Wendell
2012-08-11, 05:32
Patrick Wendell
2012-08-11, 05:34
Patrick Wendell
2012-08-14, 21:54
Patrick Wendell
2012-08-14, 22:02
Mike Percy
2012-08-23, 10:11
Mike Percy
2012-10-07, 04:30
Patrick Wendell
2012-10-11, 18:02
Patrick Wendell
2012-10-11, 18:10
Mike Percy
2012-10-11, 19:30
Mike Percy
2012-10-11, 19:31
Patrick Wendell
2012-10-11, 19:47
Patrick Wendell
2012-10-12, 17:15
Cameron Gandevia
2012-10-15, 19:18
Patrick Wendell
2012-10-15, 21:11
Mike Percy
2012-10-15, 22:02
Cameron Gandevia
2012-10-16, 01:19
Mike Percy
2012-10-16, 21:49
Mike Percy
2012-10-17, 00:26
Mike Percy
2012-10-17, 02:21
Patrick Wendell
2012-10-19, 17:09
Patrick Wendell
2012-10-19, 18:48
Patrick Wendell
2012-10-19, 18:49
Mike Percy
2012-10-19, 22:31
Mike Percy
2012-10-19, 22:43
Brock Noland
2012-10-22, 14:26
Patrick Wendell
2012-10-22, 19:31
Patrick Wendell
2012-10-22, 20:36
Mike Percy
2012-10-30, 06:22
Patrick Wendell
2012-11-01, 04:47
Mike Percy
2012-11-06, 01:13
Mike Percy
2012-11-06, 02:12
Patrick Wendell
2012-11-06, 06:34
Alexander Alten-Lorenz
2012-11-06, 08:27
Alexander Alten-Lorenz
2012-11-06, 09:15
|
-
Review Request: FLUME-1425: Create Standalone Spooling ClientPatrick Wendell 2012-08-04, 06:47
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- Review request for Flume. Description ------- This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are windows dependent. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create Standalone Spooling ClientPatrick Wendell 2012-08-04, 06:59
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- (Updated Aug. 4, 2012, 6:59 a.m.) Review request for Flume. Description (updated) ------- This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are not windows compatible. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create Standalone Spooling ClientJarek Cecho 2012-08-07, 07:09
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review9950 ----------------------------------------------------------- Hi Patrik, thank you very much for your patch. I would personally prefer if it would be implemented as a source (I do understand windows incompatibility at the moment). However would be great to hear other opinions. I was thinking about use cases of such functionality and I've come with an improvement idea. Do you think that you could allow user to specify header name where flume will automatically populate original file name? For example if user specifies header name to "file" and configuration directory as /directory than flume will populate header "file" with content "cool.txt" for each line of "/directory/cool.txt" file. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment21146> Nit: Could you please change the tab characters to spaces here? Jarcec - Jarek Cecho On Aug. 4, 2012, 6:59 a.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 4, 2012, 6:59 a.m.) > > > Review request for Flume. > > > Description > ------- > > This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are not windows compatible. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create Standalone Spooling ClientPatrick Wendell 2012-08-07, 17:46
> On Aug. 7, 2012, 7:09 a.m., Jarek Cecho wrote: > > Hi Patrik, > > thank you very much for your patch. I would personally prefer if it would be implemented as a source (I do understand windows incompatibility at the moment). However would be great to hear other opinions. > > > > I was thinking about use cases of such functionality and I've come with an improvement idea. Do you think that you could allow user to specify header name where flume will automatically populate original file name? For example if user specifies header name to "file" and configuration directory as /directory than flume will populate header "file" with content "cool.txt" for each line of "/directory/cool.txt" file. Let me re-implement this as a source, and maybe we can find a way to also include it in the avro client. The filename suggestion is also a good idea. I'll plan to add this in the next rev of the patch. We could also add that as a feature and do in a second patch depending on how much review bandwidth you have. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review9950 ----------------------------------------------------------- On Aug. 4, 2012, 6:59 a.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 4, 2012, 6:59 a.m.) > > > Review request for Flume. > > > Description > ------- > > This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are not windows compatible. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create Standalone Spooling ClientHari Shreedharan 2012-08-07, 18:40
> On Aug. 7, 2012, 7:09 a.m., Jarek Cecho wrote: > > Hi Patrik, > > thank you very much for your patch. I would personally prefer if it would be implemented as a source (I do understand windows incompatibility at the moment). However would be great to hear other opinions. > > > > I was thinking about use cases of such functionality and I've come with an improvement idea. Do you think that you could allow user to specify header name where flume will automatically populate original file name? For example if user specifies header name to "file" and configuration directory as /directory than flume will populate header "file" with content "cool.txt" for each line of "/directory/cool.txt" file. > > Patrick Wendell wrote: > Let me re-implement this as a source, and maybe we can find a way to also include it in the avro client. > > The filename suggestion is also a good idea. I'll plan to add this in the next rev of the patch. We could also add that as a feature and do in a second patch depending on how much review bandwidth you have. Patrick & Jarcec, Not to interfere in this conversation, but would it be possible if this is implemented as a library of some sort, which can be used by a source and a client? That way we could have a client which supports this feature as well. Just a suggestion, upto both of you to decide :-) Thanks! - Hari ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review9950 ----------------------------------------------------------- On Aug. 4, 2012, 6:59 a.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 4, 2012, 6:59 a.m.) > > > Review request for Flume. > > > Description > ------- > > This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are not windows compatible. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create Standalone Spooling ClientPatrick Wendell 2012-08-11, 05:32
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- (Updated Aug. 11, 2012, 5:32 a.m.) Review request for Flume. Changes ------- This patch substantially changes the implementation. It: 1) Does this as a source in addition a feature of the avro client, by factoring out functionality into a shared class. 2) Adds the header functionality suggested by Jarcek. 3) Adds relevant documentation in the user guide and several unit tests. It's probably easier to just review this patch rather than diff from the earlier version, given the scope and extent of the changes. Description ------- This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are not windows compatible. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs (updated) ----- flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION flume-ng-doc/sphinx/FlumeUserGuide.rst 45dd7cc Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create Standalone Spooling ClientPatrick Wendell 2012-08-11, 05:34
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review10168 ----------------------------------------------------------- flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java <https://reviews.apache.org/r/6377/#comment21530> I will add a patch fixing all this whitespace crap tomorrow, sorry about that. - Patrick Wendell On Aug. 11, 2012, 5:32 a.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 11, 2012, 5:32 a.m.) > > > Review request for Flume. > > > Description > ------- > > This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are not windows compatible. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 45dd7cc > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-08-14, 21:54
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- (Updated Aug. 14, 2012, 9:54 p.m.) Review request for Flume. Changes ------- Updating title to change scope and adding whitespace fixes. Summary (updated) ----------------- FLUME-1425: Create a SpoolDirectory Source and Client Description ------- This patch extends the existing avro client to support a file-based spooling mechanism. See in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. I feel vaguely uneasy about building this as part of the standlone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). Interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts which are not windows compatible. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs (updated) ----- flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION flume-ng-doc/sphinx/FlumeUserGuide.rst 45dd7cc Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-08-14, 22:02
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- (Updated Aug. 14, 2012, 10:02 p.m.) Review request for Flume. Changes ------- Updating description. Description (updated) ------- This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs ----- flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION flume-ng-doc/sphinx/FlumeUserGuide.rst 45dd7cc Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-08-23, 10:11
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review10681 ----------------------------------------------------------- Overall this looks good and it is going to be a very useful addition. I had time tonight for about half of a review, will try to follow up on the rest tomorrow. I've added some suggestions below, a few of which may be a little bit nitpicky. One thing that is kind of nagging me is whether we can simplify the logic of SpoolingFileLineReader, since right now I feel it's a little bit hard to follow. Maybe I'm just tired, so I'm going to stare at it some more later. flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java <https://reviews.apache.org/r/6377/#comment22987> Nit: typo in simultaneously flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22988> It would be safer to use File.createTempFile() for this instead of using a semi hardcoded filename. The 3-arg form accepts a directory as an argument. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22994> Nit: please add a @throws comment and throws clause for FlumeException (even though it's a RuntimeException) flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22995> It is unfortunate that we have to check for this case, but it looks like there is a Windows atomicity issue with renameTo() for Case #1 (re. the comments). Maybe it would be more reliable to always throw an exception when existing.exists() occurs on a non-Windows OS. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22997> Any reason not to make this method idempotent? Why throw? flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22991> What do you think of changing the contract to process the files in File.lastModified() time order, ascending? (oldest first) flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22992> Might consider using listFiles(FileFilter) for this to avoid allocating & traversing the list twice. Would likely only become noticeable in the case of a directory with hundreds or thousands of files flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22993> Might be better to simply throw an IllegalStateException here with an appropriate message, e.g. "Line reader has already been stopped" or something flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22989> This inner class should be declared as private static flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment22996> In general I am wondering if we can get by without this exception type flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java <https://reviews.apache.org/r/6377/#comment22998> Can we make this a static inner class by passing shared objects to the constructor as params? - Mike Percy On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 14, 2012, 10:02 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code.
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-07, 04:30
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12210 ----------------------------------------------------------- Sorry for taking so long for me to get back to this. I read through the code again, and I figured out what was throwing me off the first time. The way this is structured, there is a polling check-and-sleep mechanism built into the SpoolingFileLineReader, causing it to block, as well as a start/stop mechanism. This makes the SpoolingFileLineReader pretty complicated and, in my opinion, a bit tricky to understand. I think it would be great to refactor this code so that the looping mechanism was done via ScheduledExecutorService.scheduleWithFixedDelay at the top level of the Source and/or Client (actually, an Executor is already there in the Source), allowing the SpoolingFileLineReader to focus more on reading files and less on maintaining state. This should also remove the need for the ReaderStoppedException. Another thing to consider is error conditions. What happens if processEvents() throws a ChannelException? This can happen if the channel is full. In that case, we should back off a bit, and then retry pushing the same events on the next run. Along the same vein, we should not roll a file until we are certain all events have successfully been committed to a channel. To clarify what I'm suggesting, this is an example of how it could be refactored: SpoolDirectorySource.start: scheduledExecutorService.scheduleWithFixedDelay(new SpoolRunnable(new SpoolingFileLineReader(...)), 0L, POLL_SLEEP_MS, TimeUnit.MILLISECONDS); SpoolRunnable.run: while (true) { lines = reader.readLines(batchSize); if (lines == null) return; for (line in lines) { events.add(createEvent(line)); } try { channelProcessor.processEvents(events); reader.commit(); } catch (ChannelException ex) { /* ... */ } } SpoolingFileLineReader.readLines: If no commit since last readLines() call, then return the previous result from a saved buffer. Otherwise, read, save to a buffer, and then return lines from an available file if possible. If no files are available, return null. SpoolingFileLineReader.commit: Clear the saved buffer. If the current file is at EOF, then retire the file. Thanks for all the hard work. Let's take this to the finish line. - Mike Percy On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 14, 2012, 10:02 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-11, 18:02
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12334 ----------------------------------------------------------- This responds mostly to the lower level comments - not the high re-architecting comments. I am going to defer those to an in person discussion. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26109> There is another case here that can affect even non-windows users. What if they run Flume, it sends a few files (say it sends 2 out of the 10 files in the directory) then the stop it. Then they run Flume again but they don't clean up the files that were sent. The first time flume copies one of these files that was already sent, it will try to do a rename to make the exact same file, and you could argue that Flume should give an error message and move on, rather than die. What do you think? I could also see maybe adding a check at start-up to see if there are any duplicate (file, file_renamed) pairs and failing there. Alternatively, it might be reasonable to just fail here if the duplicate exists. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26110> changed to a log warning... might still be helpful for people to know when this is happening in case they are mis-using it. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26118> Done - this also makes the testing way easier. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26119> I'm deferring this change until I do the broader re-architecture, because this exception will likely be removed. flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java <https://reviews.apache.org/r/6377/#comment26120> Why make it static? A PoolDirectoryRunnable will only be created when there is a SpoolDirectorySource that is using it... right? - Patrick Wendell On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 14, 2012, 10:02 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-11, 18:10
> On Oct. 7, 2012, 4:30 a.m., Mike Percy wrote: > > Sorry for taking so long for me to get back to this. I read through the code again, and I figured out what was throwing me off the first time. The way this is structured, there is a polling check-and-sleep mechanism built into the SpoolingFileLineReader, causing it to block, as well as a start/stop mechanism. This makes the SpoolingFileLineReader pretty complicated and, in my opinion, a bit tricky to understand. I think it would be great to refactor this code so that the looping mechanism was done via ScheduledExecutorService.scheduleWithFixedDelay at the top level of the Source and/or Client (actually, an Executor is already there in the Source), allowing the SpoolingFileLineReader to focus more on reading files and less on maintaining state. This should also remove the need for the ReaderStoppedException. > > > > Another thing to consider is error conditions. What happens if processEvents() throws a ChannelException? This can happen if the channel is full. In that case, we should back off a bit, and then retry pushing the same events on the next run. Along the same vein, we should not roll a file until we are certain all events have successfully been committed to a channel. > > > > To clarify what I'm suggesting, this is an example of how it could be refactored: > > > > SpoolDirectorySource.start: scheduledExecutorService.scheduleWithFixedDelay(new SpoolRunnable(new SpoolingFileLineReader(...)), 0L, POLL_SLEEP_MS, TimeUnit.MILLISECONDS); > > SpoolRunnable.run: while (true) { lines = reader.readLines(batchSize); if (lines == null) return; for (line in lines) { events.add(createEvent(line)); } try { channelProcessor.processEvents(events); reader.commit(); } catch (ChannelException ex) { /* ... */ } } > > SpoolingFileLineReader.readLines: If no commit since last readLines() call, then return the previous result from a saved buffer. Otherwise, read, save to a buffer, and then return lines from an available file if possible. If no files are available, return null. > > SpoolingFileLineReader.commit: Clear the saved buffer. If the current file is at EOF, then retire the file. > > > > Thanks for all the hard work. Let's take this to the finish line. In general - I like this pattern much better than what is there now. One question though - this assumes that a given SpoolingFileLineReader will only be accessed from a single thread. If several threads are calling readLines() and commit() it will mess with the semantics of commit(). In the current model you can have multiple threads reading lines interchangeably (I think - thought I haven't fully fleshed this out in my mind). I think not being thread-safe is fine though, given the way that this hooks into a source - where it will only be accessed by one thread. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12210 ----------------------------------------------------------- On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 14, 2012, 10:02 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-11, 19:30
> On Oct. 7, 2012, 4:30 a.m., Mike Percy wrote: > > Sorry for taking so long for me to get back to this. I read through the code again, and I figured out what was throwing me off the first time. The way this is structured, there is a polling check-and-sleep mechanism built into the SpoolingFileLineReader, causing it to block, as well as a start/stop mechanism. This makes the SpoolingFileLineReader pretty complicated and, in my opinion, a bit tricky to understand. I think it would be great to refactor this code so that the looping mechanism was done via ScheduledExecutorService.scheduleWithFixedDelay at the top level of the Source and/or Client (actually, an Executor is already there in the Source), allowing the SpoolingFileLineReader to focus more on reading files and less on maintaining state. This should also remove the need for the ReaderStoppedException. > > > > Another thing to consider is error conditions. What happens if processEvents() throws a ChannelException? This can happen if the channel is full. In that case, we should back off a bit, and then retry pushing the same events on the next run. Along the same vein, we should not roll a file until we are certain all events have successfully been committed to a channel. > > > > To clarify what I'm suggesting, this is an example of how it could be refactored: > > > > SpoolDirectorySource.start: scheduledExecutorService.scheduleWithFixedDelay(new SpoolRunnable(new SpoolingFileLineReader(...)), 0L, POLL_SLEEP_MS, TimeUnit.MILLISECONDS); > > SpoolRunnable.run: while (true) { lines = reader.readLines(batchSize); if (lines == null) return; for (line in lines) { events.add(createEvent(line)); } try { channelProcessor.processEvents(events); reader.commit(); } catch (ChannelException ex) { /* ... */ } } > > SpoolingFileLineReader.readLines: If no commit since last readLines() call, then return the previous result from a saved buffer. Otherwise, read, save to a buffer, and then return lines from an available file if possible. If no files are available, return null. > > SpoolingFileLineReader.commit: Clear the saved buffer. If the current file is at EOF, then retire the file. > > > > Thanks for all the hard work. Let's take this to the finish line. > > Patrick Wendell wrote: > In general - I like this pattern much better than what is there now. > > One question though - this assumes that a given SpoolingFileLineReader will only be accessed from a single thread. If several threads are calling readLines() and commit() it will mess with the semantics of commit(). In the current model you can have multiple threads reading lines interchangeably (I think - thought I haven't fully fleshed this out in my mind). > > I think not being thread-safe is fine though, given the way that this hooks into a source - where it will only be accessed by one thread. Agreed - as we discussed on #flume IRC, it seems reasonable to simplify this and only support a single thread reading each file at a time. This fits with the Sink threading model. - Mike ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12210 ----------------------------------------------------------- On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 14, 2012, 10:02 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code.
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-11, 19:31
> On Oct. 11, 2012, 6:02 p.m., Patrick Wendell wrote: > > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java, line 109 > > <https://reviews.apache.org/r/6377/diff/3/?file=140111#file140111line109> > > > > Why make it static? A PoolDirectoryRunnable will only be created when there is a SpoolDirectorySource that is using it... right? True, as we discussed on IRC this is basically a style & maintainability issue. Since this class is very small, feel free to ignore. > On Oct. 11, 2012, 6:02 p.m., Patrick Wendell wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 198 > > <https://reviews.apache.org/r/6377/diff/3/?file=140110#file140110line198> > > > > There is another case here that can affect even non-windows users. > > > > What if they run Flume, it sends a few files (say it sends 2 out of the 10 files in the directory) then the stop it. Then they run Flume again but they don't clean up the files that were sent. The first time flume copies one of these files that was already sent, it will try to do a rename to make the exact same file, and you could argue that Flume should give an error message and move on, rather than die. > > > > What do you think? I could also see maybe adding a check at start-up to see if there are any duplicate (file, file_renamed) pairs and failing there. Alternatively, it might be reasonable to just fail here if the duplicate exists. As discussed on #flume IRC, to detect Windows we can use System.getProperty("os.name") and if the user does something illegal like recreate a file with an old name on a non-Windows system then we can consider that a serious error and throw. - Mike ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12334 ----------------------------------------------------------- On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 14, 2012, 10:02 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-11, 19:47
> On Oct. 11, 2012, 6:02 p.m., Patrick Wendell wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 198 > > <https://reviews.apache.org/r/6377/diff/3/?file=140110#file140110line198> > > > > There is another case here that can affect even non-windows users. > > > > What if they run Flume, it sends a few files (say it sends 2 out of the 10 files in the directory) then the stop it. Then they run Flume again but they don't clean up the files that were sent. The first time flume copies one of these files that was already sent, it will try to do a rename to make the exact same file, and you could argue that Flume should give an error message and move on, rather than die. > > > > What do you think? I could also see maybe adding a check at start-up to see if there are any duplicate (file, file_renamed) pairs and failing there. Alternatively, it might be reasonable to just fail here if the duplicate exists. > > Mike Percy wrote: > As discussed on #flume IRC, to detect Windows we can use System.getProperty("os.name") and if the user does something illegal like recreate a file with an old name on a non-Windows system then we can consider that a serious error and throw. Tep - took care of this. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12334 ----------------------------------------------------------- On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Aug. 14, 2012, 10:02 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 45dd7cc
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-12, 17:15
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- (Updated Oct. 12, 2012, 5:15 p.m.) Review request for Flume. Changes ------- Several changes which address Mike's comments. Description ------- This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs (updated) ----- flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientCameron Gandevia 2012-10-15, 19:18
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12449 ----------------------------------------------------------- Maybe better to address in a separate ticket but what do you think about allowing pluggable readers so that users can get things like stack traces into a single flume event. - Cameron Gandevia On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-15, 21:11
> On Oct. 15, 2012, 7:18 p.m., Cameron Gandevia wrote: > > Maybe better to address in a separate ticket but what do you think about allowing pluggable readers so that users can get things like stack traces into a single flume event. That's a good idea - I think it might tie into Mike's notion of serializers, which could interpret arbitrary file formats. You could have a serializer that just coalesces entries together based on something other than a newline character. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12449 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-15, 22:02
> On Oct. 15, 2012, 7:18 p.m., Cameron Gandevia wrote: > > Maybe better to address in a separate ticket but what do you think about allowing pluggable readers so that users can get things like stack traces into a single flume event. > > Patrick Wendell wrote: > That's a good idea - I think it might tie into Mike's notion of serializers, which could interpret arbitrary file formats. You could have a serializer that just coalesces entries together based on something other than a newline character. Cameron I'm looking into that in https://issues.apache.org/jira/browse/FLUME-1633 - Mike ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12449 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientCameron Gandevia 2012-10-16, 01:19
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12467 ----------------------------------------------------------- flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java <https://reviews.apache.org/r/6377/#comment26513> I don't fully understand the flume life-cycle, but if this throws and the executor is stopped will it be started again? - Cameron Gandevia On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-16, 21:49
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12458 ----------------------------------------------------------- One issue now is that this does not respect a max line length due to delegating to the BufferedReader. So there is the possibility of a too-large line being read but not committed to the channel, resulting in data loss because the reset() would throw an exception of java.io.IOException: Mark invalid. I think the only way around that is to write a custom BufferedReader. See also http://stackoverflow.com/questions/5960554/maximum-line-length-for-bufferedreader-readline-in-java and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4107821 flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26452> would be better for this to return null - Mike Percy On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-17, 00:26
> On Oct. 16, 2012, 1:19 a.m., Cameron Gandevia wrote: > > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java, line 150 > > <https://reviews.apache.org/r/6377/diff/4/?file=176276#file176276line150> > > > > I don't fully understand the flume life-cycle, but if this throws and the executor is stopped will it be started again? Cameron, you are right, if the RuntimeException is thrown the scheduled executor service thread will die and our task will not continue. See http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ScheduledExecutorService.html @ scheduleWithFixedDelay() I think we will need to avoid throwing any runtime exceptions from the Runnable in this case (although we should let Errors through after logging them). So essentially we need to put this around the whole run() method: try { // ... } catch (Throwable t) { logger.error("Uncaught exception in Runnable", t); if (t instanceof Error) { throw (Error) t; } } - Mike ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12467 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this.
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-17, 02:21
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12497 ----------------------------------------------------------- Sorry for the additional review, a few more suggestions are below. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26593> It's a best practice to always set the causal exception as the cause in the Exception constructor. Don't get rid of the old stacktrace, prefer something like: new FlumeException("Unable to commit read of file: " + filename, originalException) However, in this case, why not just let the method throw IOException? flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26601> We should log at info level whenever this occurs, since it is not a normal condition. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26605> Nit: can also use Collections.EMPTY_LIST flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26606> Collections.EMPTY_LIST flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26608> No need to log & throw, just throw an IllegalStateException flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26609> No need to log, just throw an IllegalStateException flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26610> This should be indexOf("win") >= 0 flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26611> No need to log, just throw an IllegalStateException flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26604> Nit: "} else {" should live on the same line as the closing and opening brace to match the style of the rest of the Flume code. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26613> This is a very serious error, I think we should throw an exception in this case. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26595> This exception should be logged, i.e. logger.warn("Could not find file: " + file, ex); flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26596> This exception should be logged - Mike Percy On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-19, 17:09
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12605 ----------------------------------------------------------- flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26901> I get compiler type warnings when I do this - should I still go ahead? flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment26900> I get compiler type warnings when I do this - should I still go ahead? - Patrick Wendell On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-19, 18:48
> On Oct. 17, 2012, 2:21 a.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 250 > > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line250> > > > > Nit: "} else {" should live on the same line as the closing and opening brace to match the style of the rest of the Flume code. fixed this in a few places. > On Oct. 17, 2012, 2:21 a.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 136 > > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line136> > > > > It's a best practice to always set the causal exception as the cause in the Exception constructor. Don't get rid of the old stacktrace, prefer something like: > > new FlumeException("Unable to commit read of file: " + filename, originalException) > > > > However, in this case, why not just let the method throw IOException? Fixed this so it just throws an exception. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12497 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this.
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-19, 18:49
> On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote: > > One issue now is that this does not respect a max line length due to delegating to the BufferedReader. So there is the possibility of a too-large line being read but not committed to the channel, resulting in data loss because the reset() would throw an exception of java.io.IOException: Mark invalid. I think the only way around that is to write a custom BufferedReader. See also http://stackoverflow.com/questions/5960554/maximum-line-length-for-bufferedreader-readline-in-java and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4107821 Hey Mike, Designing a bounded buffered reader is a nontrivial endevour (the ones linked to here are pretty complicated), but I also don't see why we need it. If the user has a line whose length is > MAX_LINE, then we are not going to be able to function correctly in any case, because the buffer isn't big enough to hold the entire line, and our entire commit semantics are on a per-line bases. Once we've seen a line that is > MAX_LINE, the source should shut down. Why don't we just check the line length after we read it and shut down the source with an error? Isn't that the whole reason we have MAX_LINE, so that it simplifies the sizing of the buffer? > On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 117 > > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line117> > > > > would be better for this to return null The reason this function exists is to return a human-readable version of the current file for print messages and for unit tests. That's why it returns a String and not a File object. I updated the doc string to make the usage more clear, but given that this is the intent, I'd rather have it return "<none>" - otherwise everyone that wants to use it to print out state about the LineReader needs to do a null check and chose their own representation of the null, and it won't be consistent. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12458 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-19, 22:31
> On Oct. 19, 2012, 5:09 p.m., Patrick Wendell wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 159 > > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line159> > > > > I get compiler type warnings when I do this - should I still go ahead? Sorry, you're right: try this instead: List<Integer> l = Collections.<Integer>emptyList(); - Mike ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12605 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-19, 22:43
> On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote: > > One issue now is that this does not respect a max line length due to delegating to the BufferedReader. So there is the possibility of a too-large line being read but not committed to the channel, resulting in data loss because the reset() would throw an exception of java.io.IOException: Mark invalid. I think the only way around that is to write a custom BufferedReader. See also http://stackoverflow.com/questions/5960554/maximum-line-length-for-bufferedreader-readline-in-java and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4107821 > > Patrick Wendell wrote: > Hey Mike, > > Designing a bounded buffered reader is a nontrivial endevour (the ones linked to here are pretty complicated), but I also don't see why we need it. > > If the user has a line whose length is > MAX_LINE, then we are not going to be able to function correctly in any case, because the buffer isn't big enough to hold the entire line, and our entire commit semantics are on a per-line bases. > > Once we've seen a line that is > MAX_LINE, the source should shut down. Why don't we just check the line length after we read it and shut down the source with an error? Isn't that the whole reason we have MAX_LINE, so that it simplifies the sizing of the buffer? Hey Patrick, I'm +1 on that for now. As long as we can't lose data, we can just fail hard in the initial version and make iterative improvements over time. > On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 117 > > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line117> > > > > would be better for this to return null > > Patrick Wendell wrote: > The reason this function exists is to return a human-readable version of the current file for print messages and for unit tests. That's why it returns a String and not a File object. > > I updated the doc string to make the usage more clear, but given that this is the intent, I'd rather have it return "<none>" - otherwise everyone that wants to use it to print out state about the LineReader needs to do a null check and chose their own representation of the null, and it won't be consistent. If you want to provide a string for the UI to display in the case of no file available, I would recommend providing some helper function which returns this "<none>" string when called for the UIs to use. However, if there is a filename called "<none>" in the directory, there would be no way for the client to be able to tell that it was really a file of that name if this string has a special meaning. - Mike ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12458 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientBrock Noland 2012-10-22, 14:26
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12655 ----------------------------------------------------------- Hey guys, how far we are from committing this? I'd like to see this in the 1.3 release. - Brock Noland On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-22, 19:31
> On Oct. 22, 2012, 2:26 p.m., Brock Noland wrote: > > Hey guys, how far we are from committing this? I'd like to see this in the 1.3 release. I should have something close to commit'able within a couple of hours. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12655 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-10-22, 20:36
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- (Updated Oct. 22, 2012, 8:36 p.m.) Review request for Flume. Changes ------- This patch addresses several small issues from Mike's last review, and two big ones: 1) All Throwables are caught in the spool source run thread, so the thread pool will continue to launch new attempts if there is an exception within one execution. 2) The source now logs an error and stops making progress if a line is found which exceeds the maximum line length. This means that users will not use data if they exceed the maximum line length, but it also means flume will be unavailable until they fix the issue (it will keep repeating the error). 3) I changed the code which gets the filename associated with a particular readLines() call because it was not correct in certain cases where the readLines() call itself forces a file roll. I added a unit test for this. Description ------- This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs (updated) ----- flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-10-30, 06:22
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12907 ----------------------------------------------------------- Sorry for the delay in this review, I was out of town last week. There is still an issue with the BufferedReader / mark / reset semantics, please see below for details. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment27742> Please add an note to the javadocs saying that the SpoolingFileLineReader class is ONLY for internal use by Flume components, not for developers extending Flume. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment27745> If the line length exceeds the length of the BufferedReader, this reset() call will throw a java.io.IOException: Mark invalid Also, committed = true when you hit this case, since committed = false does not get set until right before the return statement. Therefore, assuming the client catches the exception and recovers from it, the next call to readLines(n) will simply skip over those lines and the data will be lost. So if there is an error in which the line is too long, since we have decided that we going to fail then we should fail explicitly and permanently: 1. Throw a FlumeException indicating a too-long line was seen (do not try to reset the reader) and also: 2. Permanently disable the SpoolingFileLineReader object so that the next time someone tries to call any operation on it, such as readLines(), commit(), or close(), throw an IllegalStateException indicating that it is no longer in a usable state. In a future revision of this implementation, we can do something a little more graceful, but we have to ensure that we don't lose data in any case. flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment27740> This should be something like: int bufferSize = bufferMaxLines * bufferMaxLineLength; BufferedReader reader = new BufferedReader(new FileReader(nextFile), bufferSize); reader.mark(bufferSize); ...since the default buffer size in the BufferedReader constructor is 8124 in JDK 6 and that may not be enough to hold your bufferSize (or it may be too much). Also, the edge-case behavior of BufferedReader (w.r.t. mark and reset) is different in situations where the constructor arg is different than the arg passed to mark(), so rather than deal with that inconsistency it makes sense to do this. flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment27746> This test is failing for me. Not sure why, haven't dug into it much yet. flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment27747> It's failing on this assert flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment27741> This line is not long enough to exceed the buffer size in the mark() call, so we are not testing the case of an extremely long line. We need to test the situation where reset() throws an exception, since we will lose data in that case with the current implementation. While a barely-too-long line is a valid case to test, we also need a case that tests the extremely long line scenario, i.e.: // make a prefix string as long as the internal buffer size StringBuilder sb = new StringBuilder(); for (int i = 0; i < maxLines * maxLineLength; i++) { sb.append('x'); } String lotsofXs = sb.toString(); Files.write("file1line1\nfile1line2\nfile1line3\nfile1line4\n" + "file1line5\nfile1line6\nfile1line7\nfile1line8\n" + lotsOfXs + " reallyreallyreallyreallyLongfile1line9\n" + // <-- line exceeds BufferedReader internal buf "file1line10\nfile1line11\nfile1line12\nfile1line13\n", f1, Charsets.UTF_8); - Mike Percy On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote:
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-11-01, 04:47
> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 300 > > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line300> > > > > This should be something like: > > > > int bufferSize = bufferMaxLines * bufferMaxLineLength; > > BufferedReader reader = new BufferedReader(new FileReader(nextFile), bufferSize); > > reader.mark(bufferSize); > > > > ...since the default buffer size in the BufferedReader constructor is 8124 in JDK 6 and that may not be enough to hold your bufferSize (or it may be too much). Also, the edge-case behavior of BufferedReader (w.r.t. mark and reset) is different in situations where the constructor arg is different than the arg passed to mark(), so rather than deal with that inconsistency it makes sense to do this. Ya good catch. > On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 188 > > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line188> > > > > If the line length exceeds the length of the BufferedReader, this reset() call will throw a java.io.IOException: Mark invalid > > > > Also, committed = true when you hit this case, since committed = false does not get set until right before the return statement. > > > > Therefore, assuming the client catches the exception and recovers from it, the next call to readLines(n) will simply skip over those lines and the data will be lost. > > > > So if there is an error in which the line is too long, since we have decided that we going to fail then we should fail explicitly and permanently: > > 1. Throw a FlumeException indicating a too-long line was seen (do not try to reset the reader) and also: > > 2. Permanently disable the SpoolingFileLineReader object so that the next time someone tries to call any operation on it, such as readLines(), commit(), or close(), throw an IllegalStateException indicating that it is no longer in a usable state. > > > > In a future revision of this implementation, we can do something a little more graceful, but we have to ensure that we don't lose data in any case. Yep this should stop the world when it fails... my mistake. Added some tests to make sure this works. > On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote: > > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java, line 363 > > <https://reviews.apache.org/r/6377/diff/5/?file=178801#file178801line363> > > > > This test is failing for me. Not sure why, haven't dug into it much yet. I can't yet reproduce this failure. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12907 ----------------------------------------------------------- On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 22, 2012, 8:36 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-11-06, 01:13
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review13120 ----------------------------------------------------------- Hi Patrick, overall this looks good to me. Only thing is the test failure. I am tempted to just commit this, but I'm trying to understand what's going on with the test. It looks to me that the test is trying to assert that the files not be processed if they are empty. I added some debug messages to the assert and this is what it is printing: testBehaviorWithEmptyFile(org.apache.flume.client.avro.TestSpoolingFileLineReader) Time elapsed: 1.012 sec <<< FAILURE! java.lang.AssertionError: Does not contain file: /var/folders/fg/ym7gqsvs5h30gg13yjpt1hm00000gn/T/1352164166517-0/file1, only contains: [/var/folders/fg/ym7gqsvs5h30gg13yjpt1hm00000gn/T/1352164166517-0/file1.COMPLETE, /var/folders/fg/ym7gqsvs5h30gg13yjpt1hm00000gn/T/1352164166517-0/file2.COMPLETE] at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.flume.client.avro.TestSpoolingFileLineReader.testBehaviorWithEmptyFile(TestSpoolingFileLineReader.java:399) So the test is asserting that it should not have rolled test1 to test1.COMPLETE, but it has (At the end of this test, it has actually run both). I don't see any code in the patch indicating that files that have no data should not be read. So I don't understand how this test could be passing for you. Please try re-running the test using this command from the Flume top-level: mvn clean install -Dtest=TestSpoolingFileLineReader -DfailIfNoTests=false ... I will be very surprised if it passes. I am still reviewing this patch and am currently leaning toward adding an @Ignore to this test for commit, since I'd like to get this feature into 1.3.0. In the mean time, any comments on the desired behavior are appreciated. Regards, Mike - Mike Percy On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 22, 2012, 8:36 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientMike Percy 2012-11-06, 02:12
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review13121 ----------------------------------------------------------- Ship it! Hey Patrick, really good work overall. I'm going to commit this as-is. We can follow-up more after the initial commit. - Mike Percy On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 22, 2012, 8:36 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. > > > Thanks, > > Patrick Wendell > >
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientPatrick Wendell 2012-11-06, 06:34
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/ ----------------------------------------------------------- (Updated Nov. 6, 2012, 6:34 a.m.) Review request for Flume. Changes ------- This is a small patch that should fix the unit test issues. I need someone to run on Mac to confirm that this works. Description ------- This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. This addresses bug FlUME-1425. https://issues.apache.org/jira/browse/FlUME-1425 Diffs (updated) ----- flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 Diff: https://reviews.apache.org/r/6377/diff/ Testing ------- Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this. Thanks, Patrick Wendell
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientAlexander Alten-Lorenz 2012-11-06, 08:27
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review13130 ----------------------------------------------------------- Ship it! Patch applied, OSX ML ------------------------------------------------------ T E S T S ------------------------------------------------------- ------------------------------------------------------- T E S T S ------------------------------------------------------- Running org.apache.flume.api.TestFailoverRpcClient Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.597 sec Running org.apache.flume.api.TestLoadBalancingRpcClient Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.793 sec Running org.apache.flume.api.TestNettyAvroRpcClient Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.845 sec Running org.apache.flume.api.TestRpcClientFactory Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.279 sec Running org.apache.flume.event.TestEventBuilder Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.045 sec Results : Tests run: 35, Failures: 0, Errors: 0, Skipped: 0 - Alexander Alten-Lorenz On Nov. 6, 2012, 6:34 a.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Nov. 6, 2012, 6:34 a.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume agent. If you look at the JIRA I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this.
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and ClientAlexander Alten-Lorenz 2012-11-06, 09:15
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review13136 ----------------------------------------------------------- flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment28292> fails: testBehaviorWithEmptyFile(org.apache.flume.client.avro.TestSpoolingFileLineReader) Time elapsed: 0.005 sec <<< FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.flume.client.avro.TestSpoolingFileLineReader.testBehaviorWithEmptyFile(TestSpoolingFileLineReader.java:396) flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java <https://reviews.apache.org/r/6377/#comment28293> fails: 2012-11-06 10:04:43,339 (main) [ERROR - org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:196)] Found line longer than 15 characters, cannot make progress. 2012-11-06 10:04:43,339 (main) [ERROR - org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:200)] Invalid line starts with: reallyreallyreallyreallyreally 2012-11-06 10:04:43,342 (main) [ERROR - org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:196)] Found line longer than 15 characters, cannot make progress. 2012-11-06 10:04:43,342 (main) [ERROR - org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:200)] Invalid line starts with: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 2012-11-06 10:04:43,344 (main) [ERROR - org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:196)] Found line longer than 15 characters, cannot make progress. 2012-11-06 10:04:43,344 (main) [ERROR - org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:200)] Invalid line starts with: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - Alexander Alten-Lorenz On Nov. 6, 2012, 6:34 a.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Nov. 6, 2012, 6:34 a.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7 > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION |