Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume, mail # dev - Review Request: FLUME-1425: Create Standalone Spooling Client


Copy link to this message
-
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client
Patrick Wendell 2012-10-19, 18:49


> On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote:
> > One issue now is that this does not respect a max line length due to delegating to the BufferedReader. So there is the possibility of a too-large line being read but not committed to the channel, resulting in data loss because the reset() would throw an exception of java.io.IOException: Mark invalid. I think the only way around that is to write a custom BufferedReader. See also http://stackoverflow.com/questions/5960554/maximum-line-length-for-bufferedreader-readline-in-java and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4107821

Hey Mike,

Designing a bounded buffered reader is a nontrivial endevour (the ones linked to here are pretty complicated), but I also don't see why we need it.

If the user has a line whose length is > MAX_LINE, then we are not going to be able to function correctly in any case, because the buffer isn't big enough to hold the entire line, and our entire commit semantics are on a per-line bases.

Once we've seen a line that is > MAX_LINE, the source should shut down. Why don't we just check the line length after we read it and shut down the source with an error? Isn't that the whole reason we have MAX_LINE, so that it simplifies the sizing of the buffer?
> On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 117
> > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line117>
> >
> >     would be better for this to return null

The reason this function exists is to return a human-readable version of the current file for print messages and for unit tests. That's why it returns a String and not a File object.

I updated the doc string to make the usage more clear, but given that this is the intent, I'd rather have it return "<none>" - otherwise everyone that wants to use it to print out state about the LineReader needs to do a null check and chose their own representation of the null, and it won't be consistent.
- Patrick
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/#review12458
-----------------------------------------------------------
On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6377/
> -----------------------------------------------------------
>
> (Updated Oct. 12, 2012, 5:15 p.m.)
>
>
> Review request for Flume.
>
>
> Description
> -------
>
> This patch adds a spooling directory based source. The  idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code.
>
> This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).
>
> Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner.
>
>
> This addresses bug FlUME-1425.
>     https://issues.apache.org/jira/browse/FlUME-1425
>
>
> Diffs
> -----
>
>   flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7
>   flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION