Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client

Sorry for the delay in this review; I was out of town last week. There is still an issue with the BufferedReader mark/reset semantics. Please see below for details.

    Please add a note to the javadocs saying that the SpoolingFileLineReader class is ONLY for internal use by Flume components, not for developers extending Flume.


    If the line length exceeds the buffer size of the BufferedReader, this reset() call will throw a java.io.IOException: Mark invalid.
    Also, committed is still true when you hit this case, since committed = false is not set until right before the return statement.
    Therefore, assuming the client catches the exception and recovers from it, the next call to readLines(n) will simply skip over those lines and the data will be lost.
    So if there is an error in which the line is too long, since we have decided that we are going to fail, we should fail explicitly and permanently:
    1. Throw a FlumeException indicating that a too-long line was seen (do not try to reset the reader), and also:
    2. Permanently disable the SpoolingFileLineReader object, so that the next time someone tries to call any operation on it, such as readLines(), commit(), or close(), it throws an IllegalStateException indicating that it is no longer in a usable state.
    In a future revision of this implementation, we can do something a little more graceful, but we have to ensure that we don't lose data in any case.
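
    The fail-fast behavior in points 1 and 2 could look roughly like the sketch below. It is not the SpoolingFileLineReader patch itself: the class name FailFastLineReader, the maxLineLength parameter, and the nested TooLongLineException (a stand-in for FlumeException, so the example compiles without Flume on the classpath) are all assumptions made for illustration. Closing the underlying reader at the moment of disabling is also an assumption, added so the file handle is not leaked once close() starts refusing calls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual SpoolingFileLineReader.
class FailFastLineReader {
  // Stand-in for org.apache.flume.FlumeException (assumption).
  static class TooLongLineException extends RuntimeException {
    TooLongLineException(String msg) { super(msg); }
  }

  private final BufferedReader reader;
  private final int maxLineLength;
  private boolean disabled = false;

  FailFastLineReader(Reader in, int maxLineLength) {
    this.reader = new BufferedReader(in);
    this.maxLineLength = maxLineLength;
  }

  // Every public operation checks this first.
  private void ensureUsable() {
    if (disabled) {
      throw new IllegalStateException(
          "Reader was disabled after a too-long line and is no longer usable");
    }
  }

  // Mark the reader unusable and release the underlying stream right away.
  private void disable() {
    disabled = true;
    try {
      reader.close();
    } catch (IOException ignored) {
      // Nothing more can be done here; the reader is already unusable.
    }
  }

  List<String> readLines(int n) throws IOException {
    ensureUsable();
    List<String> lines = new ArrayList<String>();
    String line;
    while (lines.size() < n && (line = reader.readLine()) != null) {
      if (line.length() > maxLineLength) {
        disable(); // no reset() attempt: fail explicitly and permanently
        throw new TooLongLineException(
            "Line of length " + line.length() + " exceeds limit " + maxLineLength);
      }
      lines.add(line);
    }
    return lines;
  }

  void commit() {
    ensureUsable();
    // A real reader would record the consumed position here.
  }

  void close() throws IOException {
    ensureUsable();
    reader.close();
  }
}
```

    Note that readLine() in this sketch still buffers the whole line before the length check, so it only illustrates the state handling, not a bounded-memory read.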


    This should be something like:
    int bufferSize = bufferMaxLines * bufferMaxLineLength;
    BufferedReader reader = new BufferedReader(new FileReader(nextFile), bufferSize);
    ...since the default buffer size in the BufferedReader constructor is 8192 characters in JDK 6, and that may not be enough to hold your bufferSize (or it may be too much). Also, the edge-case behavior of BufferedReader (w.r.t. mark and reset) is different in situations where the constructor arg is different than the arg passed to mark(), so rather than deal with that inconsistency it makes sense to do this.
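
    To see the mark/reset edge case in isolation, a small stand-alone check along these lines may help. It is only a sketch: MarkResetDemo and canResetAfterLine are hypothetical names, a StringReader stands in for the FileReader, and the buffer size of 16 is arbitrary. The same value is passed to both the constructor and mark(), as suggested above. The BufferedReader javadoc only says that reset() "may fail" once the read-ahead limit is reached; the results noted in the comments are what the OpenJDK implementation is expected to produce.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class; not part of the Flume patch.
class MarkResetDemo {
  // Marks the stream, reads one line, then tries to reset().
  // Returns true if reset() succeeds, false if it throws (e.g. "Mark invalid").
  static boolean canResetAfterLine(String data, int bufferSize) throws IOException {
    BufferedReader reader = new BufferedReader(new StringReader(data), bufferSize);
    try {
      reader.mark(bufferSize); // same value as the constructor argument
      reader.readLine();
      try {
        reader.reset();
        return true;
      } catch (IOException e) {
        return false;
      }
    } finally {
      reader.close();
    }
  }

  public static void main(String[] args) throws IOException {
    int bufferSize = 16;

    // Line fits inside the buffer: reset() is expected to succeed.
    System.out.println(canResetAfterLine("short\nrest\n", bufferSize));

    // Line is four times the buffer size: the mark is expected to be invalidated.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 4 * bufferSize; i++) {
      sb.append('x');
    }
    System.out.println(canResetAfterLine(sb.toString() + "\n", bufferSize));
  }
}
```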


    This test is failing for me. Not sure why, haven't dug into it much yet.


    It's failing on this assert


    This line is not long enough to exceed the buffer size in the mark() call, so we are not testing the case of an extremely long line. We need to test the situation where reset() throws an exception, since we will lose data in that case with the current implementation.
    While a barely-too-long line is a valid case to test, we also need a case that tests the extremely long line scenario, i.e.:
    // make a prefix string as long as the internal buffer size
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < maxLines * maxLineLength; i++) {
      sb.append('x');
    }
    String lotsOfXs = sb.toString();
    Files.write("file1line1\nfile1line2\nfile1line3\nfile1line4\n" +
      "file1line5\nfile1line6\nfile1line7\nfile1line8\n" +
      lotsOfXs + " reallyreallyreallyreallyLongfile1line9\n",  // <-- line exceeds BufferedReader internal buf
      f1, Charsets.UTF_8);
- Mike Percy
On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote: