Flume >> mail # dev >> Review Request: FLUME-1425: Create Standalone Spooling Client


Mike Percy 2012-10-30, 06:22
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client


> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 300
> > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line300>
> >
> >     This should be something like:
> >    
> >     int bufferSize = bufferMaxLines * bufferMaxLineLength;
> >     BufferedReader reader = new BufferedReader(new FileReader(nextFile), bufferSize);
> >     reader.mark(bufferSize);
> >    
> >     ...since the default buffer size in the BufferedReader constructor is 8192 chars in JDK 6 and that may not be enough to hold your bufferSize (or it may be too much). Also, the edge-case behavior of BufferedReader (w.r.t. mark and reset) differs when the constructor arg is not the same as the arg passed to mark(), so rather than deal with that inconsistency it makes sense to do this.

Ya good catch.
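
For readers following the thread, here is a minimal sketch of the suggestion above: size the BufferedReader and the mark() read-ahead limit from the same value, then use reset() to roll back to the last committed position. The class name, the two constants, and the commit/rollback helpers are hypothetical and are not taken from the patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the SpoolingFileLineReader code from the patch.
final class MarkResetSketch {
    // Assumed limits, named after the variables in the review comment.
    static final int BUFFER_MAX_LINES = 4;
    static final int BUFFER_MAX_LINE_LENGTH = 16;
    static final int BUFFER_SIZE = BUFFER_MAX_LINES * BUFFER_MAX_LINE_LENGTH;

    // Use the same size for the internal buffer and the read-ahead limit.
    static BufferedReader open(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source, BUFFER_SIZE);
        reader.mark(BUFFER_SIZE);
        return reader;
    }

    // Read up to n lines; stops early at end of input.
    static List<String> readLines(BufferedReader reader, int n) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String line = reader.readLine();
            if (line == null) {
                break;
            }
            lines.add(line);
        }
        return lines;
    }

    // Commit: move the mark forward to the current position.
    static void commit(BufferedReader reader) throws IOException {
        reader.mark(BUFFER_SIZE);
    }

    // Roll back: return to the last committed position.
    static void rollback(BufferedReader reader) throws IOException {
        reader.reset();
    }
}
```

With this shape, a batch that was read but never committed is simply read again after rollback(), as long as no more than BUFFER_SIZE characters were consumed since the last mark.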
> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, line 188
> > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line188>
> >
> >     If the line length exceeds the buffer size of the BufferedReader, this reset() call will throw a java.io.IOException: Mark invalid
> >    
> >     Also, committed is still true when you hit this case, since committed = false does not get set until right before the return statement.
> >    
> >     Therefore, assuming the client catches the exception and recovers from it, the next call to readLines(n) will simply skip over those lines and the data will be lost.
> >    
> >     So if there is an error in which the line is too long, since we have decided that we are going to fail, we should fail explicitly and permanently:
> >     1. Throw a FlumeException indicating a too-long line was seen (do not try to reset the reader) and also:
> >     2. Permanently disable the SpoolingFileLineReader object so that the next time someone tries to call any operation on it, such as readLines(), commit(), or close(), throw an IllegalStateException indicating that it is no longer in a usable state.
> >    
> >     In a future revision of this implementation, we can do something a little more graceful, but we have to ensure that we don't lose data in any case.

Yep this should stop the world when it fails... my mistake. Added some tests to make sure this works.
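
The two-step behavior requested above (throw on a too-long line, then refuse every later call) could look roughly like the sketch below. Everything in it is an assumption for illustration: the class name, the maxLineLength parameter, and TooLongLineException, which stands in for Flume's FlumeException so the example compiles without Flume on the classpath. Closing the underlying reader at the moment of failure is a choice made for this sketch, not something the review asks for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "fail explicitly and permanently".
final class FailFastLineReader {
    // Stand-in for org.apache.flume.FlumeException (assumption).
    static final class TooLongLineException extends RuntimeException {
        TooLongLineException(String message) {
            super(message);
        }
    }

    private final BufferedReader reader;
    private final int maxLineLength;
    private boolean disabled = false;

    FailFastLineReader(Reader source, int maxLineLength) {
        this.reader = new BufferedReader(source);
        this.maxLineLength = maxLineLength;
    }

    List<String> readLines(int n) throws IOException {
        ensureUsable();
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String line = reader.readLine();
            if (line == null) {
                break;
            }
            if (line.length() > maxLineLength) {
                disable();
                // No reset() here: the reader is unusable from now on.
                throw new TooLongLineException(
                    "Line of " + line.length() + " chars exceeds limit of " + maxLineLength);
            }
            lines.add(line);
        }
        return lines;
    }

    void commit() {
        ensureUsable();
        // A real implementation would record the committed position here.
    }

    void close() throws IOException {
        ensureUsable();
        reader.close();
    }

    // Mark the object as dead and release the file handle (sketch-only choice).
    private void disable() {
        disabled = true;
        try {
            reader.close();
        } catch (IOException ignored) {
            // Already failing; nothing more to do.
        }
    }

    private void ensureUsable() {
        if (disabled) {
            throw new IllegalStateException("Reader was disabled after a too-long line");
        }
    }
}
```

Because the disabled flag is never cleared, a caller that catches the first exception and retries gets an IllegalStateException instead of silently skipping the lines that were already consumed.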
> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote:
> > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java, line 363
> > <https://reviews.apache.org/r/6377/diff/5/?file=178801#file178801line363>
> >
> >     This test is failing for me. Not sure why, haven't dug into it much yet.

I can't yet reproduce this failure.
- Patrick
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/#review12907
-----------------------------------------------------------
On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6377/
> -----------------------------------------------------------
>
> (Updated Oct. 22, 2012, 8:36 p.m.)
>
>
> Review request for Flume.
>
>
> Description
> -------
>
> This patch adds a spooling directory based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are clearly renamed and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code.
>
> This helps fill the gap for people who want a way to get reliable delivery of events into flume, but don't want to directly write their application against the flume API. They can simply drop log files off in a spooldir and let flume ingest asynchronously (using some shell scripts or other automated process).