Flume, mail # dev - Review Request: FLUME-1425: Create Standalone Spooling Client


Patrick Wendell 2012-08-04, 06:47
Patrick Wendell 2012-08-04, 06:59
Jarek Cecho 2012-08-07, 07:09
Patrick Wendell 2012-08-07, 17:46
Hari Shreedharan 2012-08-07, 18:40
Patrick Wendell 2012-08-11, 05:32
Patrick Wendell 2012-08-11, 05:34
Patrick Wendell 2012-08-14, 21:54
Patrick Wendell 2012-08-14, 22:02
Mike Percy 2012-08-23, 10:11
Mike Percy 2012-10-07, 04:30
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client
Patrick Wendell 2012-10-11, 18:10


> On Oct. 7, 2012, 4:30 a.m., Mike Percy wrote:
> > Sorry for taking so long to get back to this. I read through the code again, and I figured out what was throwing me off the first time. The way this is structured, there is a polling check-and-sleep mechanism built into the SpoolingFileLineReader, causing it to block, as well as a start/stop mechanism. This makes the SpoolingFileLineReader pretty complicated and, in my opinion, a bit tricky to understand. I think it would be great to refactor this code so that the looping mechanism is done via ScheduledExecutorService.scheduleWithFixedDelay at the top level of the Source and/or Client (actually, an Executor is already there in the Source), allowing the SpoolingFileLineReader to focus more on reading files and less on maintaining state. This should also remove the need for the ReaderStoppedException.
> >
> > Another thing to consider is error conditions. What happens if processEvents() throws a ChannelException? This can happen if the channel is full. In that case, we should back off a bit, and then retry pushing the same events on the next run. Along the same vein, we should not roll a file until we are certain all events have successfully been committed to a channel.
> >
> > To clarify what I'm suggesting, this is an example of how it could be refactored:
> >
> > SpoolDirectorySource.start: scheduledExecutorService.scheduleWithFixedDelay(new SpoolRunnable(new SpoolingFileLineReader(...)), 0L, POLL_SLEEP_MS, TimeUnit.MILLISECONDS);
> > SpoolRunnable.run: while (true) { lines = reader.readLines(batchSize); if (lines == null) return; for (line in lines) { events.add(createEvent(line)); } try { channelProcessor.processEvents(events); reader.commit(); } catch (ChannelException ex) { /* ... */ } }
> > SpoolingFileLineReader.readLines: If no commit since last readLines() call, then return the previous result from a saved buffer. Otherwise, read, save to a buffer, and then return lines from an available file if possible. If no files are available, return null.
> > SpoolingFileLineReader.commit: Clear the saved buffer. If the current file is at EOF, then retire the file.
> >
> > Thanks for all the hard work. Let's take this to the finish line.
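
Put concretely, the loop Mike outlines above might look as follows inside the Source. This is a sketch only: the SpoolingFileLineReader contract and the class names come from this review, while the poll interval, batch size, and the use of ChannelProcessor.processEventBatch() are assumptions filled in for illustration.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.flume.ChannelException;
import org.apache.flume.Event;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.event.EventBuilder;

public class SpoolDirectorySourceSketch {

  // Contract from the review; implementations omitted here.
  interface SpoolingFileLineReader {
    List<String> readLines(int batchSize); // re-returns the last batch until commit()
    void commit();                         // clears the buffer; retires the file at EOF
  }

  private static final long POLL_SLEEP_MS = 500; // assumed poll interval
  private static final int BATCH_SIZE = 100;     // assumed batch size
  private ScheduledExecutorService executor;

  // Called from the Source's start(): the executor owns all looping and
  // sleeping, so the reader only reads files and tracks commit state.
  public void start(final SpoolingFileLineReader reader,
                    final ChannelProcessor channelProcessor) {
    executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        while (true) {
          List<String> lines = reader.readLines(BATCH_SIZE);
          if (lines == null) {
            return; // no file ready; scheduleWithFixedDelay supplies the sleep
          }
          List<Event> events = new ArrayList<Event>(lines.size());
          for (String line : lines) {
            events.add(EventBuilder.withBody(line.getBytes(StandardCharsets.UTF_8)));
          }
          try {
            channelProcessor.processEventBatch(events);
            reader.commit(); // only now may the reader roll/retire the file
          } catch (ChannelException ex) {
            // Channel full: back off; the uncommitted batch is retried next run.
            return;
          }
        }
      }
    }, 0L, POLL_SLEEP_MS, TimeUnit.MILLISECONDS);
  }
}

Because the executor is single-threaded, the reader is only ever touched by one thread, which is what makes the stateful readLines()/commit() contract safe.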

In general - I like this pattern much better than what is there now.

One question though - this assumes that a given SpoolingFileLineReader will only be accessed from a single thread. If several threads are calling readLines() and commit(), it will mess with the semantics of commit(). In the current model you can have multiple threads reading lines interchangeably (I think, though I haven't fully fleshed this out in my mind).

I think not being thread-safe is fine though, given the way that this hooks into a source - where it will only be accessed by one thread.
- Patrick
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/#review12210
-----------------------------------------------------------
On Aug. 14, 2012, 10:02 p.m., Patrick Wendell wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6377/
> -----------------------------------------------------------
>
> (Updated Aug. 14, 2012, 10:02 p.m.)
>
>
> Review request for Flume.
>
>
> Description
> -------
>
> This patch adds a spooling-directory-based source. The idea is that a user can have a spool directory where files are deposited for ingestion into flume. Once ingested, the files are renamed to clearly mark them as completed, and the implementation guarantees at-least-once delivery semantics similar to those achieved within flume itself, even across failures and restarts of the JVM running the code.
>
> This helps fill the gap for people who want reliable delivery of events into flume, but don't want to write their application directly against the flume API. They can simply drop log files off in a spooldir and let flume ingest them asynchronously (using some shell scripts or another automated process).
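
As a usage illustration of the description above, wiring this source into an agent would look roughly like the properties sketch below. The property names (type = spooldir, spoolDir) are the ones this source eventually shipped with; the agent name, channel wiring, and spool path are hypothetical.

# hypothetical agent/channel names and spool path
agent.sources = spool
agent.channels = mem
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /var/log/flume-spool
agent.sources.spool.channels = mem
agent.channels.mem.type = memory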
Mike Percy 2012-10-11, 19:30
Patrick Wendell 2012-10-11, 18:02
Mike Percy 2012-10-11, 19:31
Patrick Wendell 2012-10-11, 19:47
Patrick Wendell 2012-10-12, 17:15
Cameron Gandevia 2012-10-15, 19:18
Patrick Wendell 2012-10-15, 21:11
Mike Percy 2012-10-15, 22:02
Cameron Gandevia 2012-10-16, 01:19
Mike Percy 2012-10-17, 00:26
Mike Percy 2012-10-16, 21:49
Patrick Wendell 2012-10-19, 18:49
Mike Percy 2012-10-19, 22:43
Mike Percy 2012-10-17, 02:21
Patrick Wendell 2012-10-19, 18:48
Patrick Wendell 2012-10-19, 17:09
Mike Percy 2012-10-19, 22:31
Brock Noland 2012-10-22, 14:26
Patrick Wendell 2012-10-22, 19:31
Patrick Wendell 2012-10-22, 20:36
Mike Percy 2012-11-06, 01:13
Mike Percy 2012-11-06, 02:12
Patrick Wendell 2012-11-06, 06:34
Alexander Alten-Lorenz 2012-11-06, 08:27
Alexander Alten-Lorenz 2012-11-06, 09:15
Mike Percy 2012-10-30, 06:22
Patrick Wendell 2012-11-01, 04:47