Flume >> mail # user >> post-processing

Just had a thought... before I turn this script loose and make a mess of
things, I figured I'd ask the group.

I'm running Flume 1.3 with FILE_ROLL as the sink. The in-use files are
periodically scanned for key events while still "live" and being appended
to by Flume... no problems there as they are just being

Now the interesting part: I also need to do a little processing of the
stored logs (using sed) to insert a couple of pieces of data into each line
(if they don't already exist) before my log scanner process does its thing.
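
To make the "insert if it doesn't already exist" step concrete, here is a minimal Python sketch of the same idea. The key=value layout and the field names `host` and `dc` are made up for illustration; the real sed expression and the real fields would differ.

```python
def ensure_fields(line, fields):
    """Prepend each missing key=value pair to a log line.

    Keys already present in the line are left alone, so running this
    twice over the same file does not duplicate anything.
    Note: the check is a crude substring match ("host=" would also
    match "vhost="), which is good enough for a sketch only.
    """
    body = line.rstrip("\n")
    missing = [f"{key}={value}" for key, value in fields.items()
               if f"{key}=" not in body]
    return " ".join(missing + [body]) + "\n"


# Hypothetical fields, not taken from the real logs.
tagged = ensure_fields("GET /index\n", {"host": "web01", "dc": "east"})
```

The point is only that the rewrite has to be safe to repeat, since the same file may get processed more than once.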

I'm not sure what the odds are of this NOT totally hosing the Flume
process/data... maybe it recognizes the file is in use and waits? The
files are processed by sed pretty quickly (~15 secs) as they are rotated

Has anyone else tried this yet or have any insight as to how Flume might
react before I attempt to make bit soup?
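
One fallback I'm considering, in case nobody knows, is to never touch the file Flume is still writing to. A rough Python sketch, assuming the most recently modified file in the sink directory is the live one (that is my assumption, not something I have confirmed about FILE_ROLL), and with `rolled_files` and `rewrite_in_place` being names I made up:

```python
import os
import tempfile


def rolled_files(directory):
    """Return regular files in `directory`, oldest first, minus the newest.

    Assumption: the newest file by mtime is the one still being written.
    """
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = sorted((p for p in paths if os.path.isfile(p)),
                   key=os.path.getmtime)
    return files[:-1]


def rewrite_in_place(path, transform):
    """Apply `transform` to every line, then swap the result into place.

    The output goes to a temp file in the same directory and is renamed
    over the original, so a reader sees either the old or the new file.
    The temp file gets mkstemp's default permissions, not the original's.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    with os.fdopen(fd, "w") as out, open(path) as src:
        for line in src:
            out.write(transform(line))
    os.replace(tmp_path, path)
```

Usage would be something like `for p in rolled_files(sink_dir): rewrite_in_place(p, my_transform)`. The rename means a process that still has the old file open keeps its handle to the old contents, which is exactly why I'd skip the live file rather than rewrite it.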