

Re: post-processing
I wouldn't modify the files while Flume is also writing to them. It
might work, but it might also be a complete mess. If you need to modify
the events before they are written, interceptors are the correct solution.
Once Flume is done with a file, modify all you wish!
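
As a rough sketch of that last point, a script could skip the newest file in
the FILE_ROLL directory and rewrite only files that have been idle for a while.
This assumes the sink only ever appends to the most recently modified file.
The idle threshold, the example prefix, and both function names are made up
for illustration; none of this is provided by Flume.

```python
import os
import time
from pathlib import Path


def finished_files(directory, min_idle_seconds=120.0, now=None):
    """Return files that look finished: not the newest one, and idle for a while."""
    now = time.time() if now is None else now
    files = sorted(
        (p for p in Path(directory).iterdir()
         if p.is_file() and not p.name.endswith(".tmp")),
        key=lambda p: p.stat().st_mtime,
    )
    # Assumption: the most recently modified file may still be live, so skip it.
    candidates = files[:-1]
    return [p for p in candidates if now - p.stat().st_mtime >= min_idle_seconds]


def add_prefix_if_missing(path, prefix):
    """Prepend prefix to each line that lacks it, then swap the file in atomically."""
    tmp = path.with_name(path.name + ".tmp")
    with path.open("r", encoding="utf-8", newline="") as src, \
            tmp.open("w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line if line.startswith(prefix) else prefix + line)
    # Rename over the original only after the rewrite has fully succeeded.
    os.replace(tmp, path)
```

Calling `add_prefix_if_missing(p, "host=web1 ")` for each `p` returned by
`finished_files(...)` leaves the live file alone, and running it twice is
harmless because lines that already carry the prefix are left unchanged.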

On Fri, Dec 21, 2012 at 2:26 PM, Cochran, David <[EMAIL PROTECTED]> wrote:
> Just had a thought... before I turn this script up and make a mess of things,
> I figured I'd ask the group.
> I'm running Flume 1.3 with FILE_ROLL at the sink. The live, in-use
> files are periodically scanned for key events while still live
> and being appended to by Flume. No problems there, as they are just being
> read.
> Now the interesting part: I also need to do a little processing of the
> stored logs (using sed) to insert a couple of pieces of data into each line (if
> they don't already exist) before my log scanner process does its thing.
> I'm not sure what the odds are of this NOT totally hosing the Flume
> process/data. Maybe it recognizes the file is in use and waits? The
> files are processed by sed pretty quickly (~15 secs) as they are rotated
> daily.
> Has anyone else tried this yet, or have any insight as to how Flume might
> react before I attempt to make bit soup?
> Thanks,
> -Dave

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/