Re: post-processing
I wouldn't modify the files while Flume is also modifying them. It
might work, but it might also be a complete mess. If you need to modify
the events before they are written, interceptors are the correct solution.
After the file is done from Flume's perspective, modify all you wish!
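A rough illustration of the interceptor approach: Flume interceptors implement `org.apache.flume.interceptor.Interceptor` (`initialize`, `intercept(Event)`, `intercept(List<Event>)`, `close`) and are attached to a source, so events are modified before any sink writes them. The sketch below uses simplified stand-ins for Flume's `Event` and `Interceptor` types so it compiles without the Flume jar. The `TagInterceptor` class and the `site=dc1` tag are hypothetical, standing in for whatever data needs to be inserted into each line. A real interceptor would also supply an `Interceptor.Builder` so it can be configured on the source; that part is omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagInterceptorSketch {

    // Simplified stand-ins mirroring the shape of org.apache.flume.Event and
    // org.apache.flume.interceptor.Interceptor (ASSUMPTION: not the real types).
    interface Event {
        Map<String, String> getHeaders();
        byte[] getBody();
        void setBody(byte[] body);
    }

    interface Interceptor {
        void initialize();
        Event intercept(Event event);
        List<Event> intercept(List<Event> events);
        void close();
    }

    static class SimpleEvent implements Event {
        private final Map<String, String> headers = new HashMap<>();
        private byte[] body;

        SimpleEvent(String text) { this.body = text.getBytes(StandardCharsets.UTF_8); }

        public Map<String, String> getHeaders() { return headers; }
        public byte[] getBody() { return body; }
        public void setBody(byte[] body) { this.body = body; }
    }

    /** Hypothetical interceptor: prefixes each event body with a tag unless it is already present. */
    static class TagInterceptor implements Interceptor {
        private final String tag;

        TagInterceptor(String tag) { this.tag = tag; }

        public void initialize() { }

        public Event intercept(Event event) {
            String line = new String(event.getBody(), StandardCharsets.UTF_8);
            if (!line.contains(tag)) {
                event.setBody((tag + " " + line).getBytes(StandardCharsets.UTF_8));
            }
            return event;
        }

        public List<Event> intercept(List<Event> events) {
            List<Event> out = new ArrayList<>(events.size());
            for (Event e : events) {
                Event result = intercept(e);
                if (result != null) {   // a null result would mean "drop this event"
                    out.add(result);
                }
            }
            return out;
        }

        public void close() { }
    }

    public static void main(String[] args) {
        Interceptor interceptor = new TagInterceptor("site=dc1");
        interceptor.initialize();
        List<Event> batch = new ArrayList<>();
        batch.add(new SimpleEvent("GET /index.html 200"));
        batch.add(new SimpleEvent("site=dc1 GET /about 404"));
        for (Event e : interceptor.intercept(batch)) {
            System.out.println(new String(e.getBody(), StandardCharsets.UTF_8));
        }
        interceptor.close();
    }
}
```

Run as-is, both sample lines come out with a single `site=dc1` prefix: the first gains it, the second already had it and is left alone, which mirrors the "only if it doesn't already exist" rule from the original sed step.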

On Fri, Dec 21, 2012 at 2:26 PM, Cochran, David <[EMAIL PROTECTED]> wrote:
> Just had a thought... before I turn this script up and make a mess of things,
> I figured I'd ask the group.
>
> I'm running Flume 1.3 with a FILE_ROLL sink. The live, in-use files are
> periodically scanned for key events while they are still live and being
> appended to by Flume. No problems there, as they are only being read.
>
> Now the interesting part: I also need to do a little processing of the
> stored logs (using sed) to insert a couple of pieces of data into each line
> (if they aren't already there) before my log scanner process does its thing.
>
> I'm not sure what the odds are of this NOT totally hosing the Flume
> process or its data. Maybe Flume recognizes that the file is in use and
> waits? The files are processed by sed pretty quickly (~15 secs), since
> they are rotated daily.
>
> Has anyone else tried this, or does anyone have any insight into how Flume
> might react, before I attempt to make bit soup?
>
> Thanks,
> -Dave
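For the other option, waiting until Flume is done with a file, here is a minimal sketch of the sed step in Java that only touches files the FILE_ROLL sink has already rolled. It rests on a loudly stated assumption: the most recently modified file in the sink directory is the one Flume is still appending to, and every older file is closed and will not be written again. The class name, method names, and `site=dc1` tag are hypothetical. Each file is rewritten to a `.tmp` sibling and then moved over the original, so the original is never edited in place.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RolledFilePostProcessor {

    /** Every regular file in dir except the most recently modified one (assumed live). */
    static List<Path> rolledFiles(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.filter(Files::isRegularFile)
                     .filter(p -> !p.getFileName().toString().endsWith(".tmp"))
                     .collect(Collectors.toCollection(ArrayList::new));
        }
        files.sort(Comparator.comparing(RolledFilePostProcessor::mtime));
        // ASSUMPTION: the newest file is the one Flume is still writing; skip it.
        return files.isEmpty() ? files : new ArrayList<>(files.subList(0, files.size() - 1));
    }

    private static FileTime mtime(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Prefixes each line with tag unless it already contains it, via a temp file plus move. */
    static void tagFile(Path file, String tag) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            out.add(line.contains(tag) ? line : tag + " " + line);
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, out, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RolledFilePostProcessor <flume-sink-directory>");
            return;
        }
        for (Path p : rolledFiles(Paths.get(args[0]))) {
            tagFile(p, "site=dc1");
            System.out.println("tagged " + p);
        }
    }
}
```

Picking the live file by modification time is only a heuristic: equal timestamps, or a roll that happens while the job is running, could misclassify a file, so running it shortly after a scheduled roll (or keying off the sink's configured roll interval) would be the safer design.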

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/