Got it - I'm building this for deployment settings where it is not
possible to integrate directly with Flume (e.g. through an
appender). That is the preferable option when you can do it, so it makes
sense that you are going that way.
On Wed, Sep 5, 2012 at 7:26 AM, Steve Johnson <[EMAIL PROTECTED]> wrote:
> Patrick, thanks, this could be cool for testing, but I'm planning on using
> a log4j Avro logger to send straight to an Avro source anyway. That's the
> hope, at least.
> The Perl script was just something I used to bench-test the framework
> itself. But ideally, we want to avoid logging to files at all and use
> Flume. However, I will be bench-testing the Avro stuff and seeing how it
> performs for us; if it's not what I'm looking for, I may be interested in
> other options.
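For reference, wiring log4j straight to an Avro source is typically done with Flume's log4j appender. A minimal log4j.properties sketch (the hostname and port are placeholders, and the appender class name assumes Flume NG's flume-ng-log4jappender artifact is on the classpath):

```properties
# Route application logs straight to a Flume Avro source via Flume's
# log4j appender (class from the flume-ng-log4jappender artifact).
log4j.rootLogger=INFO, flume
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
# Placeholder host; must point at the agent running the Avro source.
log4j.appender.flume.Hostname=flume-agent.example.com
# Must match the port the Avro source is bound to.
log4j.appender.flume.Port=41414
```

Note that with this setup, if the agent is unreachable the appender will throw, which ties into the fail-or-fall-back trade-off discussed later in this thread.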
> On Sun, Sep 2, 2012 at 4:45 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
>> Hey Chris - what Steve said is right on:
>> "Unless you can always guarantee that you will always be able to
>> continue where you left off and never re-send data, then it's probably
>> best to go right to the logging source and have that piece send
>> directly to Flume (i.e. Avro, log4j plugins, etc.)."
>> If you are using an asynchronous source, like tailing, there is always
>> a possibility of data loss. What if the disk that the log is stored on
>> fails before Flume gets to it? This failure window is inherent in
>> trying to collect logs like this - and that is what the warning is
>> speaking to.
>> Steve - I am working on a tool to read through rolled log files on
>> disk, send them to a Flume agent, and then rename or delete the
>> files... I would be interested to hear whether you think this could
>> displace your current Perl setup in terms of functionality.
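I don't know the exact mechanics of Patrick's tool, but a rough sketch of that rolled-file workflow might look like this (Python; all names are hypothetical, and it assumes the agent exposes a newline-delimited TCP source such as NetCat; a real version would also need batching, retries, and crash-safe bookkeeping):

```python
import socket
from pathlib import Path

def forward_rolled_log(path, host, port, done_suffix=".sent"):
    """Ship each line of a rolled log file to a newline-delimited
    TCP source (e.g. Flume's NetCat source), then rename the file
    so it is not picked up and re-sent on the next pass."""
    path = Path(path)
    with socket.create_connection((host, port)) as sock, path.open("rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                line += b"\n"  # the source expects one event per line
            sock.sendall(line)
    # Renaming only after every line was written narrows (but does not
    # close) the duplicate-on-crash window discussed in this thread.
    path.rename(path.with_name(path.name + done_suffix))
```

Deleting instead of renaming trades recoverability for disk space; either way, a crash between the send and the rename can re-ship the whole file, which is exactly the duplicate-delivery caveat raised in this thread.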
>> - Patrick
>> On Thu, Aug 30, 2012 at 8:06 AM, Steve Johnson <[EMAIL PROTECTED]> wrote:
>> > Chris, I'm testing something similar, from the sounds of it. We were
>> > originally going to go with the idea of using some sort of log tailer
>> > to pass events (log recs) into the Flume agent. Right now, I'm testing
>> > with a simple Perl script that reads a rotated log file and sends its
>> > records over the network to a Flume agent using the NetCat source.
>> > This is not ideal, but it is good enough for some initial Flume
>> > testing; right now, I'm just trying to stress-test the system.
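For anyone reproducing this kind of bench test, a minimal agent definition with a NetCat source might look like the following (Flume NG config syntax; the agent, source, channel, and sink names plus the port are placeholders):

```properties
# Minimal Flume NG agent: NetCat source -> memory channel -> logger sink.
# The NetCat source treats each newline-terminated line as one event.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```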
>> > When you think about it, the nature of tailing logs is that you really
>> > can't guarantee delivery anyway. For instance, if you need to take
>> > your server down, or the tailer fails and you need to restart it,
>> > where were you in tailing the log? In my case, it is as bad or worse
>> > for us to duplicate a logrec as it is to miss one, so tailing itself
>> > is a tricky thing. Unless you can always guarantee that you will
>> > always be able to continue where you left off and never re-send data,
>> > then it's probably best to go right to the logging source and have
>> > that piece send directly to Flume (i.e. Avro, log4j plugins, etc.).
>> > However, the downfall there is that if the Flume agent goes down, your
>> > app generating the logs should go down as well, to ensure you don't
>> > process requests that you can't keep a record of; or at least write it
>> > smart enough to fall back to a file when that happens, so that you can
>> > recover the records in a batch process later.
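That fall-back idea can be sketched like this (Python; the names are hypothetical, and it assumes the agent exposes a newline-delimited TCP source such as NetCat):

```python
import socket

def log_event(line, host, port, fallback_path):
    """Try to deliver one log record to the Flume agent; if the agent
    is unreachable, append the record to a local fallback file so it
    can be recovered by a batch job later (sketch only)."""
    try:
        with socket.create_connection((host, port), timeout=2) as sock:
            sock.sendall(line.encode("utf-8") + b"\n")
    except OSError:
        # Agent down: keep the record locally instead of dropping it.
        with open(fallback_path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
```

Opening a connection per event is far too slow for production use; a real client would keep a pooled connection and batch writes, but the failure handling is the point of the sketch.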
>> > However, if you're using this for something like syslogging, error
>> > logs, or monitoring, it's probably not that critical if you duplicate
>> > or miss some logrecs for a short time after a recovery. I guess it
>> > really depends on the application. I'll be interested to hear your
>> > solution for this, though, as I'm still in the process myself.
>> > Thanks
>> > On Thu, Aug 30, 2012 at 9:45 AM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> >> Hi Patrick,
>> >> My issue with ExecSource is the giant warning in the user guide: