Flume, mail # user - Would someone please comment on Tail Source in NG?


Re: Would someone please comment on Tail Source in NG?
Patrick Wendell 2012-08-30, 05:07
Hey Chris,

I'm not clear on what functionality you would want from a TailSource
that's not already offered by (a) using ExecSource, (b) putting Flume
inside your application, or (c) using the asynchronous log spooler that
I am working on.

It's impossible to correctly "watch" a file from within the JVM across
application restarts. For instance, if the file is renamed, swapped,
or modified while the JVM is down (as is common with rolling logs),
there is no way to know whether the old and new files are the same.
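
To make the identity problem concrete, here is a minimal sketch in Python
(not Flume code; the function names and file names are made up for
illustration). It assumes a POSIX filesystem, where the (device, inode)
pair identifies a file, and shows that after a typical log rotation the
same path refers to a different file:

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode) for path; on POSIX this pair identifies a file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def simulate_rotation(log_path):
    """Hypothetical rotation: rename the log to '<name>.1', start a fresh file."""
    os.rename(log_path, log_path + ".1")
    with open(log_path, "w") as f:
        f.write("first line after rotation\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "app.log")
        with open(log, "w") as f:
            f.write("line written before rotation\n")

        before = file_identity(log)
        simulate_rotation(log)
        after = file_identity(log)

        # The path is unchanged, but the identity behind it is not.
        print("same path, same file?", before == after)
        # The renamed copy keeps the identity of the original file.
        print("rotated copy is the original?", file_identity(log + ".1") == before)
```

A watcher that remembers only the path and a byte offset cannot tell these
two cases apart after a restart, which is the situation described above.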

Within the bounds of what *is* possible, I'd say we have the use cases
pretty much covered, but I'm open to debate if I've missed something.

- Patrick

On Wed, Aug 29, 2012 at 6:51 PM, Juhani Connolly
<[EMAIL PROTECTED]> wrote:
> Hi Chris,
>
> A few months back I actually ported the original Flume's tail source, but it
> was decided (and I agree with the reasoning) not to include it for a number
> of reasons, which can be seen on the original ticket at
> https://issues.apache.org/jira/browse/FLUME-931 . One of the big ones is the
> fact that Java cannot access inode information.
>
> What we do is have a Python program that tracks the files in a directory and
> then sends the data using the Scribe format to the ScribeSource (we were
> using Scribe until switching to Flume, so we are just using our ingest system
> from then). This allows for the freedom to customize the ingest to our own
> expectations, and we write checkpoints of how far we have tailed. You could
> write this in whatever language you're comfortable with and pass the data
> via Avro or Thrift.
>
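
The checkpointing approach described above could look roughly like the
following. This is only a sketch under stated assumptions, not the actual
program from the quoted message: the function name, the JSON checkpoint
layout, and the file paths are all hypothetical, and sending the lines on
to Flume (via Avro, Thrift, or Scribe) is left out. It assumes a POSIX
filesystem where (device, inode) identifies a file.

```python
import json
import os


def read_new_lines(log_path, checkpoint_path):
    """Return complete lines appended since the last checkpoint, then save a new one."""
    st = os.stat(log_path)
    identity = [st.st_dev, st.st_ino]  # stored as a list so it round-trips through JSON

    # Resume from the saved offset only if the checkpoint refers to this same
    # file and the file has not been truncated since; otherwise start at 0.
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        if saved["identity"] == identity and saved["offset"] <= st.st_size:
            offset = saved["offset"]

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    # Write the checkpoint to a temp file and rename it into place, so a crash
    # cannot leave a half-written checkpoint behind.
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"identity": identity, "offset": offset + end}, f)
    os.replace(tmp, checkpoint_path)
    return lines
```

Calling read_new_lines("app.log", "app.ckpt") in a loop returns each
complete line once, and picks up from offset 0 when the file at that path
has been replaced. Note the simplifications: the checkpoint is saved before
the caller has delivered the lines, and data left unread in a rotated-away
file is not recovered. A real ingest program would need to handle both.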
>
> On 08/30/2012 01:18 AM, Chris Neal wrote:
>
> Hey guys,
>
> I'm sure this is not a new question, but I haven't found an answer in my
> searches.  I'm curious why there is as of yet no Tail Source with NG?  It
> seems one of the most common use cases for Flume is to tail a log file and
> dump it "somewhere".  Given that, it sure would seem that a Tail Source
> would be one of the first sources that gets written with a new version.
>
> I know about all the other ways to implement something *like* a Tail Source:
> Exec Source, Avro, Log4jAppender...  and unfortunately they all have
> limitations with regard to either functionality or
> reliability/recoverability.
>
> What am I missing here?
>
> Is there any work being done on a Tail Source for NG?
>
> I promise I'm not complaining, just trying to understand the logic. :)
>
> Much appreciated.
> Chris
>
>