Re: How to read multiple files getting continuously updated
Steve Morin 2013-10-10, 06:11
If the files are continually being written to, I don't think there is a good
option. Could the application roll over to new files at each time interval,
so that finished files are no longer modified?
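
If each batch of files stops changing once its 5-minute window ends, one option is to hand only the finished files to the Spooling Directory Source, which expects files to be immutable and uniquely named once they land in the spool directory. Here is a minimal Python sketch of that idea. It is not part of Flume: the function name `move_finished_files`, the directory arguments, and the 300-second quiet period are all hypothetical, and it assumes both directories are on the same filesystem so the rename is atomic.

```python
import os
import time


def move_finished_files(src_dir, spool_dir, quiet_seconds=300, now=None):
    """Move files that have not been modified for `quiet_seconds`
    from src_dir into spool_dir. Returns the names that were moved.

    Assumptions (hypothetical, adjust for your setup):
      * a file untouched for `quiet_seconds` is finished for good;
      * file names are unique (e.g. they carry a timestamp);
      * src_dir and spool_dir are on the same filesystem.
    """
    now = time.time() if now is None else now
    os.makedirs(spool_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories and anything that is not a regular file
        if now - os.path.getmtime(src) < quiet_seconds:
            continue  # modified recently, may still be written to
        # A rename within one filesystem is atomic, so a reader of
        # spool_dir never sees a partially copied file.
        os.replace(src, os.path.join(spool_dir, name))
        moved.append(name)
    return moved
```

Something like cron could call this every minute with the application's output directory and the directory Flume is configured to spool from. The quiet period needs to be longer than any pause the writer might take, otherwise a file could be moved while it is still in use.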
On Wed, Oct 9, 2013 at 11:09 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:
> Hi Steve,
> Thanks for the quick reply. As you pointed out, Exec Source does not provide
> reliability, which is required in my case, so it is not suitable.
> Which other built-in source could be used to read from many files? One
> further requirement: the file names are also generated dynamically, using
> a timestamp, every 5 minutes.
> On Thu, Oct 10, 2013 at 11:22 AM, Steve Morin <[EMAIL PROTECTED]> wrote:
>> If you read the Flume manual, you'll see it doesn't support a tail source:
>> The problem with ExecSource and other asynchronous sources is that the
>> source can not guarantee that if there is a failure to put the event into
>> the Channel the client knows about it. In such cases, the data will be
>> lost. As a for instance, one of the most commonly requested features is the
>> tail -F [file]-like use case where an application writes to a log file
>> on disk and Flume tails the file, sending each line as an event. While this
>> is possible, there’s an obvious problem; what happens if the channel fills
>> up and Flume can’t send an event? Flume has no way of indicating to the
>> application writing the log file that it needs to retain the log or that
>> the event hasn’t been sent, for some reason. If this doesn’t make sense,
>> you need only know this: Your application can never guarantee data has been
>> received when using a unidirectional asynchronous interface such as
>> ExecSource! As an extension of this warning - and to be completely clear -
>> there is absolutely zero guarantee of event delivery when using this
>> source. For stronger reliability guarantees, consider the Spooling
>> Directory Source or direct integration with Flume via the SDK.
>> On Wed, Oct 9, 2013 at 10:33 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:
>>> I am looking for a Flume NG source that can be used for reading many files
>>> which are getting continuously updated.
>>> I tried the Spool Dir source, but it does not work if the file to be read gets
>>> Here is the scenario:
>>> 100 files are generated at a time, and these files are continuously
>>> updated for a fixed interval, say 5 minutes. After 5 minutes, a new set of
>>> 100 files is generated and written to for the next 5 minutes.
>>> Which Flume source is most suitable, and how should it be used
>>> effectively, without any data loss?
>>> Any help is greatly appreciated.
>>> Abhijeet Shipure