Re: How to read multiple files getting continuously updated
If the files are continually written to, I don't think there is a good
option. Can new files be written out at every time interval instead?
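
For reference, one pattern that does satisfy this: if the application rolls its
output into a watched directory at each interval and closes each file before
Flume sees it, the Spooling Directory Source handles the case reliably. Below
is a minimal sketch of such an agent configuration; the agent name (a1), the
paths, and the logger sink are placeholders rather than anything from this
thread, and files dropped into the spool directory must be complete and
immutable.

  # Minimal sketch: Spooling Directory Source reading rotated, closed files.
  # Names and paths here are illustrative only.
  a1.sources = src1
  a1.channels = ch1
  a1.sinks = sink1

  # The source only ingests files that are finished being written;
  # rotate completed files into this directory at each interval.
  a1.sources.src1.type = spooldir
  a1.sources.src1.spoolDir = /var/log/app/spool
  a1.sources.src1.channels = ch1

  # File channel for durability across agent restarts.
  a1.channels.ch1.type = file
  a1.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
  a1.channels.ch1.dataDirs = /var/lib/flume/data

  # Logger sink as a stand-in; swap in HDFS or whatever sink is needed.
  a1.sinks.sink1.type = logger
  a1.sinks.sink1.channel = ch1

A sketch of the other option the Flume User Guide mentions, direct integration
through the client SDK, follows at the end of the thread.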
On Wed, Oct 9, 2013 at 11:09 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:

> Hi Steve,
>
> Thanks for the quick reply. As you pointed out, the Exec Source does not
> provide the reliability that is required in my case, and hence it is not
> suitable.
>
> So which other built-in source could be used to read from many files? One
> other requirement: the file names are also dynamically generated from a
> timestamp every 5 minutes.
>
>
> Regards
> Abhijeet
>
>
> On Thu, Oct 10, 2013 at 11:22 AM, Steve Morin <[EMAIL PROTECTED]> wrote:
>
>> If you read the Flume manual, you'll see it doesn't support a tail source:
>>
>> http://flume.apache.org/FlumeUserGuide.html#exec-source
>>
>> Warning
>>
>>
>> The problem with ExecSource and other asynchronous sources is that the
>> source can not guarantee that if there is a failure to put the event into
>> the Channel the client knows about it. In such cases, the data will be
>> lost. As a for instance, one of the most commonly requested features is the
>> tail -F [file]-like use case where an application writes to a log file
>> on disk and Flume tails the file, sending each line as an event. While this
>> is possible, there’s an obvious problem; what happens if the channel fills
>> up and Flume can’t send an event? Flume has no way of indicating to the
>> application writing the log file that it needs to retain the log or that
>> the event hasn’t been sent, for some reason. If this doesn’t make sense,
>> you need only know this: Your application can never guarantee data has been
>> received when using a unidirectional asynchronous interface such as
>> ExecSource! As an extension of this warning - and to be completely clear -
>> there is absolutely zero guarantee of event delivery when using this
>> source. For stronger reliability guarantees, consider the Spooling
>> Directory Source or direct integration with Flume via the SDK.
>>
>>
>>
>> On Wed, Oct 9, 2013 at 10:33 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I am looking for a Flume NG source that can be used for reading many files
>>> which are getting continuously updated.
>>> I tried the Spool Dir source, but it does not work if the file to be read
>>> gets modified.
>>>
>>> Here is the scenario:
>>> 100 files are generated at one time, and these files are continuously
>>> updated for a fixed interval, say 5 minutes; after 5 minutes, 100 new files
>>> are generated and written for the next 5 minutes.
>>>
>>> Which Flume source is most suitable, and how should it be used
>>> effectively without any data loss?
>>>
>>> Any help is greatly appreciated.
>>>
>>>
>>> Thanks
>>>  Abhijeet Shipure
>>>
>>>
>>
>
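
For the stronger-reliability path the quoted warning recommends ("direct
integration with Flume via the SDK"), the writing application can send events
through the Flume client SDK and find out whether each event was actually
accepted into a channel. A minimal Java sketch, assuming an agent with an Avro
source listening on a hypothetical localhost:41414; the class name and the log
line are placeholders:

  import java.nio.charset.Charset;

  import org.apache.flume.Event;
  import org.apache.flume.EventDeliveryException;
  import org.apache.flume.api.RpcClient;
  import org.apache.flume.api.RpcClientFactory;
  import org.apache.flume.event.EventBuilder;

  public class FlumeSdkSketch {
      public static void main(String[] args) {
          // Hypothetical agent address; the agent must expose an Avro source here.
          RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
          try {
              Event event = EventBuilder.withBody("one log line", Charset.forName("UTF-8"));
              // append() returns only after the agent acknowledges the event,
              // so the application knows delivery into the channel succeeded.
              client.append(event);
          } catch (EventDeliveryException e) {
              // The event was not accepted; the application can retry or buffer it.
              e.printStackTrace();
          } finally {
              client.close();
          }
      }
  }

Unlike tailing a file through ExecSource, a failure here surfaces as an
exception in the producing application, which can then hold on to the data
instead of losing it.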