Flume user mailing list: How to read multiple files getting continuously updated


Earlier in this thread:
Abhijeet Shipure 2013-10-10, 05:33
Steve Morin 2013-10-10, 05:52

Re: How to read multiple files getting continuously updated
Hi Steve,

Thanks for the quick reply. As you pointed out, Exec Source does not provide
the reliability I need in this case, so it is not suitable.

So which other built-in source could be used to read from many files? One
other requirement: the file names are dynamically generated with a timestamp
every 5 minutes.
Regards
Abhijeet
On Thu, Oct 10, 2013 at 11:22 AM, Steve Morin <[EMAIL PROTECTED]> wrote:

> If you read the Flume manual, you'll see it doesn't support a tail source
>
> http://flume.apache.org/FlumeUserGuide.html#exec-source
>
> Warning
>
>
> The problem with ExecSource and other asynchronous sources is that the
> source can not guarantee that if there is a failure to put the event into
> the Channel the client knows about it. In such cases, the data will be
> lost. As a for instance, one of the most commonly requested features is the
> tail -F [file]-like use case where an application writes to a log file on
> disk and Flume tails the file, sending each line as an event. While this is
> possible, there’s an obvious problem; what happens if the channel fills up
> and Flume can’t send an event? Flume has no way of indicating to the
> application writing the log file that it needs to retain the log or that
> the event hasn’t been sent, for some reason. If this doesn’t make sense,
> you need only know this: Your application can never guarantee data has been
> received when using a unidirectional asynchronous interface such as
> ExecSource! As an extension of this warning - and to be completely clear -
> there is absolutely zero guarantee of event delivery when using this
> source. For stronger reliability guarantees, consider the Spooling
> Directory Source or direct integration with Flume via the SDK.
>
>
>
> On Wed, Oct 9, 2013 at 10:33 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I am looking for a Flume NG source that can be used to read many files
>> which are being continuously updated.
>> I tried the Spool Dir source, but it does not work if the file being read
>> gets modified.
>>
>> Here is the scenario:
>> 100 files are generated at one time and are continuously updated for a
>> fixed interval, say 5 minutes; after 5 minutes a new set of 100 files is
>> generated and written for the next 5 minutes.
>>
>> Which Flume source is most suitable, and how should it be used effectively
>> without any data loss?
>>
>> Any help is greatly appreciated.
>>
>>
>> Thanks
>>  Abhijeet Shipure
>>
>>
>
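For reference, here is a minimal sketch of the Spooling Directory Source setup recommended in the quoted warning, written in the properties format used by the Flume User Guide. The agent name, directories, and sink below are illustrative assumptions rather than details from this thread, and the key constraint is that files must be complete and immutable before they are dropped into the spool directory.

    # Sketch of one agent: spooldir source -> file channel -> logger sink
    # "agent1" and all paths below are placeholder names.
    agent1.sources  = spool-src
    agent1.channels = ch1
    agent1.sinks    = log-sink

    # Spooling Directory Source: ingests files placed in spoolDir;
    # each file must already be closed (no further writes) when it arrives.
    agent1.sources.spool-src.type = spooldir
    agent1.sources.spool-src.spoolDir = /var/log/flume-spool
    agent1.sources.spool-src.fileHeader = true
    agent1.sources.spool-src.channels = ch1

    # Durable file channel so queued events survive an agent restart.
    agent1.channels.ch1.type = file
    agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
    agent1.channels.ch1.dataDirs = /var/flume/data

    # Logger sink only to verify the flow; swap in an HDFS or Avro sink for real use.
    agent1.sinks.log-sink.type = logger
    agent1.sinks.log-sink.channel = ch1

Since each 5-minute batch in the scenario above stops being written once its interval ends, one possible approach is a small script or cron job that moves a batch into spoolDir only after its interval is over; by default the source renames each fully ingested file with a .COMPLETED suffix, so already-processed files are easy to identify.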
Later replies in this thread:
Steve Morin 2013-10-10, 06:11
Abhijeet Shipure 2013-10-10, 06:27
Steve Morin 2013-10-10, 06:48
DSuiter RDX 2013-10-10, 11:46
Paul Chavez 2013-10-10, 16:04