Flume, mail # user - How to read multiple files getting continuously updated


Abhijeet Shipure 2013-10-10, 05:33
Steve Morin 2013-10-10, 05:52
Abhijeet Shipure 2013-10-10, 06:09
Steve Morin 2013-10-10, 06:11
Abhijeet Shipure 2013-10-10, 06:27
Re: How to read multiple files getting continuously updated
Steve Morin 2013-10-10, 06:48
I think that would be the best option
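
For reference, a minimal sketch of that option: an external utility drops each completed file into a spool directory, and a Spooling Directory Source picks it up. The agent name, channel, sink, and paths below are placeholders for illustration, not taken from this thread:

# Sketch only; names and paths are hypothetical
agent1.sources = spool
agent1.channels = ch1
agent1.sinks = sink1

# Spooling Directory Source watching the directory the copy utility writes into
agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/flume/spool
agent1.sources.spool.channels = ch1

# Durable file channel so queued events survive an agent restart
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/flume/data

# Logger sink as a stand-in; replace with HDFS or whatever the real target is
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1

The one caveat is that a file must be complete before it lands in spoolDir: the copy utility should write the copy somewhere else and only move (rename) it into the spool directory once the 5-minute window has closed, because the Spooling Directory Source treats a file that changes after being picked up as an error.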
On Wed, Oct 9, 2013 at 11:27 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:

> Yes, new files are created at a fixed interval, but the write time is not
> fixed; files are written as and when requests come in.
> I was thinking of creating a utility to copy the files to a new directory
> and use the Spool Dir source.
>
> Regards
> Abhijeet
>
>
>
> On Thu, Oct 10, 2013 at 11:41 AM, Steve Morin <[EMAIL PROTECTED]> wrote:
>
>> If the files are continually written to, I don't think there is a good
>> option. Could new files be written at each time interval instead?
>>
>>
>> On Wed, Oct 9, 2013 at 11:09 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Steve,
>>>
>>> Thanks for the quick reply. As you pointed out, Exec Source does not
>>> provide the reliability that is required in my case, and hence it is not suitable.
>>>
>>> So which other built-in source could be used to read from many files?
>>> One other requirement: the file names are also dynamically generated
>>> using a timestamp every 5 mins.
>>>
>>>
>>> Regards
>>> Abhijeet
>>>
>>>
>>> On Thu, Oct 10, 2013 at 11:22 AM, Steve Morin <[EMAIL PROTECTED]> wrote:
>>>
>>>> If you read the Flume manual, you'll see it doesn't support a tail source:
>>>>
>>>> http://flume.apache.org/FlumeUserGuide.html#exec-source
>>>>
>>>> Warning
>>>>
>>>>
>>>> The problem with ExecSource and other asynchronous sources is that the
>>>> source can not guarantee that if there is a failure to put the event into
>>>> the Channel the client knows about it. In such cases, the data will be
>>>> lost. As a for instance, one of the most commonly requested features is the
>>>> tail -F [file]-like use case where an application writes to a log file
>>>> on disk and Flume tails the file, sending each line as an event. While this
>>>> is possible, there’s an obvious problem; what happens if the channel fills
>>>> up and Flume can’t send an event? Flume has no way of indicating to the
>>>> application writing the log file that it needs to retain the log or that
>>>> the event hasn’t been sent, for some reason. If this doesn’t make sense,
>>>> you need only know this: Your application can never guarantee data has been
>>>> received when using a unidirectional asynchronous interface such as
>>>> ExecSource! As an extension of this warning - and to be completely clear -
>>>> there is absolutely zero guarantee of event delivery when using this
>>>> source. For stronger reliability guarantees, consider the Spooling
>>>> Directory Source or direct integration with Flume via the SDK.
>>>>
>>>>
>>>>
>>>> On Wed, Oct 9, 2013 at 10:33 PM, Abhijeet Shipure <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am looking for a Flume NG source that can be used to read many
>>>>> files which are getting continuously updated.
>>>>> I tried the Spool Dir source, but it does not work if the file being
>>>>> read gets modified.
>>>>>
>>>>> Here is the scenario:
>>>>> 100 files are generated at a time, and these files are continuously
>>>>> updated for a fixed interval, say 5 mins; after 5 mins a new set of
>>>>> 100 files is generated and written to for another 5 mins.
>>>>>
>>>>> Which Flume source is most suitable, and how should it be used
>>>>> effectively without any data loss?
>>>>>
>>>>> Any help is greatly appreciated.
>>>>>
>>>>>
>>>>> Thanks
>>>>>  Abhijeet Shipure
>>>>>
>>>>>
>>>>
>>>
>>
>
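
The other route the Flume docs mention above, direct integration via the Flume SDK, is an option if the application writing the files can be changed to send events itself. A rough sketch, assuming an Avro source is listening on the agent; the host name and port here are made up:

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeSdkSketch {
    public static void main(String[] args) {
        // Default client speaks Avro RPC to an Avro source on the agent
        // (host and port are placeholders)
        RpcClient client = RpcClientFactory.getDefaultInstance("flume-host", 41414);
        try {
            Event event = EventBuilder.withBody("one log line", StandardCharsets.UTF_8);
            // append() throws if the event could not be handed to the channel,
            // so the sending application learns about failures and can retry
            client.append(event);
        } catch (EventDeliveryException e) {
            e.printStackTrace();
        } finally {
            client.close();
        }
    }
}

That failure signal is exactly what ExecSource cannot give, which is why the docs point at the SDK (or the Spooling Directory Source) for stronger reliability guarantees.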
DSuiter RDX 2013-10-10, 11:46
Paul Chavez 2013-10-10, 16:04