Flume >> mail # user >> picking up new files in Flume NG


Re: picking up new files in Flume NG
Hey Sadu, your use case is exactly what I'm writing this for. I'm
hoping this patch will get committed within a few days; we're on the
last round of reviews.

- Patrick

On Tue, Oct 16, 2012 at 10:47 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
> Correct, it's only available in that patch, from the RB it looks like
> it's not too far off from being committed.
>
> Brock
>
> On Tue, Oct 16, 2012 at 12:00 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>> Yes, it is very similar.
>>
>> The spool directory will keep receiving new files. We need to scan the
>> directory, send the data in the existing files to HDFS, clean up the files
>> (delete/move/rename, etc.), and then scan for new files again. The SpoolDir
>> source is not available yet, right?
>>
>> Thanks,
>> Sadu
>>
>>
>> On Tue, Oct 16, 2012 at 10:11 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>
>>> Sounds like https://issues.apache.org/jira/browse/FLUME-1425 ?
>>>
>>> Brock
>>>
>>> On Mon, Oct 15, 2012 at 11:37 PM, Sadananda Hegde <[EMAIL PROTECTED]>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a scenario in which a client application is continuously
>>> > pushing XML messages. The application writes these messages to new
>>> > files in the same directory, so we keep getting new files throughout
>>> > the day. I am trying to configure Flume agents on these application
>>> > servers (4 of them) to pick up the new data and transfer it to HDFS
>>> > on a Hadoop cluster. How should I configure my source to pick up new
>>> > files (and exclude the files that have already been processed)? I
>>> > don't think an Exec source with tail -F will work in this scenario,
>>> > because data is not being appended to existing files; rather, new
>>> > files get created.
>>> >
>>> > Thank you very much for your time and support.
>>> >
>>> > Sadu
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce -
>>> http://incubator.apache.org/mrunit/
>>
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
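For readers of this archived thread: the spooling-directory source tracked in FLUME-1425 later shipped in Flume NG (1.3.0). A minimal agent configuration along the lines Sadu describes might look like the sketch below; the agent, channel, sink, and path names are illustrative, not taken from the thread, and the source expects each file to be complete and immutable once it appears in the spool directory.

```
# Sketch of a spooling-directory agent (Flume NG 1.3+, after FLUME-1425
# was committed). Names and paths are illustrative.
agent.sources = spool
agent.channels = mem
agent.sinks = hdfs-out

# Watch a directory; each completed file dropped here is ingested once,
# then renamed with a suffix so it is never picked up again.
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /var/log/app/spool
agent.sources.spool.fileSuffix = .COMPLETED
agent.sources.spool.channels = mem

agent.channels.mem.type = memory
agent.channels.mem.capacity = 10000

# Deliver the events to HDFS as plain data files.
agent.sinks.hdfs-out.type = hdfs
agent.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/flume/events
agent.sinks.hdfs-out.hdfs.fileType = DataStream
agent.sinks.hdfs-out.channel = mem
```

This addresses the tail -F problem raised above: the spooldir source tracks which files it has consumed via the rename, so already-processed files are excluded automatically.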