-
FileChannel data directory usage
KARASZI Istvan 2012-08-23, 07:30
Hey All!
I'm writing here because we're facing a problem with our flume-ng installation (version 1.2.0). We're collecting logs from Exec Sources into a File Channel and on to HDFS Sinks (in load-balanced sink groups).
Everything looks fine and the logs are arriving on HDFS, except for one thing: in the configured dataDir, the log files keep growing on every agent. Is this normal behavior? Should I take care of deleting them myself, will the agent do it, or is this a bug?
Thanks,
-- KARASZI Istvan
-
Re: FileChannel data directory usage
Hari Shreedharan 2012-08-23, 07:52
The File Channel is implemented as a Write Ahead Log. For each data file, the channel keeps a reference count of the events in that file that still need to be taken by the sink. Once all the events in a file have been taken, the file is deleted after the next checkpoint.
If you want the files to be deleted sooner, you can reduce the maximum size of a data file through the config parameter "maxFileSize" (the maximum size, in bytes, that each individual log file may grow to). By default, maxFileSize is around 1.5 GB.
As an experiment, you can reduce the file size and see whether the files get deleted (each directory will keep at least 2 files even if all events have been taken). If not, please let us know.
Thanks, Hari
-- Hari Shreedharan
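The reference counting described in the reply above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Flume's actual implementation: the class `ToyFileChannel`, its methods, and the helper `files_left` are hypothetical names made up for illustration, `max_file_size` merely plays the role of "maxFileSize", and the rule that each directory keeps at least 2 files is not modeled.

```python
from collections import deque


class ToyFileChannel:
    """Toy model of per-file reference counting (hypothetical, not Flume code)."""

    def __init__(self, max_file_size):
        self.max_file_size = max_file_size  # bytes; stands in for maxFileSize
        self.next_id = 0
        self.size = {}        # file id -> bytes written to that file
        self.pending = {}     # file id -> events not yet taken by the sink
        self.queue = deque()  # file id of each untaken event, oldest first
        self.active = self._new_file()

    def _new_file(self):
        file_id = self.next_id
        self.next_id += 1
        self.size[file_id] = 0
        self.pending[file_id] = 0
        return file_id

    def put(self, event_size):
        # Roll to a new data file once the active one would exceed the limit.
        current = self.size[self.active]
        if current > 0 and current + event_size > self.max_file_size:
            self.active = self._new_file()
        self.size[self.active] += event_size
        self.pending[self.active] += 1
        self.queue.append(self.active)

    def take(self):
        # The sink takes the oldest event; its file's reference count drops.
        file_id = self.queue.popleft()
        self.pending[file_id] -= 1

    def checkpoint(self):
        # Delete files whose events have all been taken, except the active file.
        removable = [f for f, n in self.pending.items()
                     if n == 0 and f != self.active]
        for f in removable:
            del self.pending[f]
            del self.size[f]
        return removable

    def files_on_disk(self):
        return sorted(self.size)


def files_left(max_file_size, events=10, event_size=100, taken=6):
    """Write `events` events, take `taken` of them, checkpoint, list files."""
    channel = ToyFileChannel(max_file_size)
    for _ in range(events):
        channel.put(event_size)
    for _ in range(taken):
        channel.take()
    channel.checkpoint()
    return channel.files_on_disk()


print(files_left(1000))  # one large file, still holding untaken events
print(files_left(300))   # smaller files, so the drained ones are gone
```

With the large limit, all ten events land in one file that cannot be removed while four events are still pending. With the small limit, the first two files are fully drained after six takes and disappear at the checkpoint, which is the effect of lowering the maximum file size in this model.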
-
Re: FileChannel data directory usage
KARASZI Istvan 2012-08-23, 08:06
On Aug 23, 2012, at 9:52 AM, Hari Shreedharan wrote:
> The File Channel is implemented as a Write Ahead Log. The channel keeps a reference count of the number of events in a particular data file which needs to be taken by the sink. Once all the events in a file are taken, the file will be deleted after the next checkpoint. If you want the files to get deleted faster you can reduce the maximum size of a data file through the config parameter "maxFileSize" (this is the maximum size you want each individual log file to grow to - in bytes). By default, the maxFileSize is around 1.5G.

Ohh, great, thanks. That was not clear to me from the documentation, sorry. I'll check it in a few days, then.

> As an experiment you can reduce the file size and see if the files are getting deleted (each directory will have at least 2 files even if all events have been taken). If not, please let us know.

Yes, there are two log files in that directory right now. I'll write back in a few days.
Thanks again!
-- KARASZI Istvan