
Flume, mail # user - FileChannel data directory usage


Re: FileChannel data directory usage
Hari Shreedharan 2012-08-23, 07:52
The File Channel is implemented as a Write Ahead Log. For each data file, the channel keeps a reference count of the events in that file that still need to be taken by a sink. Once all the events in a file have been taken, the file is deleted after the next checkpoint. If you want files to be deleted sooner, you can reduce the maximum size of a data file through the config parameter "maxFileSize" (the maximum size, in bytes, that each individual log file may grow to); a smaller file is fully drained sooner, so it becomes eligible for deletion earlier. By default, maxFileSize is around 1.5 GB.
As an experiment, you can reduce the file size and check whether the files get deleted (each directory will always keep at least 2 files, even if all events have been taken). If they are not deleted, please let us know.
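To illustrate the reference-counting idea, here is a toy Python model. It is a sketch under simplifying assumptions, not Flume's FileChannel code: the names `WalSketch` and `DataFile` are hypothetical, there is no separate checkpoint directory, and the only file kept after a checkpoint is the one currently being written (it does not reproduce the two-file minimum per directory).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DataFile:
    file_id: int
    size: int = 0      # bytes written to this file so far
    pending: int = 0   # events in this file not yet taken by a sink


class WalSketch:
    """Hypothetical toy model of a write-ahead-log channel (not Flume's code)."""

    def __init__(self, max_file_size: int):
        self.max_file_size = max_file_size
        self.files: dict[int, DataFile] = {}
        self.next_id = 1
        self.current = self._roll()
        # FIFO queue recording which data file holds each pending event
        self.locations: deque[int] = deque()

    def _roll(self) -> DataFile:
        # start a new data file and make it the one being written
        f = DataFile(self.next_id)
        self.files[f.file_id] = f
        self.next_id += 1
        return f

    def put(self, event_size: int) -> None:
        # roll over once the current file would grow past max_file_size
        if self.current.size > 0 and self.current.size + event_size > self.max_file_size:
            self.current = self._roll()
        self.current.size += event_size
        self.current.pending += 1
        self.locations.append(self.current.file_id)

    def take(self) -> None:
        # a sink takes the oldest event; decrement its file's reference count
        file_id = self.locations.popleft()
        self.files[file_id].pending -= 1

    def checkpoint(self) -> list[int]:
        # delete every fully drained file except the one still being written
        removable = [fid for fid, f in self.files.items()
                     if f.pending == 0 and f is not self.current]
        for fid in removable:
            del self.files[fid]
        return removable


if __name__ == "__main__":
    wal = WalSketch(max_file_size=1000)
    for _ in range(25):
        wal.put(100)            # 10 events per file -> files 1, 2, 3
    for _ in range(15):
        wal.take()              # file 1 drained, file 2 still has 5 pending
    print(wal.checkpoint())     # [1]
    for _ in range(10):
        wal.take()
    print(wal.checkpoint())     # [2]
    print(sorted(wal.files))    # [3]
```

In this model a single untaken event pins its whole file on disk, so a smaller max_file_size bounds how much already-delivered data can linger between checkpoints, which is the reasoning behind lowering maxFileSize.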
Thanks,
Hari

--
Hari Shreedharan
On Thursday, August 23, 2012 at 12:30 AM, KARASZI Istvan wrote:

> Hey All!
>
> I'm writing here because we're facing a problem with our flume-ng installation (version 1.2.0). We're collecting logs from Exec Sources into a File Channel and draining them with HDFS Sinks (in load-balanced sink groups).
>
> Everything looks fine and the logs are arriving on HDFS, except for one thing: in the configured dataDir, the log files keep growing on every agent. Is this normal behavior? Should I take care of deleting them, will the agent do it, or is it a bug?
>
> Thanks,
> --
> KARASZI Istvan