Re: file channel read performance impacted by write rate
On 11/13/2013 03:04 PM, Brock Noland wrote:
> The file channel uses a WAL which sits on disk.  Each time an event is
> committed an fsync is called to ensure that data is durable. Without
> this fsync there is no durability guarantee. More details here:
> https://blogs.apache.org/flume/entry/apache_flume_filechannel

Yes indeed. I was just not expecting the performance impact to be that big.

> The issue is that when the source is committing one-by-one it's
> consuming the disk doing an fsync for each event.  I would find a way to
> batch up the requests so they are not written one-by-one or use multiple
> disks for the file channel.

I am already using multiple disks for the channel (4). Batching the
requests is indeed what I am doing to prevent the file channel from
being the bottleneck (using a Flume agent with a memory channel in front
of the agent with the file channel), but it inherently means that I lose
end-to-end durability because events are buffered in memory before being
flushed to disk.
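
To make the trade-off concrete, here is a minimal Python sketch of the
general idea, not Flume's actual file channel code. It appends events to
a log file and calls fsync once per batch, so a batch size of 1 behaves
like a source that commits one event at a time. The names append_events
and demo, the file names, and the batch size of 100 are hypothetical and
chosen only for illustration.

```python
import os
import tempfile


def append_events(path, events, batch_size):
    """Append events to a log file, calling fsync once per batch.

    Returns the number of fsync calls made. batch_size=1 mimics a
    source that commits events one by one. This is an illustrative
    sketch, not how Flume implements its write-ahead log.
    """
    fsyncs = 0
    with open(path, "ab") as log:
        for start in range(0, len(events), batch_size):
            for event in events[start:start + batch_size]:
                log.write(event + b"\n")
            log.flush()             # hand Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to persist it to disk
            fsyncs += 1
    return fsyncs


def demo():
    """Write the same 1000 events one by one and in batches of 100."""
    events = [b"event-%d" % i for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        one_by_one = append_events(os.path.join(tmp, "a.log"), events, 1)
        batched = append_events(os.path.join(tmp, "b.log"), events, 100)
    return one_by_one, batched
```

Both runs write the same data, but the batched run issues far fewer
fsync calls. The cost is the one described above: events in a batch
that has not been synced yet exist only in memory and are lost if the
process dies first.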

thanks,
Jan
