Flume >> mail # user >> Re: file channel read performance impacted by write rate


Jan Van Besien 2013-11-18, 13:21
Re: file channel read performance impacted by write rate
Hi,

Is anybody still looking into this question?

Should I log it in JIRA so that somebody can look into it later?

thanks,
Jan

On 11/18/2013 11:28 AM, Jan Van Besien wrote:
> Hi,
>
> Sorry it took me a while to answer this. I compiled a small test case
> using only off-the-shelf flume components that shows what is going on.
>
> The setup is a single agent with http source, null sink and file
> channel. I am using the default configuration as much as possible.
>
> The test goes as follows:
>
> - start the agent without sink
> - run a script that sends http requests in multiple threads to the http
> source (the script simply calls the url http://localhost:8080/?key=value
> over and over again, whereby value is a random string of 100 chars).
> - this script does about 100 requests per second on my machine. I leave
> it running for a while, such that the file channel contains about 20000
> events.
> - add the null sink to the configuration (around 11:14:33 in the log).
> - observe the logging of the null sink. You'll see in the log file that
> it takes more than 10 seconds per 1000 events (until about event 5000,
> around 11:15:33)
> - stop the http request generating script (i.e. no more writing in file
> channel)
> - observe the logging of the null sink: events 5000 until 20000 are all
> processed within a few seconds.
>
> In the attachment:
> - flume log
> - thread dumps while the ingest was running and the null sink was enabled
> - config (agent1.conf)
>
> I also tried with more sinks (4), see agent2.conf. The results are the same.
>
> Thanks for looking into this,
> Jan
>
>
> On 11/14/2013 05:08 PM, Brock Noland wrote:
>> On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <[EMAIL PROTECTED]
>> <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>      On 11/13/2013 03:04 PM, Brock Noland wrote:
>>       > The file channel uses a WAL which sits on disk. Each time an event
>>       > is committed an fsync is called to ensure that data is durable.
>>       > Without this fsync there is no durability guarantee. More details here:
>>       > https://blogs.apache.org/flume/entry/apache_flume_filechannel
>>
>>      Yes indeed. I was just not expecting the performance impact to be
>>      that big.
>>
>>
>>       > The issue is that when the source is committing one-by-one it's
>>       > consuming the disk doing an fsync for each event. I would find a way
>>       > to batch up the requests so they are not written one-by-one or use
>>       > multiple disks for the file channel.
>>
>>      I am already using multiple disks for the channel (4).
>>
>>
>> Can you share your configuration?
>>
>>      Batching the
>>      requests is indeed what I am doing to prevent the file channel from
>>      being the bottleneck (using a flume agent with a memory channel in
>>      front of the agent with the file channel), but it inherently means
>>      that I lose end-to-end durability because events are buffered in
>>      memory before being flushed to disk.
>>
>>
>> I would be curious to know though if you doubled the sinks if that would
>> give more time to readers. Could you take three-four thread dumps of the
>> JVM while it's in this state and share them?
>>
>
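
The request-generating script described in the quoted test was not included in the message body. A minimal sketch of what such a script might look like follows, assuming plain GET requests to http://localhost:8080/?key=value with a random 100-character value. The function names, thread count, and run duration are hypothetical, and whether the HTTP source accepts these requests depends on the handler configured in the agent.

```python
import random
import string
import threading
import urllib.parse
import urllib.request


def random_value(length=100):
    """Return a random alphanumeric string (100 chars by default, as in the test)."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


def build_url(base, value):
    """Build the request URL, e.g. http://localhost:8080/?key=<value>."""
    return base + "/?" + urllib.parse.urlencode({"key": value})


def worker(base, stop):
    """Send requests over and over until the stop event is set."""
    while not stop.is_set():
        try:
            urllib.request.urlopen(build_url(base, random_value()), timeout=5).close()
        except OSError:
            # Connection and HTTP errors are ignored in this sketch.
            pass


def run(base="http://localhost:8080", threads=4, seconds=60):
    """Run `threads` workers for `seconds` seconds (both values are assumptions)."""
    stop = threading.Event()
    pool = [threading.Thread(target=worker, args=(base, stop), daemon=True)
            for _ in range(threads)]
    for t in pool:
        t.start()
    stop.wait(seconds)  # returns after the timeout because nothing sets the event
    stop.set()
    for t in pool:
        t.join()
```

Calling `run()` while the agent is up would reproduce the write load; stopping it corresponds to the "stop the http request generating script" step in the test.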

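The fsync-per-commit cost discussed in the quoted replies can be illustrated with a small sketch. This is not Flume's file channel code; it only assumes a log file that is appended to and fsynced once per commit, and it counts how many fsync calls result from committing events one by one versus in batches. The function name, event size, and batch size are hypothetical.

```python
import os
import tempfile


def write_events(path, events, batch_size):
    """Append events to path, calling fsync once per batch.

    Returns the number of fsync calls made.
    """
    fsyncs = 0
    with open(path, "ab") as f:
        for i in range(0, len(events), batch_size):
            for event in events[i:i + batch_size]:
                f.write(event + b"\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # force the OS to write to disk
            fsyncs += 1
    return fsyncs


events = [b"x" * 100 for _ in range(200)]
with tempfile.TemporaryDirectory() as d:
    # One commit per event: one fsync per event.
    one_by_one = write_events(os.path.join(d, "a.log"), events, batch_size=1)
    # One commit per 100 events: one fsync per 100 events.
    batched = write_events(os.path.join(d, "b.log"), events, batch_size=100)

print(one_by_one, batched)
```

With 200 events, the one-by-one case issues 200 fsync calls and the batched case issues 2, which is the reason batching was suggested. The trade-off raised in the thread still applies: events waiting in a batch are not yet durable.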