Flume, mail # user - file channel read performance impacted by write rate


Re: file channel read performance impacted by write rate
Brock Noland 2013-12-17, 12:54
Can you take and share 8-10 thread dumps while the sink is taking events
"slowly"?

Can you share your machine and file channel configuration?
On Dec 17, 2013 6:28 AM, "Shangan Chen" <[EMAIL PROTECTED]> wrote:

> We face the same problem: the performance of taking events from the channel is
> a severe bottleneck, and it does not improve even when there are fewer events
> in the channel. The following log shows the metrics of writing to HDFS (5
> files, batch size 200000); the take operation accounts for most of the total
> time.
>
>
> 17 Dec 2013 18:49:28,056 INFO
>  [SinkRunner-PollingRunner-DefaultSinkProcessor]
> (org.apache.flume.sink.hdfs.HDFSEventSink.process:489)  -
> HdfsSink-TIME-STAT sink[sink_hdfs_b] writers[5] eventcount[200000]
> all[44513] take[38197] append[5647] sync[17] getFilenameTime[371]
>
> On Mon, Nov 25, 2013 at 4:46 PM, Jan Van Besien <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Is anybody still looking into this question?
>>
>> Should I log it in jira such that somebody can look into it later?
>>
>> thanks,
>> Jan
>>
>>
>>
>> On 11/18/2013 11:28 AM, Jan Van Besien wrote:
>> > Hi,
>> >
>> > Sorry it took me a while to answer this. I compiled a small test case
>> > using only off-the-shelf Flume components that shows what is going on.
>> >
>> > The setup is a single agent with http source, null sink and file
>> > channel. I am using the default configuration as much as possible.
>> >
>> > The test goes as follows:
>> >
>> > - start the agent without sink
>> > - run a script that sends http requests in multiple threads to the http
>> > source (the script simply calls the url
>> > http://localhost:8080/?key=value
>> > over and over again, where value is a random string of 100 chars).
>> > - this script does about 100 requests per second on my machine. I leave
>> > it running for a while, so that the file channel contains about 20000
>> > events.
>> > - add the null sink to the configuration (around 11:14:33 in the log).
>> > - observe the logging of the null sink. You'll see in the log file that
>> > it takes more than 10 seconds per 1000 events (until about event 5000,
>> > around 11:15:33)
>> > - stop the http request generating script (i.e. no more writes to the
>> > file channel)
>> > - observe the logging of the null sink: events 5000 until 20000 are all
>> > processed within a few seconds.
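The agent1.conf attachment is not reproduced in this archive. A minimal sketch of a configuration matching the setup described above (http source, file channel, null sink, defaults elsewhere) might look like the following; all component names and the reconstruction itself are assumptions, not the actual attachment:

```properties
# Hypothetical reconstruction of agent1.conf: http source -> file channel -> null sink
agent1.sources = http-source
agent1.channels = file-channel
agent1.sinks = null-sink

agent1.sources.http-source.type = http
agent1.sources.http-source.port = 8080
agent1.sources.http-source.channels = file-channel

# File channel with default settings; checkpointDir and dataDirs
# fall back to directories under ~/.flume when left unset.
agent1.channels.file-channel.type = file

agent1.sinks.null-sink.type = null
agent1.sinks.null-sink.channel = file-channel
```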
>> >
>> > In the attachment:
>> > - flume log
>> > - thread dumps while the ingest was running and the null sink was
>> > enabled
>> > - config (agent1.conf)
>> >
>> > I also tried with more sinks (4), see agent2.conf. The results are the
>> > same.
>> >
>> > Thanks for looking into this,
>> > Jan
>> >
>> >
>> > On 11/14/2013 05:08 PM, Brock Noland wrote:
>> >> On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <[EMAIL PROTECTED]
>> >> <mailto:[EMAIL PROTECTED]>> wrote:
>> >>
>> >>      On 11/13/2013 03:04 PM, Brock Noland wrote:
>> >>      > The file channel uses a WAL which sits on disk. Each time an
>> >>      > event is committed an fsync is called to ensure that data is
>> >>      > durable. Without this fsync there is no durability guarantee.
>> >>      > More details here:
>> >>      > https://blogs.apache.org/flume/entry/apache_flume_filechannel
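The per-commit fsync cost described above is easy to observe outside Flume. A minimal sketch in plain Python (not Flume code; the file layout and event format are arbitrary) comparing one fsync per event against a single fsync per batch:

```python
import os
import tempfile
import time

def write_events(path, events, fsync_per_event):
    """Append events to a file, forcing them to disk either after every
    event (one fsync per commit, like committing events one-by-one to a
    WAL) or only once at the end (like a batched commit)."""
    with open(path, "ab") as f:
        for event in events:
            f.write(event)
            if fsync_per_event:
                f.flush()
                os.fsync(f.fileno())
        # Always make the final state durable.
        f.flush()
        os.fsync(f.fileno())

if __name__ == "__main__":
    events = [("event-%06d\n" % i).encode() for i in range(1000)]
    for per_event in (True, False):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        start = time.time()
        write_events(path, events, fsync_per_event=per_event)
        elapsed = time.time() - start
        print("fsync per event=%-5s -> %.3f s for %d events"
              % (per_event, elapsed, len(events)))
        os.remove(path)
```

On a typical spinning disk the per-event variant is dramatically slower, which is consistent with the batching advice given later in this exchange.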
>> >>
>> >>      Yes indeed. I was just not expecting the performance impact to be
>> >>      that big.
>> >>
>> >>
>> >>      > The issue is that when the source is committing one-by-one it's
>> >>      > consuming the disk doing an fsync for each event. I would find a
>> >>      > way to batch up the requests so they are not written one-by-one
>> >>      > or use multiple disks for the file channel.
>> >>
>> >>      I am already using multiple disks for the channel (4).
>> >>
>> >>
>> >> Can you share your configuration?
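For reference, a file channel is spread over multiple disks via a comma-separated dataDirs list. A sketch of what a four-disk layout could look like (the mount points and channel name are assumptions, not Jan's actual configuration):

```properties
# Hypothetical file channel spanning four disks
agent.channels.fc.type = file
agent.channels.fc.checkpointDir = /disk1/flume/checkpoint
agent.channels.fc.dataDirs = /disk1/flume/data,/disk2/flume/data,/disk3/flume/data,/disk4/flume/data
```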
>> >>
>> >>      Batching the requests is indeed what I am doing to prevent the
>> >>      file channel from being the bottleneck (using a flume agent with
>> >>      a memory channel in front of the agent with the file channel),
>> >>      but it inherently means that I lose