Re: Review Request: Low throughput of FileChannel


> On Aug. 3, 2012, 9:29 a.m., Mike Percy wrote:
> > Wow, you have been busy; that is really great! One problem with this change, however, is that FileChannel is no longer guaranteed to be durable. Many users cannot accept that limitation, especially after a Flume 1.2.0 release with a durable File Channel. Why not just use MemoryChannel if you don't need crash durability?
>
> Denny Ye wrote:
>     Flume doesn't guarantee 100% delivery. The difference between calling 'force' and not calling it only matters when distinguishing a process crash from a server crash, and a process crash is more frequent than a server crash. Even if we call the 'force' method on every transaction commit, we still cannot guarantee reliable delivery. MemoryChannel may cause too many full GCs; I'm tracking the GC issue now. Maybe I will use DirectByteBuffer to keep a large number of events off the heap.

Flume delivery guarantees depend on the channel used. To date, FileChannel has been pushed as a durable/lossless channel. Perhaps we should have separate durable and performant file channels.
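
To make the trade-off concrete, here is a minimal sketch of a commit path where forcing to disk is a switch. It is not Flume's LogFile code: the class name CommitLogSketch and the forceOnCommit flag are hypothetical, invented only for illustration. Only RandomAccessFile, FileChannel.write() and FileChannel.force(false) are real JDK calls.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not Flume code.
final class CommitLogSketch implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileChannel channel;
    private final boolean forceOnCommit; // hypothetical setting, not a Flume option

    CommitLogSketch(Path path, boolean forceOnCommit) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
        this.channel = file.getChannel();
        this.forceOnCommit = forceOnCommit;
    }

    /** Appends one record; returns true if the write was forced to the storage device. */
    boolean commit(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        channel.position(channel.size()); // always append at the end of the file
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        if (forceOnCommit) {
            // Durable mode: block until the file content reaches the device.
            channel.force(false);
            return true;
        }
        // Fast mode: the data sits in the OS page cache. It survives a process
        // crash, but may be lost if the server itself goes down.
        return false;
    }

    @Override
    public void close() throws IOException {
        channel.close();
        file.close();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("commit-log-sketch", ".log");
        try (CommitLogSketch durable = new CommitLogSketch(path, true)) {
            durable.commit("event-1\n".getBytes(StandardCharsets.UTF_8));
        }
        try (CommitLogSketch fast = new CommitLogSketch(path, false)) {
            fast.commit("event-2\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("bytes written: " + Files.size(path));
        Files.delete(path);
    }
}
```

With a switch like this, the durable behaviour would stay the default and the faster, weaker mode would be an explicit opt-in.
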

What scenario is it that you believe will result in data loss with the current model?
- Juhani
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6329/#review9814
-----------------------------------------------------------
On Aug. 3, 2012, 9:39 a.m., Denny Ye wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6329/
> -----------------------------------------------------------
>
> (Updated Aug. 3, 2012, 9:39 a.m.)
>
>
> Review request for Flume, Hari Shreedharan and Patrick Wendell.
>
>
> Description
> -------
>
> Here is a description of the code changes:
> 1. Remove 'FileChannel.force(false)'. Every commit from a Source invokes this 'force' method, which is too heavy when large amounts of data arrive. Each 'force' call takes 50-500 ms to confirm that the data has been stored on disk. Normally, the OS flushes data from the kernel buffer to disk asynchronously with millisecond-level latency, so forcing on every commit may be unnecessary. Admittedly, data loss may occur on a server crash, though not on a process crash. Server crashes are infrequent.
> 2. Do not pre-allocate disk space. The disk does not need pre-allocation.
> 3. Use 'RandomAccessFile.write()' in place of 'FileChannel.write()'. Both my test results and the low-level instructions show that the former performs better than the latter.
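
For readers unfamiliar with the two write paths named in item 3, the sketch below appends the same bytes through each of them. The class and method names are hypothetical and the code is not taken from LogFile.java. It only shows that both paths produce identical file content; it makes no claim about which is faster, which would have to be measured on the target hardware.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical illustration, not Flume code.
final class WritePathSketch {
    /** Appends through the stream-style API: RandomAccessFile.write(byte[]). */
    static void writeWithRandomAccessFile(Path path, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.seek(raf.length());
            raf.write(data);
        }
    }

    /** Appends through NIO: FileChannel.write(ByteBuffer), looping until the buffer drains. */
    static void writeWithFileChannel(Path path, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            ch.position(ch.size());
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "event body\n".getBytes(StandardCharsets.UTF_8);
        Path a = Files.createTempFile("raf-path", ".log");
        Path b = Files.createTempFile("channel-path", ".log");
        writeWithRandomAccessFile(a, data);
        writeWithFileChannel(b, data);
        // Both files should hold exactly the same bytes.
        System.out.println(Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b)));
        Files.delete(a);
        Files.delete(b);
    }
}
```
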
>
> I have posted three changes here. In addition, I would like to replace the on-heap ByteBuffer.allocate() with a DirectByteBuffer cached per thread (reusing off-heap memory to reduce the time spent copying from the heap to the kernel). I will test this change in the next phase.
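
The per-thread buffer cache proposed above could look roughly like the sketch below. This is not the patch under review: the class name, the 64 KB capacity, and the fallback to a heap buffer for oversized events are all assumptions made for illustration. ThreadLocal.withInitial() and ByteBuffer.allocateDirect() are standard JDK calls.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not Flume code.
final class DirectBufferCacheSketch {
    static final int CAPACITY = 64 * 1024; // assumed size, tune to the typical event size

    // One direct (off-heap) buffer per thread, allocated lazily and then reused.
    private static final ThreadLocal<ByteBuffer> CACHE =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CAPACITY));

    /** Returns a buffer with room for exactly size bytes, reusing the thread's cached one if it fits. */
    static ByteBuffer acquire(int size) {
        if (size > CAPACITY) {
            return ByteBuffer.allocate(size); // assumption: oversized events fall back to the heap
        }
        ByteBuffer buf = CACHE.get();
        buf.clear();
        buf.limit(size);
        return buf;
    }

    /** Copies the event into the cached buffer and writes it to the channel. */
    static void append(FileChannel ch, byte[] event) throws IOException {
        ByteBuffer buf = acquire(event.length);
        buf.put(event);
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("direct-buffer-sketch", ".log");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            append(ch, "event-1\n".getBytes(StandardCharsets.UTF_8));
            append(ch, "event-2\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("bytes written: " + Files.size(path));
        Files.delete(path);
    }
}
```

Because the buffer is reused, no new ByteBuffer is allocated per event, which is the garbage-collection saving being aimed at. Whether it also shortens the write path depends on the JDK and would need the testing mentioned above.
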
>
> After tuning, throughput increased from 5MB to 30MB.
>
>
> This addresses bug FLUME-1423.
>     https://issues.apache.org/jira/browse/FLUME-1423
>
>
> Diffs
> -----
>
>   trunk/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFile.java 1363210
>
> Diff: https://reviews.apache.org/r/6329/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Denny Ye
>
>