Flume >> mail # dev >> Review Request: Low throughput of FileChannel


Re: Review Request: Low throughput of FileChannel

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6329/#review9819
-----------------------------------------------------------
Thanks Denny. IMO this patch defeats the purpose of a 'durable' file channel. One reason a user might choose the file channel over the memory channel is to avoid data loss. A JVM or server crash can happen for various reasons, such as a JVM bug or hardware problems, and it is very hard to determine how frequently crashes will occur. I would try ByteBuffer.allocateDirect(), but there is a chance of OOM because releasing the buffer depends on the finalizer run [1], so we would need to clean the DirectByteBuffer manually using reflection.

[1] http://stackoverflow.com/questions/1854398/how-to-garbage-collect-a-direct-buffer-java
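
As a rough illustration of the direct-buffer idea above, the sketch below caches one direct buffer per thread, so that ByteBuffer.allocateDirect() is called once per thread rather than once per write and the hot path does not wait on the JVM to release native memory. The class name CachedDirectBuffer, the acquire() helper and the 64 KB capacity are assumptions made for illustration; none of them come from the patch or from Flume.

```java
import java.nio.ByteBuffer;

final class CachedDirectBuffer {
    // Hypothetical per-thread capacity; a real channel would size this
    // from its maximum event size.
    static final int CAPACITY = 64 * 1024;

    // One direct buffer per thread, allocated lazily on first use.
    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CAPACITY));

    private CachedDirectBuffer() {
    }

    /**
     * Returns a buffer with position 0 and limit == size.
     * Requests larger than CAPACITY fall back to a fresh heap buffer
     * instead of growing the cached direct buffer.
     */
    static ByteBuffer acquire(int size) {
        if (size > CAPACITY) {
            return ByteBuffer.allocate(size);
        }
        ByteBuffer buf = BUFFER.get();
        buf.clear();      // reset position and limit, contents are overwritten by the caller
        buf.limit(size);
        return buf;
    }
}
```

A caller would fill the returned buffer, flip() it and write it to the file before asking for the next one, since the same buffer is handed back on every call from the same thread.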

- Mubarak Seyed
On Aug. 3, 2012, 9:39 a.m., Denny Ye wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6329/
> -----------------------------------------------------------
>
> (Updated Aug. 3, 2012, 9:39 a.m.)
>
>
> Review request for Flume, Hari Shreedharan and Patrick Wendell.
>
>
> Description
> -------
>
> Here is a description of the code changes:
> 1. Remove 'FileChannel.force(false)'. Every commit from a Source invokes this 'force' method, which is too heavy for large volumes of incoming data. Each 'force' call takes 50-500ms to confirm that the data has been stored on disk. Normally, the OS flushes data from the kernel buffer to disk asynchronously with millisecond-level latency, so forcing on every commit may be unnecessary. Admittedly, data loss may occur on a server crash, though not on a process crash, and server crashes are infrequent.
> 2. Do not pre-allocate disk space. The disk does not need pre-allocation.
> 3. Use 'RandomAccessFile.write()' instead of 'FileChannel.write()'. Both my test results and the low-level instructions show that the former performs better than the latter.
>
> I have posted three changes here. I would also like to use a thread-level cached DirectByteBuffer to replace the on-heap ByteBuffer.allocate() (reusing off-heap memory to reduce the time spent copying from the heap to the kernel). I will test this change in the next phase.
>
> After tuning, throughput increased from 5MB to 30MB.
>
>
> This addresses bug FLUME-1423.
>     https://issues.apache.org/jira/browse/FLUME-1423
>
>
> Diffs
> -----
>
>   trunk/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFile.java 1363210
>
> Diff: https://reviews.apache.org/r/6329/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Denny Ye
>
>
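
The trade-off in item 1 of the quoted description can be sketched as follows. This is a minimal example, not the LogFile.java code under review: the class name CommitSketch, the append() helper and its 'durable' flag are hypothetical. With durable set to true, the call blocks in FileChannel.force(false) until the file content has been handed to the storage device; with false, the data sits in the OS page cache and survives a process crash but may be lost on a server crash.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class CommitSketch {
    private CommitSketch() {
    }

    /**
     * Appends one record to the file and returns the resulting file size.
     * A real log would keep the channel open across commits; it is opened
     * per call here only to keep the sketch self-contained.
     */
    static long append(Path file, byte[] record, boolean durable) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            while (buf.hasRemaining()) {
                ch.write(buf);   // a single write may be partial, so loop
            }
            if (durable) {
                // Flush file content (not metadata) to the device before returning.
                ch.force(false);
            }
            return ch.size();
        }
    }
}
```

Timing append() with and without the flag on the target disk is one way to check how much of the commit latency comes from force() itself.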
