Kafka file persistence blocks, ramifications on payload sizes and flush timings
S Ahmed 2012-05-09, 15:19
On the dev list I asked how Kafka persists in-memory data to disk, and Jay responded with:
"filechannel.force() always fully syncs the file to disk. This is done irrespective of message boundaries. The file is locked during this time so other appends are blocked."
So doing a little math: if my payloads are 20 KB each and I flush once there are 10K messages, that means:
10,000 x 20,480 bytes = 204,800,000 bytes (about 195.3 MiB)
What I am curious about is how long this flush to disk takes. Are there any built-in metrics or logging I can use to measure the average time it takes to write the in-memory data to disk? And what about the time a producer is blocked during a flush?
Re: Kafka file persistence blocks, ramifications on payload sizes and flush timings
Edward Smith 2012-05-09, 15:32
You can benchmark your file I/O system to see how long that write should take.
dd if=/dev/zero of=/path/on/your/storage/device bs=20k count=10000
Of course you have to worry about what else might be writing to that device to compete for file I/O.
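For a JVM-level counterpart to the dd run, here is a minimal sketch that appends the same amount of data through a FileChannel and times only the force() call, since that is the call Jay's quote says blocks appends. The class name FlushTimer and the method timeFlushNanos are hypothetical names for illustration; this is not Kafka's code. Also note that dd as written may return once the data is in the OS page cache; with GNU dd, adding conv=fsync includes the final sync in the measured time.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Kafka.
class FlushTimer {

    /**
     * Appends `count` zero-filled messages of `size` bytes each to `file`,
     * then times a single force(true). Returns the elapsed nanoseconds of
     * the force() call only, not of the writes.
     */
    static long timeFlushNanos(Path file, int count, int size) throws IOException {
        ByteBuffer payload = ByteBuffer.allocate(size); // zero-filled, like /dev/zero
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < count; i++) {
                payload.clear(); // reset position so the whole buffer is written again
                while (payload.hasRemaining()) {
                    ch.write(payload);
                }
            }
            long start = System.nanoTime();
            ch.force(true); // sync file content and metadata to the storage device
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        boolean useTemp = args.length == 0;
        Path file = useTemp ? Files.createTempFile("flush-timer", ".bin") : Paths.get(args[0]);
        // Same scenario as the thread: 10,000 messages of 20 KB each.
        long nanos = timeFlushNanos(file, 10_000, 20 * 1024);
        System.out.printf("force() took %.1f ms for %d bytes%n", nanos / 1e6, Files.size(file));
        if (useTemp) {
            Files.deleteIfExists(file);
        }
    }
}
```

Pass a path on the storage device you care about as the first argument; without one, the sketch writes to the default temp directory, which may be a different device. A single run is noisy, so repeat it a few times and under realistic competing I/O before drawing conclusions.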
Re: Kafka file persistence blocks, ramifications on payload sizes and flush timings
Jun Rao 2012-05-09, 17:11
There is debug-level logging in FileMessageSet that tells you the time taken for each flush. We also have JMX beans that report the average and maximum flush time.
Jun
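To find those JMX beans without knowing their exact names, one option is to list every MBean matching a pattern and print its readable attributes. The sketch below does that with the standard javax.management API. The class name MBeanDump and the method readAttributes are hypothetical, and the default pattern java.lang:type=Runtime is only a stand-in that exists on any JVM; the actual Kafka bean and attribute names are not given in this thread and would need to be looked up on a running broker.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper for illustration; not part of Kafka.
class MBeanDump {

    /**
     * Returns "objectName/attribute" -> value for every readable attribute
     * of every MBean whose name matches `pattern`.
     */
    static Map<String, Object> readAttributes(MBeanServerConnection conn, ObjectName pattern)
            throws Exception {
        Map<String, Object> values = new TreeMap<>();
        for (ObjectName name : conn.queryNames(pattern, null)) {
            for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                if (!attr.isReadable()) {
                    continue;
                }
                try {
                    values.put(name + "/" + attr.getName(),
                               conn.getAttribute(name, attr.getName()));
                } catch (Exception e) {
                    // Some attributes cannot be read on some JVMs; skip them.
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // The default pattern is a placeholder; pass e.g. "*:*" to list everything.
        ObjectName pattern = new ObjectName(args.length > 0 ? args[0] : "java.lang:type=Runtime");
        // This inspects the local JVM only.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        readAttributes(conn, pattern).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

As written this reads the local JVM's MBean server. To inspect a broker in another process, the same readAttributes method can be handed a connection obtained through javax.management.remote.JMXConnectorFactory, assuming remote JMX is enabled on that broker. A tool such as jconsole shows the same information interactively.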