Jagadish Bihani 2012-10-22, 11:48
Denny Ye 2012-10-22, 13:38
Jagadish Bihani 2012-10-23, 06:31
Without the fsync, guarantees are weakened a lot more than the fsync
Also, you didn't mention the batch size on your avro sink that is
sending data to the avro-source. This is a major factor on your
throughput because each batch causes one sync. If you have big batches,
you'll have few fsyncs and significantly better performance.
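
To put rough numbers on that, here is a back-of-the-envelope sketch in C. It assumes the channel is purely fsync-bound, so the ceiling is one batch per fsync. The function names and the 500-byte event size in the usage note are made up for illustration; the only figure taken from this thread is the rate of roughly 100 fsyncs/sec.

```c
#include <assert.h>

/* Upper bound on events per second when every batch costs exactly one
 * fsync and the fsync rate is the only bottleneck (an assumption). */
double max_events_per_sec(double fsyncs_per_sec, long batch_size)
{
    assert(fsyncs_per_sec >= 0.0 && batch_size > 0);
    return fsyncs_per_sec * (double)batch_size;
}

/* Same bound expressed in bytes per second, for a hypothetical
 * average event size given in bytes. */
double max_bytes_per_sec(double fsyncs_per_sec, long batch_size,
                         long event_bytes)
{
    assert(event_bytes > 0);
    return max_events_per_sec(fsyncs_per_sec, batch_size)
           * (double)event_bytes;
}
```

For example, max_bytes_per_sec(100.0, 100, 500) gives 5,000,000 bytes/sec, while raising the batch size to 1000 gives 50,000,000 bytes/sec at the same fsync rate. Real numbers will be lower once the write cost itself starts to matter.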
I am weirded out by the fact that Denny is getting improved performance
by running multiple parallel file channels... Are they each on separate
disks or something? I can't imagine what could cause a performance gain
if they were all on the same disk. I would expect more write-head
seeking and, if anything, degradation...
On 10/23/2012 03:31 PM, Jagadish Bihani wrote:
> Hi Denny
> Thanks for the inputs.
> By the way, when you say you tested another case without 'fsync', I
> assume you changed the file channel code to comment out the 'flush'
> part. And if we rely on OS flushing, it can still be reasonably
> reliable. Is that right?
> On 10/22/2012 07:08 PM, Denny Ye wrote:
>> Hi Jagadish,
>> I recently tested the performance of FileChannel. Here is my test
>> report, for your consideration and for the questions raised in
>> this thread.
>> On the comparison between FileChannel and File Sink:
>> FileChannel does both sequential writes and random reads, so the
>> disk head has to move many times, which makes it much slower than
>> purely sequential writing.
>> The 'fsync' call takes much more time than the write itself; I get
>> almost 100 fsyncs/sec, the same number Brock mentioned. Also, I don't
>> know why there is such a difference between your two servers. I think
>> it might be related to the OS version (use of fsync versus
>> fdatasync) or the disk driver (RAID, caching strategy, and so on).
>> Throughput of a single FileChannel is about 3-5MB/sec in my
>> environment, so I used 5 channels and got 18MB/sec. The linear
>> increase with more channels is hard to believe. Meanwhile, this
>> looks like the throughput limit with the 'fsync' operation. I tested
>> another case without the 'fsync' operation after each batch and got
>> about 35-40MB/sec (I also removed the pre-allocation at disk writing
>> in this case).
>> Hope this is useful for you.
>> PS: I heard that the OS has a daemon thread that flushes the page
>> cache to disk asynchronously, with a latency of seconds. Is that
>> effective enough for data where some loss is tolerable?
>> Denny Ye
>> 2012/10/22 Jagadish Bihani <[EMAIL PROTECTED]>
>> I am writing this on top of another thread where there was
>> discussion of "fsync lies" and of the fact that
>> only file channel uses fsync and not file sink:
>> -- I tested the fsync performance on 2 machines (on 1 machine I
>> was getting very good throughput
>> using file channel, and on the other it was almost 100 times slower
>> with almost the same hardware configuration)
>> using the following code:
>> #define PAGESIZE 4096
>> int main(int argc, char *argv[])
>> char my_write_str[PAGESIZE];
>> char my_read_str[PAGESIZE];
>> char *read_filename = argv[1];
>> int readfd, writefd;
>> readfd = open(read_filename, O_RDONLY);
>> /* permission bits are given in octal */
>> writefd = open("written_file", O_WRONLY | O_CREAT, 0777);
>> int len = lseek(readfd, 0, SEEK_END);
>> int iterations = len / PAGESIZE;
>> int i;
>> struct timeval t0, t1;
>> fsync(writefd);
>> gettimeofday(&t1, 0);
>> long elapsed = (t1.tv_sec-t0.tv_sec)*1000000 +
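
For anyone who wants to repeat this kind of measurement, here is a minimal self-contained sketch in C. It is not the program quoted above: it writes a fixed in-memory page instead of copying from an input file, and it times the whole write+fsync loop rather than each fsync on its own. The function name time_fsync_writes and the output path are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define PAGESIZE 4096

/* Writes `iterations` pages to `path`, calling fsync() after each page.
 * Returns the elapsed time in microseconds, or -1 on any I/O error. */
long time_fsync_writes(const char *path, int iterations)
{
    char page[PAGESIZE];
    struct timeval t0, t1;
    int fd, i;

    assert(iterations > 0);
    memset(page, 'x', sizeof page);

    /* Mode is octal; O_TRUNC so repeated runs start from an empty file. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    gettimeofday(&t0, NULL);
    for (i = 0; i < iterations; i++) {
        if (write(fd, page, sizeof page) != (ssize_t)sizeof page ||
            fsync(fd) != 0) {
            close(fd);
            return -1;
        }
    }
    gettimeofday(&t1, NULL);
    close(fd);

    return (long)(t1.tv_sec - t0.tv_sec) * 1000000L
           + (long)(t1.tv_usec - t0.tv_usec);
}
```

Calling time_fsync_writes("/tmp/fsync_probe.bin", 1000) and dividing 1000 * 1e6 by the returned value gives an approximate fsyncs/sec figure for the disk holding that path. If that comes out far above what a spinning disk can physically do, a write cache somewhere is probably acknowledging the fsync early.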
Jagadish Bihani 2012-10-22, 13:18
Brock Noland 2012-10-22, 13:59
Brock Noland 2012-10-22, 14:29
Jagadish Bihani 2012-10-23, 06:40
Juhani Connolly 2012-10-23, 07:26