Re: DFSOutputStream.sync() method latency time
Thanks Yanbo for your reply.

My test code is:
        FSDataOutputStream outputStream = fs.create(path);
        Random r = new Random();
        long totalBytes = 0;
        // 1 KB filler string placed inside every record
        String str = new String(new byte[1024]);
        // write about 500 MB, calling sync() after every record
        while (totalBytes < 1024 * 1024 * 500) {
          byte[] bytes = ("start_" + r.nextLong() + "_" + str
              + r.nextLong() + "_end" + "\n").getBytes();
          outputStream.write(bytes);
          outputStream.sync();
          totalBytes = totalBytes + bytes.length;
        }
        outputStream.close();
The write and sync methods are both synchronized, so the two calls never
run concurrently.
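
One way to observe the per-call latency directly is to time each call with
System.nanoTime(). The sketch below is only an illustration: CallTimer,
ThrowingAction and timeNanos are hypothetical names, not part of Hadoop, and
the sleep in main is a stand-in for the real outputStream.sync() call.

```java
public class CallTimer {
    /** Hypothetical helper interface so that calls throwing IOException can be timed. */
    public interface ThrowingAction {
        void run() throws Exception;
    }

    /** Runs the action once and returns the elapsed wall-clock time in nanoseconds. */
    public static long timeNanos(ThrowingAction action) throws Exception {
        long start = System.nanoTime();
        action.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        int calls = 100;
        long totalNanos = 0;
        for (int i = 0; i < calls; i++) {
            // Stand-in for the real call; in the test loop this would be
            // timeNanos(() -> outputStream.sync())
            totalNanos += timeNanos(() -> Thread.sleep(2));
        }
        System.out.printf("mean latency: %.3f ms%n", totalNanos / 1e6 / calls);
    }
}
```

Timing only the sync() call, and not write(), keeps the two costs separate
in the measurement.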

The write method writes data into the client's memory, and the sync method
sends the packet to the pipeline. The client cannot call write again until
sync returns successfully, so I think the sync method latency should equal
the sum of the time spent on each datanode.
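
The two readings of the latency discussed in this thread can be compared
with a toy model. This is not HDFS code: SyncLatencyModel and its methods
are made-up names, the 2 ms per datanode is the figure quoted in the thread,
and the model assumes the datanodes handle the packet at the same time while
ignoring forwarding and ack time between nodes.

```java
import java.util.Arrays;

public class SyncLatencyModel {
    /** Serial reading: each datanode finishes before the next one starts. */
    public static long serialMillis(long[] perNodeMillis) {
        return Arrays.stream(perNodeMillis).sum();
    }

    /**
     * Overlapped reading (assumption): all datanodes work on the packet at
     * the same time, so the slowest node determines the latency.
     */
    public static long overlappedMillis(long[] perNodeMillis) {
        return Arrays.stream(perNodeMillis).max().orElse(0L);
    }

    public static void main(String[] args) {
        long[] perNode = {2, 2, 2}; // three replicas, 2 ms each
        System.out.println("serial: " + serialMillis(perNode) + " ms");
        System.out.println("overlapped: " + overlappedMillis(perNode) + " ms");
    }
}
```

With three datanodes at 2 ms each, this prints "serial: 6 ms" and
"overlapped: 2 ms"; the second figure is the one that matches the monitored
value.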
2013/3/28 Yanbo Liang <[EMAIL PROTECTED]>

> First, when a client wants to write data to HDFS, it creates a
> DFSOutputStream.
> The client then writes data to this output stream, and the stream
> transfers the data to all DataNodes through the constructed pipeline in
> packets of 64 KB.
> These two operations are concurrent, so the write latency is not a simple
> superposition.
>
> Second, the sync method only flushes the last packet (at most 64 KB) of
> data to the pipeline.
>
> Because all these operations are processed concurrently, the latency is
> smaller than the superposition of each operation.
> In a sense, it is parallel computing rather than serial computing.
>
>
> 2013/3/28 lei liu <[EMAIL PROTECTED]>
>
>> When a client writes data with three replicas, the sync method latency
>> formula should be:
>> sync method latency = time for the first datanode to receive the data +
>> time for the second datanode to receive the data + time for the third
>> datanode to receive the data.
>>
>> If each of the three datanodes takes 2 milliseconds to receive the data,
>> the sync method latency should be 6 milliseconds, but according to our
>> monitoring, the sync method latency is 2 milliseconds.
>>
>>
>> How should the sync method latency be calculated?
>>
>>
>> Thanks,
>>
>> LiuLei
>>
>>
>