Re: Optimizing Disk I/O - does HDFS do anything?
Bertrand Dechoux, 2012-11-13, 21:10
Others are welcome to add to this, but I think the answer is:
1) Hadoop does not run on Windows (I am not sure whether Microsoft has made any statement about the OS used for Hadoop on Azure), and Linux filesystems generally do not need defragmenting: http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/

2) Files are written in one go, in big blocks. (And actually, file fragmentation is not the only issue. The 'many small files' issue is, in the end, a data fragmentation issue too, and it has an impact on read throughput.)

Bertrand Dechoux

On Tue, Nov 13, 2012 at 9:30 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> How does HDFS deal with optimization of file streaming? Do data nodes
> have any optimizations at the disk level for dealing with fragmented files?
> I assume not, but just curious if this is at all in the works, or if there
> are java-y ways of dealing with a long running set of files in an HDFS
> cluster. Maybe, for example, data nodes could log the amount of time spent
> on I/O for certain files as a way of reporting whether or not
> defragmentation needed to be run on a particular node in a cluster.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
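
To put rough numbers on point 2 of the reply, here is a minimal sketch that counts how many HDFS blocks the same amount of data occupies as one large file versus many small files. The 64 MiB block size is an assumption (check your cluster's configured value), and `block_count` and `total_blocks` are hypothetical helper names for illustration, not part of any Hadoop API.

```python
import math

# Assumed block size: 64 MiB. Adjust to your cluster's configured value.
BLOCK_SIZE = 64 * 1024 * 1024


def block_count(file_size, block_size=BLOCK_SIZE):
    """Blocks needed for a file of file_size bytes (hypothetical helper)."""
    if file_size <= 0:
        return 0
    return math.ceil(file_size / block_size)


def total_blocks(file_sizes, block_size=BLOCK_SIZE):
    """Total blocks for a collection of files (hypothetical helper)."""
    return sum(block_count(size, block_size) for size in file_sizes)


one_gib = 1024 ** 3
one_big = [one_gib]                 # a single 1 GiB file
many_small = [1024 * 1024] * 1024   # 1024 files of 1 MiB each, same total size

print(total_blocks(one_big))     # 16
print(total_blocks(many_small))  # 1024
```

Under these assumptions the same 1 GiB of data is spread over 64 times as many blocks when stored as small files, so a reader has to open many more separately placed pieces, which is the sense in which small files act like fragmentation.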