Re: Optimizing Disk I/O - does HDFS do anything?
Jay Vyas, 2012-11-13, 21:40
hmmm...
1) But I thought that this sort of thing (yes, even on Linux) becomes important when you have large amounts of data, because the way files are written can cause issues on highly packed drives.

2) Probably this is the key point: HDFS I/O is most affected by file size, which matters much more than any occasional minor disk inhomogeneities. So the focus is on distributing and replicating files rather than micro-optimizing individual files.

On Tue, Nov 13, 2012 at 4:10 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> People are welcome to complement, but I guess the answer is:
> 1) Hadoop is not running on Windows (I am not sure if Microsoft made any
> statement about the OS used for Hadoop on Azure.)
> ->
> http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/
> 2) Files are written in one go with big blocks. (And actually, file
> fragmentation is not the only issue. The many-small-files 'issue' is, in
> the end, a data fragmentation issue too and has an impact on read
> throughput.)
>
> Bertrand Dechoux
>
>
> On Tue, Nov 13, 2012 at 9:30 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>
>> How does HDFS deal with optimization of file streaming? Do data nodes
>> have any optimizations at the disk level for dealing with fragmented files?
>> I assume not, but I'm just curious whether this is at all in the works, or
>> whether there are Java-y ways of dealing with a long-running set of files in
>> an HDFS cluster. Maybe, for example, data nodes could log the amount of time
>> spent on I/O for certain files as a way of reporting whether or not
>> defragmentation needs to be run on a particular node in a cluster.
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>
>

--
Jay Vyas
http://jayunit100.blogspot.com
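Bertrand's point that many small files are themselves a form of data fragmentation can be made concrete with some block arithmetic. What follows is a minimal, illustrative Java sketch, not anything HDFS itself runs. It assumes the 64 MB default block size of Hadoop 1.x and that a file is split into ceil(size / blockSize) blocks, so a file smaller than one block still gets its own block entry (though not a full block of disk space). The class and method names are hypothetical and not part of any Hadoop API.

```java
import java.util.Arrays;

// Illustrative only: compares how many HDFS-style blocks the same total data
// occupies when stored as one large file versus many small files.
// Assumption: 64 MB block size (the Hadoop 1.x default); names are hypothetical.
public class SmallFilesBlocks {
    static final long MB = 1024L * 1024L;

    // Number of blocks a single file of the given size is split into.
    static long blocksFor(long fileSize, long blockSize) {
        if (fileSize <= 0) {
            return 0; // an empty file has no blocks
        }
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    // Total number of blocks for a collection of file sizes.
    static long totalBlocks(long[] fileSizes, long blockSize) {
        return Arrays.stream(fileSizes).map(size -> blocksFor(size, blockSize)).sum();
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB;

        // Roughly 10 GB stored as a single file.
        long[] oneBigFile = {10_000 * MB};

        // The same amount of data stored as 10,000 files of 1 MB each.
        long[] manySmallFiles = new long[10_000];
        Arrays.fill(manySmallFiles, 1 * MB);

        System.out.println("one 10,000 MB file:  " + totalBlocks(oneBigFile, blockSize) + " blocks");
        System.out.println("10,000 x 1 MB files: " + totalBlocks(manySmallFiles, blockSize) + " blocks");
    }
}
```

Under these assumptions the single file maps to 157 blocks (10,000 MB / 64 MB = 156.25, rounded up), while the small files map to 10,000 blocks: ten thousand block entries for the NameNode to track and ten thousand separate files for a reader to open, even though the bytes on disk are the same.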