People are welcome to complement this, but I guess the answer is:
1) Hadoop does not run on Windows (I am not sure whether Microsoft has made
any statement about the OS used for Hadoop on Azure).
2) Files are written in one go, in big blocks. (And actually, file
fragmentation is not the only issue. The many-small-files 'issue' is, in
the end, a data fragmentation issue too and has an impact on read
performance.)
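As a rough illustration (not from the thread), this is how a client could look at
the block layout of a file with the standard FileSystem API; the path used below is
just a made-up example. Each block is a large contiguous unit (dfs.blocksize,
typically 64-128 MB), so a file maps to a handful of big blocks rather than many
small on-disk fragments:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical file, just for illustration.
    Path file = new Path("/user/jay/data.log");
    FileStatus status = fs.getFileStatus(file);

    // Block size the file was written with (configured per file / cluster).
    System.out.println("block size: " + status.getBlockSize());

    // One entry per block: its offset, length and the datanodes holding replicas.
    for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println("offset=" + b.getOffset()
          + " length=" + b.getLength()
          + " hosts=" + Arrays.toString(b.getHosts()));
    }
  }
}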
On Tue, Nov 13, 2012 at 9:30 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> How does HDFS deal with optimization of file streaming? Do data nodes
> have any optimizations at the disk level for dealing with fragmented files?
> I assume not, but just curious if this is at all in the works, or if there
> are java-y ways of dealing with a long running set of files in an HDFS
> cluster. Maybe, for example, data nodes could log the amount of time spent
> on I/O for certain files as a way of reporting whether or not
> defragmentation needed to be run on a particular node in a cluster.
> Jay Vyas