Re: Optimizing Disk I/O - does HDFS do anything ?

1) But I thought that this sort of thing (yes, even on Linux) becomes
important when you have large amounts of data, because the way files are
written can cause issues on nearly full drives.

2) This is probably the key point: HDFS I/O is affected most by file size,
which matters far more than any occasional minor disk inhomogeneities.  So
the focus is on distributing and replicating files rather than
micro-optimizing the layout of individual files.
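To make that concrete, here is a rough sketch of the kind of measurement I
have in mind: time a full streaming read through the standard FileSystem
client and compute the effective throughput. The path is made up, and this
only measures what the client sees, not what the datanode does internally.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadTimer {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical path -- substitute a real file on your cluster.
            Path file = new Path("/data/example/part-00000");
            FileStatus status = fs.getFileStatus(file);

            byte[] buffer = new byte[64 * 1024];
            long bytesRead = 0;
            long start = System.nanoTime();
            try (FSDataInputStream in = fs.open(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    bytesRead += n;
                }
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            double mbPerSec = (bytesRead / (1024.0 * 1024.0)) / seconds;

            // Effective throughput is dominated by file size and block
            // layout, not by sub-block placement on any one disk.
            System.out.printf("%s: %d bytes in %.2fs (%.1f MB/s, len=%d)%n",
                    file, bytesRead, seconds, mbPerSec, status.getLen());
        }
    }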
On Tue, Nov 13, 2012 at 4:10 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> People are welcome to add to this, but I guess the answer is:
> 1) Hadoop does not run on Windows. (I am not sure whether Microsoft has
> made any statement about the OS used for Hadoop on Azure.)
> ->
> http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/
> 2) Files are written in one go with big blocks. (And actually, file
> fragmentation is not the only issue. The 'many small files' issue is, in
> the end, a data fragmentation issue too, and it has an impact on read
> throughput.)
> Bertrand Dechoux
> On Tue, Nov 13, 2012 at 9:30 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>> How does HDFS deal with optimization of file streaming?  Do data nodes
>> have any optimizations at the disk level for dealing with fragmented files?
>>  I assume not, but I'm curious whether this is at all in the works, or if
>> there are Java-level ways of dealing with a long-lived set of files in an
>> HDFS cluster.  Maybe, for example, data nodes could log the amount of time
>> spent on I/O for certain files as a way of reporting whether or not
>> defragmentation needs to be run on a particular node in a cluster.
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
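
And to put a rough number on Bertrand's 'many small files' point, here is a
sketch (the directory name is made up) that walks a tree and counts how many
files occupy less than a single block; lots of sub-block files mean more
blocks, seeks, and namenode objects relative to the data actually read.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class BlockLayoutReport {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical directory -- point this at a real dataset.
            Path dir = new Path("/data/example");
            long blockSize = fs.getDefaultBlockSize(dir);
            long files = 0, blocks = 0, small = 0;

            RemoteIterator<LocatedFileStatus> it = fs.listFiles(dir, true);
            while (it.hasNext()) {
                LocatedFileStatus st = it.next();
                BlockLocation[] locs = st.getBlockLocations();
                files++;
                blocks += locs.length;
                if (st.getLen() < blockSize) {
                    small++;  // file occupies less than one full block
                }
            }
            System.out.printf("%d files, %d blocks, %d smaller than one "
                    + "block (%d bytes)%n", files, blocks, small, blockSize);
        }
    }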
Jay Vyas