Re: Optimizing Disk I/O - does HDFS do anything?
Others are welcome to add to this, but I believe the answer is:
1) Hadoop does not run on Windows, and Linux filesystems are much less
prone to fragmentation in the first place. (I am not sure whether Microsoft
has said anything about the OS used for Hadoop on Azure.)
->
http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/
2) HDFS files are written in one go, as large blocks, so there is little
opportunity for fragmentation. (And file fragmentation is not the only
issue: the well-known 'many small files' problem is, in the end, a data
fragmentation issue too, and it also hurts read throughput.)
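
To make point 2 concrete, here is a minimal sketch of a client writing an
HDFS file as one sequential stream, with the block size fixed at create
time. (The path, block size, and replication factor below are illustrative,
not values pulled from any real cluster config.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // HDFS files are append-only: the client pushes each large block
        // to the pipeline as one long sequential write, so the datanode's
        // local filesystem never sees scattered in-place updates.
        long blockSize = 128L * 1024 * 1024; // illustrative 128 MB blocks
        short replication = 3;
        int bufferSize = 4096;
        Path p = new Path("/tmp/one-big-sequential-file"); // illustrative path
        try (FSDataOutputStream out =
                 fs.create(p, true, bufferSize, replication, blockSize)) {
            byte[] chunk = new byte[bufferSize];
            for (int i = 0; i < 1024; i++) {
                out.write(chunk); // sequential append, no seeks back into the file
            }
        }
    }
}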

Bertrand Dechoux

On Tue, Nov 13, 2012 at 9:30 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> How does HDFS deal with optimization of file streaming?  Do data nodes
> have any optimizations at the disk level for dealing with fragmented files?
>  I assume not, but just curious if this is at all in the works, or if there
> are java-y ways of dealing with a long-running set of files in an HDFS
> cluster.  Maybe, for example, data nodes could log the amount of time spent
> on I/O for certain files as a way of reporting whether or not
> defragmentation needed to be run on a particular node in a cluster.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
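
For what it's worth, here is a rough sketch of the per-file timing idea Jay
describes above. This is purely hypothetical (HDFS has no such built-in
report; the probe class and the 50 MB/s threshold are made up for
illustration): time a full sequential read and flag files that come back
slow.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadThroughputProbe {
    // Reads the whole file once and returns observed throughput in MB/s,
    // the kind of per-file signal a node could log over time.
    public static double mbPerSecond(FileSystem fs, Path p) throws IOException {
        byte[] buf = new byte[1 << 20]; // 1 MB read buffer
        long bytes = 0;
        long start = System.nanoTime();
        try (FSDataInputStream in = fs.open(p)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                bytes += n;
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (bytes / (1024.0 * 1024.0)) / seconds;
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path(args[0]);
        double mbps = mbPerSecond(fs, p);
        System.out.printf("%s read at %.1f MB/s%n", p, mbps);
        if (mbps < 50) { // illustrative threshold only
            System.out.println("Low throughput; worth checking this node's disks.");
        }
    }
}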