MapReduce >> mail # user >> Optimizing Disk I/O - does HDFS do anything ?


Jay Vyas 2012-11-13, 20:30
Bertrand Dechoux 2012-11-13, 21:10
Scott Carey 2012-11-17, 07:27
Re: Optimizing Disk I/O - does HDFS do anything ?
hmmm...

1) But I thought that this sort of thing (yes, even on Linux) becomes
important when you have large amounts of data, because the way files are
written can cause issues on highly packed drives.

2) Probably this is the key point: HDFS I/O is most affected by file
size, which matters much more than any occasional minor disk
inhomogeneities.  So the focus is on distributing and replicating files
rather than micro-optimizing individual files.
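(Not part of the original thread, but point 2 can be made concrete with a plain-Java sketch — no Hadoop involved, and all file names and sizes here are made up: reading the same bytes as one contiguous file versus many small files shows why file count and size tend to dominate throughput far more than on-disk layout.)

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SmallFilesDemo {
    static final int TOTAL = 16 * 1024 * 1024; // 16 MB in each layout
    static final int SMALL = 16 * 1024;        // 16 KB per small file
    static final int COUNT = TOTAL / SMALL;    // 1024 small files

    /** Returns {nanos to read the big file, nanos to read all small files}. */
    static long[] run() throws IOException {
        Path dir = Files.createTempDirectory("iodemo");
        byte[] chunk = new byte[SMALL];

        // Layout A: one large file, written in one go (the HDFS-style case).
        Path big = dir.resolve("big.bin");
        try (OutputStream out = Files.newOutputStream(big)) {
            for (int i = 0; i < COUNT; i++) out.write(chunk);
        }

        // Layout B: the same bytes spread across many small files.
        for (int i = 0; i < COUNT; i++) {
            Files.write(dir.resolve("part-" + i), chunk);
        }

        // Time a full sequential read of each layout.
        long t0 = System.nanoTime();
        Files.readAllBytes(big);
        long bigNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < COUNT; i++) {
            Files.readAllBytes(dir.resolve("part-" + i));
        }
        long smallNs = System.nanoTime() - t0;
        return new long[] { bigNs, smallNs };
    }

    public static void main(String[] args) throws IOException {
        long[] ns = run();
        System.out.printf("one 16 MB file  : %.1f ms%n", ns[0] / 1e6);
        System.out.printf("1024 small files: %.1f ms%n", ns[1] / 1e6);
    }
}
```

On a warm page cache, any gap comes mostly from per-file open/close and metadata overhead rather than platter fragmentation — which is exactly Bertrand's point below that many small files are themselves a data-fragmentation problem.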
On Tue, Nov 13, 2012 at 4:10 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> People are welcome to complement but I guess the answer is:
> 1) Hadoop is not running on Windows. (I am not sure if Microsoft made any
> statement about the OS used for Hadoop on Azure.)
> ->
> http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/
> 2) Files are written in one go with big blocks. (And actually, file
> fragmentation is not the only issue. The many-small-files 'issue' is, in
> the end, a data fragmentation issue too and has an impact on read
> throughput.)
>
> Bertrand Dechoux
>
>
> On Tue, Nov 13, 2012 at 9:30 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>
>> How does HDFS deal with optimization of file streaming?  Do data nodes
>> have any optimizations at the disk level for dealing with fragmented files?
>>  I assume not, but just curious if this is at all in the works, or if there
>> are java-y ways of dealing with a long-running set of files in an HDFS
>> cluster.  Maybe, for example, data nodes could log the amount of time spent
>> on I/O for certain files as a way of reporting whether or not
>> defragmentation needed to be run on a particular node in a cluster.
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>
>
--
Jay Vyas
http://jayunit100.blogspot.com
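(Editorial aside, not from the thread: Jay's suggestion above — have data nodes log time spent in I/O per file — could be sketched outside Hadoop as a stream wrapper that accumulates time spent inside read(). `TimedInputStream` is a hypothetical name for illustration, not anything that exists in HDFS.)

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Wraps any InputStream and records nanoseconds spent inside read() calls. */
public class TimedInputStream extends FilterInputStream {
    private long nanosInRead = 0;

    public TimedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        long t0 = System.nanoTime();
        try {
            return super.read();
        } finally {
            nanosInRead += System.nanoTime() - t0;
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        long t0 = System.nanoTime();
        try {
            return super.read(b, off, len);
        } finally {
            nanosInRead += System.nanoTime() - t0;
        }
    }

    /** Total time spent blocked in read() so far, in nanoseconds. */
    public long nanosInRead() {
        return nanosInRead;
    }
}
```

A data-node-side version of this idea would wrap the block file streams and report per-block read timings to a metrics sink; consistently slow reads from one disk would then hint that defragmentation — or a failing drive — is worth investigating on that node.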
Andy Isaacson 2012-11-13, 21:53