On Tue, Nov 13, 2012 at 1:40 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> 1) but I thought that this sort of thing (yes even on linux) becomes
> important when you have large amounts of data - because the way files are
> written can cause issues on highly packed drives.
If you're running any filesystem at 99% full with a workload that
creates or grows files, the filesystem will experience fragmentation.
Don't do that if you want good performance.
As long as there are a few dozen GB of free space to work with, ext4 on
a modern Linux kernel (2.6.38 or newer) will do a fine job of keeping
files sequential and shouldn't need defragmentation.
To answer the original question -- HDFS doesn't take any special
measures to defragment files, but it does follow best practices (large,
sequential block writes) that avoid causing fragmentation in the first place.
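One general technique in that spirit is preallocating a file's space before a long sequential write, so the filesystem's allocator can reserve a contiguous run of blocks up front. This is just an illustration of the idea, not HDFS's actual code; the path and size below are made up for the example, and `os.posix_fallocate` requires Linux/POSIX:

```python
import os

# Hypothetical path and size, chosen for illustration only.
path = "/tmp/prealloc_demo.bin"
size = 64 * 1024 * 1024  # 64 MB

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    # Ask the filesystem to reserve the byte range [0, size) now,
    # letting an allocator like ext4's pick contiguous extents.
    os.posix_fallocate(fd, 0, size)
    # ... then write the data sequentially with os.write()/os.pwrite() ...
finally:
    os.close(fd)

print(os.stat(path).st_size)  # file now spans the preallocated length
```

Whether the reserved range is actually contiguous still depends on the filesystem and how full the disk is, which is why keeping free space available matters in the first place.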