-
How is hadoop going to handle the next generation disks?
Edward Capriolo 2011-04-08, 04:15
I have a 0.20.2 cluster. I notice that our nodes with 2 TB disks waste a lot of disk I/O running 'du -sk' on each data directory. Instead of 'du -sk', why not do this with java.io.File? How is this going to work with 4 TB, 8 TB, and larger disks? It seems like used and free disk space could be calculated in a better way.
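A minimal sketch of what the java.io.File approach could look like, assuming Java 6 or later and assuming the data directory sits on a partition used only by Hadoop. The class name DfStyleUsage and the default path are hypothetical; this is not how the 0.20.2 DataNode computes usage.

```java
import java.io.File;

// Hypothetical helper, not part of Hadoop.
final class DfStyleUsage {
    /** Bytes in use on the partition that holds dir, roughly what df reports. */
    static long usedBytes(File dir) {
        // getTotalSpace/getFreeSpace ask the file system for its totals,
        // so no directory walk and no per-inode reads are involved.
        return dir.getTotalSpace() - dir.getFreeSpace();
    }

    public static void main(String[] args) {
        // Assumed data directory; pass the real one as the first argument.
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("total  bytes: " + dataDir.getTotalSpace());
        System.out.println("usable bytes: " + dataDir.getUsableSpace());
        System.out.println("used   bytes: " + usedBytes(dataDir));
    }
}
```

The result covers the whole partition, so anything else stored there is counted as well.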
Edward
-
Re: How is hadoop going to handle the next generation disks?
sridhar basam 2011-04-08, 15:37
How many files do you have per node? What I find is that most of my inodes/dentries are almost always cached, so even on a host with hundreds of thousands of files, 'du -sk' generally causes high I/O for only a couple of seconds. I am using 2 TB disks too.
Sridhar
-
Re: How is hadoop going to handle the next generation disks?
sridhar basam 2011-04-08, 16:24
BTW this is on systems which have a lot of RAM and aren't under high load.
If you find that your system is evicting dentries/inodes from its cache, you might want to experiment with lowering vm.vfs_cache_pressure from its default (100) so that they are preferred over the pagecache. At the extreme, setting it to 0 means they are never evicted.
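To check what a node is currently using, something like the sketch below could work. It assumes a Linux host that exposes the setting at /proc/sys/vm/vfs_cache_pressure; the class name VfsCachePressure is hypothetical. Changing the value is normally done outside Java, for example with 'sysctl -w vm.vfs_cache_pressure=50' as root, where 50 is only an illustrative value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop.
final class VfsCachePressure {
    /** Parses the integer stored in a sysctl-style file. */
    static int read(Path sysctlFile) throws IOException {
        String raw = new String(Files.readAllBytes(sysctlFile), StandardCharsets.UTF_8);
        return Integer.parseInt(raw.trim());
    }

    /** Rough interpretation of a value, following the kernel's description. */
    static String describe(int value) {
        if (value == 0) return "dentries/inodes are never reclaimed under memory pressure";
        if (value < 100) return "kernel prefers to keep dentries/inodes cached";
        if (value == 100) return "default: reclaimed at a fair rate relative to pagecache";
        return "kernel prefers to reclaim dentries/inodes";
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the setting on Linux.
        Path p = Paths.get("/proc/sys/vm/vfs_cache_pressure");
        if (!Files.isReadable(p)) {
            System.out.println("setting not available on this system");
            return;
        }
        int value = read(p);
        System.out.println("vm.vfs_cache_pressure = " + value + " (" + describe(value) + ")");
    }
}
```

A value of 0 can lead to out-of-memory conditions, so it is safer to step down gradually while watching memory use.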
Sridhar
-
Re: How is hadoop going to handle the next generation disks?
Edward Capriolo 2011-04-08, 17:59
Right. Most inodes are always cached when:
1) small disks
2) light load.
But that is not the case with hadoop.
Making the problem worse: it seems like hadoop issues 'du -sk' for all disks at the same time. This pulverises the cache.
All this to calculate a size that is typically within 0.01% of what a df estimate would tell us.
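A rough way to measure that gap on a node is sketched below, assuming the partition holds only Hadoop data. The class name DuVersusDf is hypothetical. Note that the walk sums apparent file sizes, while 'du -sk' counts allocated blocks, and that walking the tree from Java reads every inode just as du does, so only the df-style call avoids the I/O.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Hadoop.
final class DuVersusDf {
    /** du-like total: walks the tree and sums the size of every regular file. */
    static long duStyleBytes(Path root) throws IOException {
        final long[] total = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // Skip entries that vanish or cannot be read during the walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    /** df-like total: one query against the file system, no walk. */
    static long dfStyleBytes(Path root) {
        File f = root.toFile();
        return f.getTotalSpace() - f.getFreeSpace();
    }

    /** Gap between the two figures as a fraction of the df figure. */
    static double relativeDifference(long duBytes, long dfBytes) {
        return dfBytes == 0 ? 0.0 : Math.abs(dfBytes - duBytes) / (double) dfBytes;
    }
}
```

A relative difference of 0.0001 corresponds to the 0.01% mentioned above.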
-
Re: How is hadoop going to handle the next generation disks?
sridhar basam 2011-04-08, 18:51
Don't know your setup, but I think this is manageable in the short to medium term. Even with a 20 TB node, you are likely looking at much less than a million files, depending on your configuration and usage. I would much rather spend 500 MB-1 GB of RAM on keeping these entries cached than give it to the pagecache, where most of it probably ends up hitting the disks anyway.
The one case where I think du is needed is when people haven't dedicated the entire space on a drive to hadoop. Using df in this case wouldn't accurately reflect usage.
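A back-of-the-envelope version of that estimate, as a sketch under stated assumptions: the 0.20 default block size of 64 MB, every block full, one data file plus one .meta file per block, and roughly 1 KB of kernel memory per cached inode and dentry pair. The 1 KB figure is a guess for illustration, and the class name InodeCacheEstimate is hypothetical.

```java
// Hypothetical helper, not part of Hadoop.
final class InodeCacheEstimate {
    /** Number of blocks needed to fill the given capacity (rounded up). */
    static long blocks(long capacityBytes, long blockSizeBytes) {
        return (capacityBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /** Assumes one data file and one .meta file per block. */
    static long filesOnDisk(long blocks) {
        return 2 * blocks;
    }

    /** Memory needed to keep every inode/dentry cached. */
    static long cacheBytes(long files, long bytesPerEntry) {
        return files * bytesPerEntry;
    }

    public static void main(String[] args) {
        long capacity = 20L << 40;   // 20 TB of block storage on the node
        long blockSize = 64L << 20;  // 64 MB, the 0.20 default block size
        long perEntry = 1024;        // assumed kernel memory per cached entry

        long b = blocks(capacity, blockSize);
        long f = filesOnDisk(b);
        System.out.println("blocks: " + b);
        System.out.println("files:  " + f);
        System.out.println("cache MB: " + (cacheBytes(f, perEntry) >> 20));
    }
}
```

Under these assumptions the node holds 327,680 blocks, 655,360 files, and needs about 640 MB of cache, which is consistent with the range above. Many small files would push the count higher.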
Sridhar