Hadoop >> mail # user >> How is hadoop going to handle the next generation disks?


Re: How is hadoop going to handle the next generation disks?
On Fri, Apr 8, 2011 at 12:24 PM, sridhar basam <[EMAIL PROTECTED]> wrote:
>
> BTW this is on systems which have a lot of RAM and aren't under high load.
> If you find that your system is evicting dentries/inodes from its cache, you
> might want to experiment with lowering vm.vfs_cache_pressure from its default so
> that they are preferred over the pagecache. At the extreme, setting it
> to 0 means they are never evicted.
>  Sridhar
>
> On Fri, Apr 8, 2011 at 11:37 AM, sridhar basam <[EMAIL PROTECTED]> wrote:
>>
>> How many files do you have per node? What I find is that most of my
>> inodes/dentries are almost always cached, so running 'du -sk' on a
>> host, even one with hundreds of thousands of files, generally uses high
>> I/O for only a couple of seconds. I am using 2 TB disks too.
>>  Sridhar
>>
>>
>> On Fri, Apr 8, 2011 at 12:15 AM, Edward Capriolo <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> I have a 0.20.2 cluster. I notice that our nodes with 2 TB disks waste
>>> tons of disk I/O doing a 'du -sk' of each data directory. Instead of
>>> 'du -sk' why not just do this with java.io.File? How is this going to
>>> work with 4 TB and 8 TB disks and up? It seems like calculating used and
>>> free disk space could be done a better way.
>>>
>>> Edward
>>
>
>

Right. Most inodes stay cached when you have:

1) small disks
2) light load.

But that is not the case with Hadoop.

Making the problem worse:
It seems that Hadoop issues 'du -sk' for all disks at the
same time. This pulverises the cache.

All this to calculate a size that is typically within 0.01% of what a
df estimate would tell us.
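
To illustrate the df-style estimate, here is a minimal sketch using
java.io.File. It is not what Hadoop does today. The class name DfEstimate
and the method usedBytes are made up for this example, and it assumes each
data directory sits on its own dedicated filesystem, since the numbers
cover the whole filesystem rather than one directory tree.

```java
import java.io.File;

public class DfEstimate {

    /**
     * Bytes in use on the filesystem that holds dir, computed as
     * total size minus free space. This reads filesystem-level
     * counters and does not walk the directory tree.
     * Assumption: dir is on a filesystem dedicated to the data directory.
     */
    static long usedBytes(File dir) {
        long total = dir.getTotalSpace(); // size of the partition, in bytes
        long free = dir.getFreeSpace();   // unallocated bytes on the partition
        return total - free;
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass a data directory path, default to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("total bytes:  " + dir.getTotalSpace());
        System.out.println("used bytes:   " + usedBytes(dir));
        // getUsableSpace also accounts for permissions and reserved blocks.
        System.out.println("usable bytes: " + dir.getUsableSpace());
    }
}
```

These calls cost about the same as one df lookup per disk no matter how
many block files the directory holds. The tradeoff is that anything else
stored on the same filesystem gets counted as well.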