On Jun 10, 2011, at 6:32 AM, [EMAIL PROTECTED] wrote:
> Dear all,
> I'm looking for ways to reduce the namenode heap usage of an 800-node, 10PB test Hadoop cluster that stores
> around 30 million files.
> Here's some info:
> 1 x namenode: 32GB RAM, 24GB heap size
> 800 x datanode: 8GB RAM, 13TB hdd
> *33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size is 22.93 GB / 22.93 GB (100%) *
> From the cluster summary report, it seems the heap usage is always at 100% and never drops. Do you know of any ways
> to reduce it? So far I haven't seen any namenode OOM errors, so it looks like the memory assigned to the namenode process
> is (just) enough. But I'm curious: which factors account for the heap being completely used?
The advice I give to folks is to plan on 1GB of heap for every million objects (files, directories, and blocks). It's an over-estimate, but I prefer to be on the safe side. Why not increase the heap size to 28GB? That should buy you some time.
You can turn on compressed pointers, but your best bet is really going to be spending some more money on RAM.
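Both suggestions are set through the NameNode's JVM options. Here's a minimal sketch of what that could look like in conf/hadoop-env.sh; the 28GB figure comes from the advice above, and the flag behavior is worth verifying against your JVM version, since compressed pointers only apply while the heap stays under roughly 32GB:

```shell
# conf/hadoop-env.sh -- NameNode JVM options (sketch, not a drop-in config)

# Raise the NameNode heap from 24GB to 28GB, per the advice above.
# -XX:+UseCompressedOops turns on compressed ordinary object pointers,
# which store object references in 32 bits instead of 64; this only
# works while the heap is below ~32GB, so it still applies at 28GB.
export HADOOP_NAMENODE_OPTS="-Xmx28g -XX:+UseCompressedOops ${HADOOP_NAMENODE_OPTS}"
```

Note that on recent HotSpot JVMs compressed oops are enabled by default for heaps under ~32GB, so the flag may be redundant, but setting it explicitly does no harm.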