On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:
> Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem is:
> My cluster is getting full enough that having one data node go down does put a bit of pressure on the system (when balanced, every DN is more than half full).
HDFS usually starts getting a bit wonky around the ~80% full mark on super active grids. Your best bet is to delete some data (or store it more efficiently), add more nodes, or upgrade the storage capacity of the nodes you have. The balancer will only save you for so long before the whole thing tips over.
> Anybody here have any idea how badly running the balancer on a heavily active system messes things up? (for hdfs/hbase - if anyone knows).
I don't run HBase, but at Y! we used to run the balancer pretty much every day, even on super active grids. It 'mostly works' until you get to the point of no return, which it sounds like you are heading for...
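For reference, a daily balancer run can be as simple as the sketch below. The `-threshold` value (max per-node deviation, in percent, from average cluster utilization) and the guard around it are my own choices, not anything from the original thread; a lower threshold moves more blocks per pass:

```shell
#!/bin/sh
# Hedged sketch of a scheduled balancer run (e.g. from cron, off-peak).
# The guard just avoids a noisy failure on hosts without Hadoop on PATH.
if command -v hadoop >/dev/null 2>&1; then
  # Rebalance until every DataNode is within 5% of the cluster average.
  hadoop balancer -threshold 5
else
  echo "hadoop not on PATH; skipping balancer run"
fi
```

Bandwidth used per DataNode during balancing is capped by `dfs.balance.bandwidthPerSec` in hdfs-site.xml, which is the main knob for limiting impact on an active cluster.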
> Any ideas? Or do I just need better hardware? Not sure if that's an option, though..
Depending on how your systems are configured, something else to look at is how much space is getting eaten by logs, mapreduce spill space, etc. A good daemon bounce might free up some stale handles as well.
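To make that check concrete, here's a minimal sketch for ranking the usual space hogs on a DataNode host. The paths are illustrative defaults only; substitute whatever your `hadoop.log.dir` and `mapred.local.dir` actually point at:

```shell
#!/bin/sh
# Rank candidate space hogs by size, largest first.
# Paths are assumptions; adjust to your own install layout.
for d in /var/log/hadoop /tmp/hadoop-*/mapred/local /tmp; do
  [ -d "$d" ] && du -sk "$d"
done | sort -rn | head -5
```

If the log dir dominates, tightening log4j rotation settings or clearing old job logs often buys back a surprising amount of headroom without touching HDFS data at all.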