When you say … "It is not recommended to *ever* run the HDFS balancer on a cluster running HBase." … that's a very scary statement.
Never running the balancer is not really a good idea, unless you are building a cluster for a specific use case.
When you look at the larger picture… in most use cases, the cluster will contain more data in flat files (HDFS) than it does inside HBase
(which you allude to in your last paragraph), so balancing is a good idea. (Even manual processes can be run in cron jobs ;-)
And no, you do not use a data node as an edge node.
(Are you really saying that? C'mon, really?) Never a good design. Ever.
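I agree that you should run major compactions after running the (HDFS) balancer, since a major compaction rewrites the HFiles from the region server and so restores the data locality that the balancer disturbed.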
But the point I am trying to make is that with respect to HBase, you still need to think about the cluster as a whole.
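For what it's worth, the balance-then-compact sequence is easy to script and hang off a cron job. Below is a rough Python sketch, not a tested production script. The function names, the table names, the 10% threshold and the dry-run default are all my own assumptions for illustration; the only real pieces are the `hdfs balancer -threshold` command, the non-interactive `hbase shell -n`, and the shell's `major_compact` command. Keep in mind that `major_compact` only queues the compaction, so the script returning does not mean the compaction has finished.

```python
import subprocess


def balancer_cmd(threshold_pct=10):
    """Build the HDFS balancer command line (threshold is an assumed default)."""
    return ["hdfs", "balancer", "-threshold", str(threshold_pct)]


def major_compact_script(tables):
    """Build the text fed to the HBase shell: one major_compact per table."""
    return "".join(f"major_compact '{t}'\n" for t in tables) + "exit\n"


def balance_then_compact(tables, threshold_pct=10, dry_run=True):
    """Run the HDFS balancer first, then request major compactions.

    Returns the list of (command, stdin_text) steps. With dry_run=True
    nothing is executed, so the plan can be inspected safely.
    """
    steps = [
        (balancer_cmd(threshold_pct), None),
        (["hbase", "shell", "-n"], major_compact_script(tables)),
    ]
    if not dry_run:
        for cmd, stdin_text in steps:
            # check=True stops the sequence if the balancer step fails
            subprocess.run(cmd, input=stdin_text, text=True, check=True)
    return steps


if __name__ == "__main__":
    # Hypothetical table names; replace with your own.
    for cmd, stdin_text in balance_then_compact(["orders", "events"]):
        print(" ".join(cmd))
        if stdin_text:
            print(stdin_text, end="")
```

Run it with dry_run=False from cron during a quiet window, since both the balancer and the compactions compete with regular traffic for disk and network.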
The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
michael_segel (AT) hotmail.com