I was wondering what a minimal HBase setup looks like in terms of number of servers. Here is what I think is needed:
1 or 2 HBase master servers -- 1 or 2 dedicated boxes?
1 or more RegionServers -- 1 or more dedicated boxes?
1 or more ZooKeeper nodes -- 1 or more dedicated boxes?
If running on HDFS, add:
1 or 2 NameNodes -- can this run on same box(es) as HBase master?
1 or more DataNodes -- can DNs be on the same box(es) as RegionServers?
If you want to run MR jobs on data in HBase, add:
1 or more JobTrackers -- can this run on the same box as HBase master and NN?
1 or more TaskTrackers -- can this run on the same box as RegionServer + DN?
So, my main questions are:
* Is it OK for HBase Master and NameNode (+JobTracker) to run on the same server? NN needs memory. What does HBase Master need the most?
* Is it OK for RegionServer and DataNode (+TaskTracker) to run on the same server? (I think this is actually advised, so that data is local?) I believe RegionServer is a memory-hungry process (because of Memcache)? I believe DNs need the CPU to run the MR jobs, and disk I/O, of course.
* Finally, is the following correct?
Non-HA system, with local disk:
1 HB master/NN/JT + 1 RegionServer/TT/DN + 1 ZK = 3 boxes
HA HBase cluster with HDFS:
2 HB masters/NNs/JTs + 2 RegionServers/TTs/DNs + 2 ZKs = 6 boxes
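
To make the arithmetic above easy to check, here is a small sketch of how I am counting boxes. The layout dictionaries, the grouping of roles, and the helper names (`total_boxes`, `boxes_running`) are just my own illustration of the two proposals, not anything from HBase or Hadoop, and the colocation choices in it are exactly the assumptions I am asking about.

```python
# Hypothetical description of the two layouts proposed above.
# Each group is a set of roles colocated on the same box, plus how many
# such boxes exist. The colocation itself is the assumption in question.
LAYOUTS = {
    "non_ha": [
        {"roles": ("HBase Master", "NameNode", "JobTracker"), "boxes": 1},
        {"roles": ("RegionServer", "DataNode", "TaskTracker"), "boxes": 1},
        {"roles": ("ZooKeeper",), "boxes": 1},
    ],
    "ha": [
        {"roles": ("HBase Master", "NameNode", "JobTracker"), "boxes": 2},
        {"roles": ("RegionServer", "DataNode", "TaskTracker"), "boxes": 2},
        {"roles": ("ZooKeeper",), "boxes": 2},
    ],
}


def total_boxes(layout):
    """Sum the number of boxes across all colocation groups."""
    return sum(group["boxes"] for group in layout)


def boxes_running(layout, role):
    """Count how many boxes in the layout run the given role."""
    return sum(group["boxes"] for group in layout if role in group["roles"])


if __name__ == "__main__":
    for name, layout in LAYOUTS.items():
        print(name, total_boxes(layout))
```

If any of the colocations turn out to be a bad idea, splitting a group into separate entries shows how the box count changes.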