I want to set up a realistic production cluster on Amazon EC2 and I am
trying to decide two things.
- Memory usage
If I use one of the example configuration files, say the 512MB one, does
that mean that all Accumulo processes together will use a total of 512MB?
At least this appears to be the case when looking at accumulo-env.sh.
This will determine whether I use a small or a large instance.
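
To check this myself, a rough sketch like the one below could add up the
-Xmx settings per process. The sample text is a made-up excerpt: the
variable names follow the *_OPTS pattern I see in accumulo-env.sh, but the
values are placeholders and not the real 512MB example. heap_sizes_mb is
just a hypothetical helper name.

```python
import re

# Made-up excerpt for illustration only; the values in a real
# accumulo-env.sh may differ.
SAMPLE_ENV = """
export ACCUMULO_TSERVER_OPTS="-Xmx128m -Xms128m"
export ACCUMULO_MASTER_OPTS="-Xmx128m -Xms128m"
export ACCUMULO_MONITOR_OPTS="-Xmx64m -Xms64m"
export ACCUMULO_GC_OPTS="-Xmx64m -Xms64m"
"""

# Conversion factors from the JVM size suffix to megabytes.
UNIT_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}

# Matches NAME_OPTS="... -Xmx<number><unit> ..."
XMX_PATTERN = re.compile(r'(\w+_OPTS)="[^"]*?-Xmx(\d+)([kKmMgG])')


def heap_sizes_mb(env_text):
    """Return {variable name: max heap in MB} for each *_OPTS line with -Xmx."""
    heaps = {}
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip commented-out settings
        match = XMX_PATTERN.search(line)
        if match:
            name, size, unit = match.groups()
            heaps[name] = int(size) * UNIT_TO_MB[unit.lower()]
    return heaps


if __name__ == "__main__":
    heaps = heap_sizes_mb(SAMPLE_ENV)
    for name, mb in sorted(heaps.items()):
        print(f"{name}: {mb} MB")
    print("total max heap (MB):", sum(heaps.values()))
```

To run it against a real file, I would pass open("accumulo-env.sh").read()
instead of SAMPLE_ENV. As far as I understand, -Xmx only caps the Java heap,
so the actual footprint of each process would be somewhat higher than the
sum this prints.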
- Process Distribution
Is the layout below a standard configuration? I will start off with a small
number of worker nodes (3-4) and hope to use my local machine as a "monitor"
for the Accumulo and Ganglia web UIs in order to avoid ssh -X latency.
[ Name Node ]        NameNode, Gmond
[ Secondary NN ]     Secondary NameNode, Gmond
[ Job Tracker ]      JobTracker, Gmond
[ Zookeeper ]        ZooKeeper
[ Accumulo Master ]  Master, Tracer, Garbage Collector, Gmond, Jmxtrans
[ Monitor ]          Monitor, Gmetad, Gweb
[ Worker Node ]      DataNode, TaskTracker, TabletServer, Logger, Gmond, Jmxtrans