Re: Small cluster Hadoop/Accumulo process placement recommendation
James Hughes 2013-04-17, 03:02
From my limited experience, I'd say you have enough to get started. I've
set up a small cloud with just 6 nodes on AWS: one
namenode/jobtracker/Cloudbase (Accumulo when it was first released)
machine, one zookeeper, and 4 datanode/tasktracker/tabletserver nodes.
(Yes, I believe you should be able to run the Accumulo Master on the Hadoop
The cloud was set up to test out running things on AWS, so I didn't do
anything terribly data intensive on it. The worst issue I had was that
MapReduce jobs needed more than a gig of memory, so early on I had to
switch from medium-size instances (with 4 gigs of RAM) to large instances (8
gigs of RAM).
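To make the memory squeeze concrete, here is a rough back-of-the-envelope
sketch. Only the 4 GB / 8 GB instance sizes and the "more than a gig per
job" observation come from my setup; the 3 GB reserved for the OS and the
Hadoop/Accumulo daemons and the 1.5 GB per task are assumptions picked for
illustration, so plug in your own numbers.

```python
# Rough memory budget for one worker node.
# RESERVED_GB and PER_TASK_GB are illustrative assumptions, not measurements.

def task_slots(node_ram_gb, reserved_gb, per_task_gb):
    """Return how many concurrent tasks fit after reserving memory
    for the OS and the long-running daemons."""
    free_gb = node_ram_gb - reserved_gb
    if free_gb <= 0:
        return 0
    return int(free_gb // per_task_gb)

RESERVED_GB = 3.0   # assumed: OS + DataNode + TaskTracker + tablet server
PER_TASK_GB = 1.5   # assumed: "more than a gig" per MapReduce task

for name, ram_gb in (("medium", 4), ("large", 8)):
    print(name, task_slots(ram_gb, RESERVED_GB, PER_TASK_GB))
```

With those assumed figures a 4 GB node has no room left for even one task,
which is roughly the wall I ran into.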
Thoughts: You should have enough to get started. If you don't know where
your limits are, you'll find them and then you can work to address them.
Recommendations: If and when you're ready to optimize your project,
consider how your data is stored in Accumulo. NoSQL is new enough that I
don't think the community has all the answers for particular use cases.
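For what it's worth, the layout in the quoted message below can be written
down as a simple host-to-role map, which makes it easy to sanity-check
before you start installing things. The role labels are just strings I made
up for this sketch, and putting the Accumulo Master on the namenode host is
the proposal under discussion, not a requirement. The one hard fact used is
that a ZooKeeper ensemble stays available only while a strict majority of
its servers is up.

```python
# Host-to-role map transcribed from the quoted message.
# Role names are illustrative labels; "AccumuloMaster" on "namenode"
# reflects the poster's proposal.
PLACEMENT = {
    "zoo1": ["ZooKeeper"],
    "zoo2": ["ZooKeeper"],
    "zoo3": ["ZooKeeper"],
    "namenode": ["NameNode", "AccumuloMaster"],
    "secnamenode": ["SecondaryNameNode"],
    "accdata1": ["DataNode", "TabletServer"],
    "accdata2": ["DataNode", "TabletServer"],
    "accdata3": ["DataNode", "TabletServer"],
}

def hosts_with(role):
    """Return the sorted list of hosts that run the given role."""
    return sorted(h for h, roles in PLACEMENT.items() if role in roles)

def zk_failures_tolerated(ensemble_size):
    """Number of ZooKeeper servers that can fail while a strict
    majority of the ensemble remains up."""
    return (ensemble_size - 1) // 2

print(len(PLACEMENT), "hosts")
print("tablet servers:", hosts_with("TabletServer"))
print("ZooKeeper failures tolerated:",
      zk_failures_tolerated(len(hosts_with("ZooKeeper"))))
```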
On Tue, Apr 16, 2013 at 8:07 PM, Terry P. <[EMAIL PROTECTED]> wrote:
> Greetings everyone,
> I'm learning a lot from reading all of the great questions and informative
> answers here on the Accumulo mailing list. Thus far I haven't come across
> a question similar to mine, nor a basic recommendation so here goes:
> I'm looking for recommendations on process / component placement for a
> small Accumulo cluster serving a prototype. It will be scaled later, but
> for now I'm looking at a cluster with just 8 nodes. My current thought
> process has led me to the following server / process placement and I'm
> interested in feedback on it.
> zoo1, zoo2, zoo3: ZooKeeper servers, dual proc, 4 GB RAM (small servers)
> namenode, secnamenode: 16GB RAM, 4 cores each, with local and remote
> locations to store name data
> *** Can I place the Accumulo Master on the NameNode or Secondary NameNode?
> accdata1, accdata2, accdata3: 16GB RAM, 4 cores each, serving as HDFS
> DataNodes and Accumulo TabletServers each with 4 2TB JBOD disks for HDFS
> I'm thinking having the Accumulo Master on the NameNode will simplify
> cluster startup. Thoughts? Recommendations?
> Many thanks in advance,