Re: Small cluster Hadoop/Accumulo process placement recommendation
Hi Terry,

From my limited experience, I'd say you have enough to get started.  I've
set up a small cloud with just 6 nodes on AWS:  one
namenode/jobtracker/Cloudbase (Accumulo when it was first released)
machine, one ZooKeeper node, and 4 datanode/tasktracker/tabletserver nodes.
(Yes, I believe you should be able to run the Accumulo Master on the Hadoop
namenode.)
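
To make the placement concrete, here is a rough sketch that writes the layout from Terry's message (quoted below) down as a host-to-role map and renders one-host-per-line lists from it.  The host names come from his message; the role labels and the helper functions are made up for illustration.  As far as I recall, Accumulo of this era reads plain host lists such as conf/masters and conf/slaves, but check the docs for your version before relying on that.

```python
# Hypothetical placement map: host names are from Terry's message,
# role labels are illustrative and not official configuration keys.
PLACEMENT = {
    "zoo1": ["zookeeper"],
    "zoo2": ["zookeeper"],
    "zoo3": ["zookeeper"],
    "namenode": ["namenode", "accumulo-master"],  # Master co-located with NameNode
    "secnamenode": ["secondarynamenode"],
    "accdata1": ["datanode", "tabletserver"],
    "accdata2": ["datanode", "tabletserver"],
    "accdata3": ["datanode", "tabletserver"],
}


def hosts_with(role, placement=PLACEMENT):
    """Return the hosts that run the given role, in declaration order."""
    return [host for host, roles in placement.items() if role in roles]


def host_file(role, placement=PLACEMENT):
    """Render a one-host-per-line file body for the given role."""
    return "\n".join(hosts_with(role, placement)) + "\n"


if __name__ == "__main__":
    # Assumed targets: conf/masters and conf/slaves in the Accumulo install.
    print("masters:\n" + host_file("accumulo-master"))
    print("slaves:\n" + host_file("tabletserver"))
```

Keeping the map in one place makes it easy to see at a glance which daemons share a box, and to move the Master to another host later by changing a single entry.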

The cloud was set up to test out running things on AWS, so I didn't do
anything terribly data intensive on it.  The worst issue I had was that
MapReduce jobs needed more than a gig of memory, so early on I had to
switch from medium size machines (with 4 gigs of RAM) to large instances (8
gigs of RAM).
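
The memory squeeze can be sketched as a simple per-node budget.  Only the 4 GB and 8 GB node sizes and the "more than a gig" per job come from my setup; the daemon heap sizes, the OS reserve, and the number of task slots below are assumptions picked for illustration, not measured values.

```python
def memory_headroom_mb(node_ram_mb, daemon_heaps_mb, task_slots,
                       task_heap_mb, os_reserve_mb=512):
    """RAM left on a node after the OS reserve, daemon heaps, and task JVMs.

    A negative result means the node is overcommitted on paper.
    """
    committed = (os_reserve_mb
                 + sum(daemon_heaps_mb.values())
                 + task_slots * task_heap_mb)
    return node_ram_mb - committed


# Assumed heap sizes (MB) for the daemons sharing a worker node.
daemons = {"datanode": 1024, "tasktracker": 1024, "tabletserver": 1024}

# Two concurrent task JVMs at 1 GB each (assumed slot count).
medium = memory_headroom_mb(4096, daemons, task_slots=2, task_heap_mb=1024)
large = memory_headroom_mb(8192, daemons, task_slots=2, task_heap_mb=1024)

if __name__ == "__main__":
    print("4 GB node headroom (MB):", medium)
    print("8 GB node headroom (MB):", large)
```

With these assumed numbers the 4 GB node comes out overcommitted while the 8 GB node keeps some headroom, which matches what I ran into.  Plug in your own heap settings and slot counts to see where the 16 GB nodes would land.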

Thoughts:  You should have enough to get started.  If you don't know where
your limits are, you'll find them and then you can work to address them.
Recommendations:  If and when you're ready to optimize your project,
consider how your data is stored in Accumulo.  NoSQL is new enough that I
don't think the community has all the answers for particular use cases.

Cheers!

James
On Tue, Apr 16, 2013 at 8:07 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Greetings everyone,
> I'm learning a lot from reading all of the great questions and informative
> answers here on the Accumulo mailing list.  Thus far I haven't come across
> a question similar to mine, nor a basic recommendation so here goes:
>
> I'm looking for recommendations on process / component placement for a
> small Accumulo cluster serving a prototype.  It will be scaled later, but
> for now I'm looking at a cluster with just 8 nodes.  My current thought
> process has led me to the following server / process placement and I'm
> interested in feedback on it.
>
> zoo1, zoo2, zoo3: ZooKeeper servers, dual proc, 4 GB RAM (small servers)
>
> namenode, secnamenode: 16GB RAM, 4 cores each, with local and remote
> locations to store name data
> *** Can I place the Accumulo Master on the NameNode or Secondary NameNode?
> ***
>
> accdata1, accdata2, accdata3: 16GB RAM, 4 cores each, serving as HDFS
> DataNodes and Accumulo TabletServers each with 4 2TB JBOD disks for HDFS
>
> I'm thinking having the Accumulo Master on the NameNode will simplify
> cluster startup.  Thoughts?  Recommendations?
>
> Many thanks in advance,
> Terry
>