Accumulo, mail # user - Small cluster Hadoop/Accumulo process placement recommendation


Re: Small cluster Hadoop/Accumulo process placement recommendation
James Hughes 2013-04-17, 03:02
Hi Terry,

From my limited experience, I'd say you have enough to get started.  I've
set up a small cloud with just 6 nodes on AWS:  One
namenode/jobtracker/Cloudbase (Accumulo when it was first released)
machine, one zookeeper, and 4 datanode/tasktracker/tabletserver nodes.
(Yes, I believe you should be able to run the Accumulo Master on the Hadoop
namenode.)
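
As a rough sketch of that placement, here is one way to write the layout
down and derive host lists from it.  The hostnames come from your message
quoted below; the PLACEMENT table and the helper functions are hypothetical,
and I'm assuming Accumulo 1.x-style conf/masters and conf/slaves files with
one hostname per line.  Treat it as an illustration, not a tested setup.

```python
# Hypothetical role-to-host layout, using the hostnames from Terry's message.
PLACEMENT = {
    "namenode": ["namenode"],
    "secondary_namenode": ["secnamenode"],
    "accumulo_master": ["namenode"],  # co-located with the Hadoop NameNode
    "zookeeper": ["zoo1", "zoo2", "zoo3"],
    "datanode": ["accdata1", "accdata2", "accdata3"],
    "tabletserver": ["accdata1", "accdata2", "accdata3"],
}


def host_file(role):
    """Return the text of a one-hostname-per-line file for the given role."""
    return "\n".join(PLACEMENT[role]) + "\n"


def roles_by_host():
    """Invert PLACEMENT so each host maps to the list of roles it runs."""
    result = {}
    for role, hosts in PLACEMENT.items():
        for host in hosts:
            result.setdefault(host, []).append(role)
    return result


if __name__ == "__main__":
    # Assumed file names: Accumulo 1.x reads conf/masters and conf/slaves.
    print("conf/masters:\n" + host_file("accumulo_master"))
    print("conf/slaves:\n" + host_file("tabletserver"))
    for host, roles in sorted(roles_by_host().items()):
        print(host, "->", ", ".join(roles))
```

Keeping the layout in one table makes it easy to see which machines carry
more than one role before you start anything.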

The cloud was set up to test out running things on AWS, so I didn't do
anything terribly data intensive on it.  The worst issue I had was that
MapReduce jobs needed more than 1 GB of memory, so early on I had to
switch from medium-size instances (with 4 GB of RAM) to large instances (8
GB of RAM).
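
A back-of-the-envelope check along these lines can show whether a worker
node is oversubscribed.  This is only a sketch: the fits_in_ram helper is
hypothetical, and every heap size and slot count below is an invented
placeholder, not a measurement from my cluster.  Substitute the values from
your own configuration.

```python
def fits_in_ram(ram_mb, daemon_heaps_mb, task_slots, task_heap_mb,
                os_reserve_mb=1024):
    """Return (fits, needed_mb) for one worker node.

    needed_mb = sum of daemon heaps + task_slots * task_heap_mb + OS reserve.
    """
    needed_mb = (sum(daemon_heaps_mb.values())
                 + task_slots * task_heap_mb
                 + os_reserve_mb)
    return needed_mb <= ram_mb, needed_mb


if __name__ == "__main__":
    # Placeholder heap sizes in MB; these are assumptions for illustration.
    daemons = {"datanode": 512, "tasktracker": 512, "tabletserver": 1024}
    for ram_mb in (4096, 8192):
        ok, needed_mb = fits_in_ram(ram_mb, daemons,
                                    task_slots=2, task_heap_mb=1536)
        print(ram_mb, "MB node:", "fits" if ok else "too small",
              "(needs", needed_mb, "MB)")
```

The point is just that task slots multiply the per-task heap, so a modest
increase in per-task memory can push a small node over its limit.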

Thoughts:  You should have enough to get started.  If you don't know where
your limits are, you'll find them and then you can work to address them.
Recommendations:  If and when you're ready to optimize your project,
consider how your data is stored in Accumulo.  NoSQL is new enough that I
don't think the community has all the answers for particular use cases.

Cheers!

James
On Tue, Apr 16, 2013 at 8:07 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Greetings everyone,
> I'm learning a lot from reading all of the great questions and informative
> answers here on the Accumulo mailing list.  Thus far I haven't come across
> a question similar to mine, nor a basic recommendation so here goes:
>
> I'm looking for recommendations on process / component placement for a
> small Accumulo cluster serving a prototype.  It will be scaled later, but
> for now I'm looking at a cluster with just 8 nodes.  My current thought
> process has led me to the following server / process placement and I'm
> interested in feedback on it.
>
> zoo1, zoo2, zoo3: ZooKeeper servers, dual proc, 4 GB RAM (small servers)
>
> namenode, secnamenode: 16GB RAM, 4 cores each, with local and remote
> locations to store name data
> *** Can I place the Accumulo Master on the NameNode or Secondary NameNode? ***
>
> accdata1, accdata2, accdata3: 16GB RAM, 4 cores each, serving as HDFS
> DataNodes and Accumulo TabletServers, each with four 2TB JBOD disks for HDFS
>
> I'm thinking having the Accumulo Master on the NameNode will simplify
> cluster startup.  Thoughts?  Recommendations?
>
> Many thanks in advance,
> Terry
>