Re: memory usage & process distribution
Thanks, John, that did the trick.

On Mon, Jul 23, 2012 at 2:32 PM, John Vines <[EMAIL PROTECTED]> wrote:

> I was just referring to
>
> mapred.map.tasks
> mapred.reduce.tasks
> mapred.child.java.opts
>
> which set the maximum number of map and reduce slots per node, and how
> much memory they can use.
>
> John
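
A minimal mapred-site.xml sketch of the three properties named above; the
values are placeholders for illustration only, not settings recommended in
this thread:

  <!-- mapred-site.xml (MRv1); illustrative values, tune to the node size -->
  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>           <!-- hint for the number of map tasks per job -->
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>           <!-- number of reduce tasks per job -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>    <!-- heap handed to each child task JVM -->
  </property>

In MRv1 the hard per-TaskTracker slot caps live in
mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum,
which, together with mapred.child.java.opts, bound the task memory per node.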
>
> On Mon, Jul 23, 2012 at 1:20 PM, Miguel Pereira
> <[EMAIL PROTECTED]>wrote:
>
> > John,
> >
> > For configuring MapReduce, do you mean adding the
> >
> > mapred.local.dir
> > mapred.system.dir
> > mapred.temp.dir
> >
> > properties to mapred-site.xml?
> >
> >
> >
> > On Mon, Jul 23, 2012 at 11:33 AM, John Vines <[EMAIL PROTECTED]>
> > wrote:
> >
> > > On Mon, Jul 23, 2012 at 11:21 AM, Miguel Pereira
> > > <[EMAIL PROTECTED]>wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I want to set up a realistic production cluster on Amazon's EC2 and I am
> > > > trying to decide 2 things.
> > > >
> > > >
> > > >    -  Memory usage
> > > >
> > > > If I use one of the example configuration files, say the 512MB one, does
> > > > that mean that all Accumulo processes will use up a total of 512MB? At
> > > > least this appears to be the case when looking at accumulo-env.sh.
> > > > This will determine whether I use a small or large instance.
> > > >
> > > >
> > > >
> > > Yes, it sets it up so all of the Accumulo processes have a footprint no
> > > bigger than 512MB. Mind you, we only have one configuration that is set up
> > > for things in a distributed fashion, which is 3GB. So if you're running
> > > multiple nodes, you can up some of the configurations for a larger
> > > footprint because you won't be running every process on every node.
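
A minimal accumulo-env.sh sketch of how that footprint is bounded, assuming
the 1.4-style per-process *_OPTS variables; the heap sizes below are
illustrative, not the values shipped in the 512MB example:

  # Per-process JVM heaps; their sum is roughly the node's Accumulo footprint.
  # Sizes are illustrative only.
  test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="-Xmx128m -Xms128m"
  test -z "$ACCUMULO_MASTER_OPTS"  && export ACCUMULO_MASTER_OPTS="-Xmx128m -Xms128m"
  test -z "$ACCUMULO_MONITOR_OPTS" && export ACCUMULO_MONITOR_OPTS="-Xmx64m -Xms64m"
  test -z "$ACCUMULO_GC_OPTS"      && export ACCUMULO_GC_OPTS="-Xmx64m -Xms64m"
  test -z "$ACCUMULO_LOGGER_OPTS"  && export ACCUMULO_LOGGER_OPTS="-Xmx128m -Xms128m"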
> > >
> > >
> > > >    - Process Distribution
> > > >
> > > > Is this a standard configuration? I will start off with a small # of
> > > > worker nodes ( 3-4 ) & hope to use my local machine as a "monitor" for
> > > > the accumulo & ganglia web UIs in order to avoid ssh -X latency.
> > > >
> > > > [ Name Node ] Name Node, Gmond
> > > > [ Secondary NN ] Secondary Name Node, Gmond
> > > > [ Job Tracker ] JobTracker, Gmond
> > > > [ Zookeeper ] Zookeeper
> > > > [ Accumulo Master ] Master, Tracer, Garbage Collector, Gmond, Jmxtrans
> > > > [ Monitor ] Monitor, Gmetad, Gweb
> > > > [ Worker Node ] DataNode, Tasktracker, TabletServer, Logger, Gmond, Jmxtrans
> > > >
> > > That looks good to me. Just make sure you configure your MapReduce so
> > > that child memory * (reduce slots + map slots) isn't enough to cause
> > > swapping.
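
As a rough sanity check with assumed numbers: at -Xmx512m per child with
2 map slots and 1 reduce slot, the MapReduce children alone can claim about
1.5GB; adding a 512MB Accumulo footprint plus the DataNode and TaskTracker
JVMs would already crowd an m1.small's ~1.7GB of RAM, while fitting easily
in an m1.large's 7.5GB.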
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Miguel
> > > >
> > >
> > > John
> > >
> >
>