Re: memory usage & process distribution
Thanks, John, that did the trick.

On Mon, Jul 23, 2012 at 2:32 PM, John Vines <[EMAIL PROTECTED]> wrote:

> I was just referring to
>
> mapred.map.tasks
> mapred.reduce.tasks
> mapred.child.java.opts
>
> which set the maximum number of map slots and reduce slots per node, and
> how much memory they can use.
>
> John
>
> On Mon, Jul 23, 2012 at 1:20 PM, Miguel Pereira <[EMAIL PROTECTED]>
> wrote:
>
> > John,
> >
> > To configure MapReduce, do you mean adding the
> >
> > mapred.local.dir
> > mapred.system.dir
> > mapred.temp.dir
> >
> > properties to mapred-site.xml?
> >
> >
> >
> > On Mon, Jul 23, 2012 at 11:33 AM, John Vines <[EMAIL PROTECTED]>
> > wrote:
> >
> > > On Mon, Jul 23, 2012 at 11:21 AM, Miguel Pereira <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I want to set up a realistic production cluster on Amazon's EC2, and I
> > > > am trying to decide two things.
> > > >
> > > >
> > > >    -  Memory usage
> > > >
> > > > If I use one of the example configuration files, say the 512MB one,
> > > > does that mean that all Accumulo processes will use a total of 512MB?
> > > > At least this appears to be the case when looking at accumulo-env.sh.
> > > > This will determine whether I use a small or large instance.
> > > >
> > > >
> > > >
> > > Yes, it sets things up so that all of the Accumulo processes together
> > > have a footprint no bigger than 512MB. Mind you, we only have one
> > > configuration that is set up for running in a distributed fashion,
> > > which is 3GB. So if you're running multiple nodes, you can raise some
> > > of the settings for a larger footprint, because you won't be running
> > > every process on every node.
> > >
> > >
> > > >    - Process Distribution
> > > >
> > > > Is this a standard configuration? I will start off with a small number
> > > > of worker nodes (3-4) and hope to use my local machine as a "monitor"
> > > > for the Accumulo and Ganglia web UIs in order to avoid ssh -X latency.
> > > >
> > > > [ Name Node ] Name Node, Gmond
> > > > [ Secondary NN ] Secondary Name Node, Gmond
> > > > [ Job Tracker ] JobTracker, Gmond
> > > > [ Zookeeper ] Zookeeper
> > > > [ Accumulo Master ] Master, Tracer, Garbage Collector, Gmond, Jmxtrans
> > > > [ Monitor ] Monitor, Gmetad, Gweb
> > > > [ Worker Node ] DataNode, Tasktracker, TabletServer, Logger, Gmond, Jmxtrans
> > > >
> > > That looks good to me. Just make sure you configure MapReduce so
> > > that child memory * (reduce slots + map slots) isn't enough to cause
> > > swapping.
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Miguel
> > > >
> > >
> > > John
> > >
> >
>
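
For anyone sizing worker nodes from this thread, the rule of thumb above (child memory * (reduce slots + map slots) should not push a node into swap) can be sketched roughly as follows. This is only an illustration: the function names, the instance RAM figures, and the 3072MB allowance for the other daemons are hypothetical, and the -Xmx heap in mapred.child.java.opts is a lower bound on what each task JVM really uses. Note also that in Hadoop 1.x the per-TaskTracker slot limits are normally set with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.

```python
import re


def parse_heap_mb(java_opts: str) -> int:
    """Return the -Xmx heap size found in a JVM options string, in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx setting found in: " + java_opts)
    value = int(match.group(1))
    unit = match.group(2).lower()
    if unit == "g":
        return value * 1024
    if unit == "k":
        return value // 1024
    return value  # already in MB


def task_memory_mb(child_java_opts: str, map_slots: int, reduce_slots: int) -> int:
    """Worst case heap use if every map and reduce slot is busy at once."""
    return parse_heap_mb(child_java_opts) * (map_slots + reduce_slots)


def fits_without_swapping(node_ram_mb: int, other_processes_mb: int,
                          child_java_opts: str, map_slots: int,
                          reduce_slots: int) -> bool:
    """True if the other daemons plus all task heaps fit in physical RAM.

    other_processes_mb is a hypothetical allowance for the DataNode,
    TaskTracker, TabletServer, Logger and the OS on a worker node.
    """
    total = other_processes_mb + task_memory_mb(
        child_java_opts, map_slots, reduce_slots)
    return total <= node_ram_mb


if __name__ == "__main__":
    # Hypothetical worker: 7680MB of RAM, 3072MB reserved for the other
    # processes, 4 map slots and 2 reduce slots with 512MB heaps each.
    print(fits_without_swapping(7680, 3072, "-Xmx512m", 4, 2))
```

If the check comes back False, the options are to lower the slot counts, shrink the child heap, or move to a larger instance type.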