How is it that 36 processes are not expected if you have configured 48 + 12
= 60 slots available on the machine?
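
For what it's worth, here is a rough sketch of that math in Python. The slot
counts, the 3 GB task limit and the ~2 GB observed usage come from your
message. The 0.5 GB per slot is only what your "24 GB for 48 slots" figure
implies, the 3 GB for daemons assumes a node running DataNode, TaskTracker and
JobTracker, and the function name is made up for illustration. The 36-task
case assumes each stray process uses about as much as the mappers you
measured, which may not hold.

```python
# Rough per-node memory budget. Figures are taken from this thread;
# budget_gb() is a hypothetical helper, not part of Hadoop.
NODE_RAM_GB = 48


def budget_gb(map_slots, reduce_slots, gb_per_task, daemons_gb=3):
    """Memory needed if every slot runs one task of gb_per_task, plus daemons."""
    return (map_slots + reduce_slots) * gb_per_task + daemons_gb


slots = 48 + 12                    # configured slots per node
planned = budget_gb(48, 12, 0.5)   # assumption: 0.5 GB per slot (24 GB / 48)
observed = budget_gb(36, 0, 2)     # assumption: 36 tasks at ~2 GB each
at_limit = budget_gb(36, 0, 3)     # the same 36 tasks at the 3 GB max limit

for label, gb in [
    ("planned, all slots full", planned),
    ("36 tasks at 2 GB", observed),
    ("36 tasks at 3 GB", at_limit),
]:
    verdict = "fits" if gb <= NODE_RAM_GB else "exceeds RAM"
    print(f"{label}: {gb:g} GB of {NODE_RAM_GB} GB ({verdict})")
```

Under those assumptions the planned budget fits in 48 GB, but 36 tasks at
2 GB each would not, which would be consistent with node4 swapping.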
On Wed, May 11, 2011 at 11:11 AM, Adi <[EMAIL PROTECTED]> wrote:
> By our calculations hadoop should not exceed 70% of memory.
> Allocated per node: 48 map slots (24 GB), 12 reduce slots (6 GB), 1 GB
> each for DataNode/TaskTracker and one JobTracker, totalling 33/34 GB.
> The queues are capped at using only 90% of capacity allocated so generally
> 10% of slots are always kept free.
> The cluster was running a total of 33 mappers and 1 reducer, so around 8-9
> per node, with a 3 GB max limit, and they were utilizing around 2 GB each.
> Top was showing 100% memory utilized, which our sysadmin says is OK as the
> memory is used for file caching by Linux if the processes are not using it.
> No swapping on 3 nodes.
> Then node4 just started swapping after the number of processes shot up
> unexpectedly. The main mystery is the excess number of processes on the
> node which went down: 36 as opposed to the expected 11. The other 3 nodes were
> successfully executing the mappers without any memory/swap issues.
> On Wed, May 11, 2011 at 1:40 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
> > You have to do the math...
> > If you have 2gb per mapper, and run 10 mappers per node... That means 20gb
> > of memory.
> > Then you have TT and DN running which also take memory...
> > What did you set as the number of mappers/reducers per node?
> > What do you see in ganglia or when you run top?
> > Sent from a remote device. Please excuse any typos...
> > Mike Segel
> > On May 11, 2011, at 12:31 PM, Adi <[EMAIL PROTECTED]> wrote:
> > > Hello Hadoop Gurus,
> > > We are running a 4-node cluster. We just upgraded the RAM to 48 GB. We have
> > > allocated around 33-34 GB per node for hadoop processes, leaving the rest,
> > > 14-15 GB of memory, for the OS and as buffer. There are no other processes
> > > running on these nodes.
> > > Most of the lighter jobs run successfully but one big job is de-stabilizing
> > > the cluster. One node starts swapping, runs out of swap space, and goes
> > > offline. We tracked the processes on that node and noticed that it ends up
> > > with more than the expected number of hadoop-java processes.
> > > The other 3 nodes were running 10 or 11 processes and this node ends up with
> > > 36. After killing the job we find these processes still show up and we have
> > > to kill them manually.
> > > We have tried reducing the swappiness to 6 but saw the same results. It also
> > > looks like hadoop stays well within the memory limits allocated and still
> > > starts swapping.
> > >
> > > Some other suggestions we have seen are:
> > > 1) Increase swap size. Current size is 6 GB. The most quoted size is 'tons
> > > of swap' but not sure how much it translates to in numbers. Should it be 16
> > > or 24 GB?
> > > 2) Increase overcommit ratio. Not sure if this helps as a few blog comments
> > > mentioned it didn't help.
> > >
> > > Any other hadoop or linux config suggestions are welcome.
> > >
> > > Thanks.
> > >
> > > -Adi
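
Before changing the swap size or the overcommit ratio, it may help to record
what each node is actually set to. A minimal sketch, assuming a Linux node
where /proc/meminfo and /proc/sys/vm are readable. The helper names are
hypothetical, and the script only reports values, it does not tune anything.

```python
from pathlib import Path


def parse_meminfo(text):
    """Return {field: number} from /proc/meminfo-style text (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_used_gb(info):
    """Swap in use, in GB, from SwapTotal and SwapFree (both in kB)."""
    used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return used_kb / (1024 * 1024)


def read_vm_setting(name, root="/proc/sys/vm"):
    """Read an integer sysctl such as 'swappiness'; None if unavailable."""
    try:
        return int((Path(root) / name).read_text())
    except (OSError, ValueError):
        return None


if Path("/proc/meminfo").exists():
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    print(f"swap used: {swap_used_gb(info):.2f} GB")
    for name in ("swappiness", "overcommit_memory", "overcommit_ratio"):
        print(f"vm.{name} = {read_vm_setting(name)}")
```

Running this on node4 and on one of the healthy nodes would show whether the
settings differ between them or whether only the process count does.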