Hadoop, mail # user - NN Memory Jumps every 1 1/2 hours


Edward Capriolo 2012-12-22, 04:24
Adam Faris 2012-12-22, 04:59
Edward Capriolo 2012-12-22, 12:54
Michael Segel 2012-12-22, 15:42
Joep Rottinghuis 2012-12-22, 17:17
Edward Capriolo 2012-12-22, 17:51
Suresh Srinivas 2012-12-22, 18:32
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-23, 00:03
Blocks are ~26,000,000; files are a bit higher at ~27,000,000.
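For scale: a common rule of thumb (not from this thread) is roughly 150 bytes of NameNode heap per namespace object (file or block). A quick back-of-envelope check against those counts:

```shell
# ~150 bytes per namespace object is a rough, commonly cited estimate
FILES=27000000
BLOCKS=26000000
BYTES=$(( (FILES + BLOCKS) * 150 ))
echo "approx namespace heap: $(( BYTES / 1024 / 1024 / 1024 )) GB"
```

That lands well under the 17GB Xmx mentioned downthread, which is consistent with the suspicion that something other than plain namespace growth is consuming the heap.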

Currently running:
[root@hnn217 ~]# java -version
java version "1.7.0_09"
Was running 1.6.0_23

export JVM_OPTIONS="-XX:+UseCompressedOops -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly"
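Since GC logs are requested downthread, the same export could be extended with standard HotSpot logging flags (the log path is hypothetical; flag spellings are the Java 6/7 forms):

```shell
# Append GC logging to the existing options; adjust the log path to taste
export JVM_OPTIONS="$JVM_OPTIONS \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+PrintTenuringDistribution \
  -Xloggc:/var/log/hadoop/nn-gc.log"
```

PrintTenuringDistribution is worth having alongside MaxTenuringThreshold=1, since it shows how quickly objects are being promoted into the CMS old gen.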

I will grab the gc logs and the heap dump in a follow up.
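For reference, the histogram Suresh asks for can be captured with jmap; the PID here is a placeholder (on the real host, find it with e.g. `pgrep -f NameNode`):

```shell
# Hypothetical NameNode PID -- substitute the real one
NN_PID=12345
# -histo:live walks only reachable objects, which forces a full GC first,
# so run this during a quiet window and redirect the output to a file
CMD="jmap -histo:live $NN_PID"
echo "$CMD"
```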

On Sat, Dec 22, 2012 at 1:32 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:

> Please take a histo:live dump when the memory is full. Note that this
> causes a full GC.
> http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html
>
> What is the number of blocks you have on the system?
>
> Send the JVM options you are using. Since earlier Java versions, which used
> 1/8 of the total heap for the young gen, it has gone up to 1/3 of the total
> heap. This could also be the reason.
>
> Do you collect gc logs? Send that as well.
>
> Sent from a mobile device
>
> On Dec 22, 2012, at 9:51 AM, Edward Capriolo <[EMAIL PROTECTED]>
> wrote:
>
> > Newer 1.6 releases are getting close to 1.7, so I am not going to fear a
> > version number and fight the future.
> >
> > I have been at around 27 million files for a while, and have been as high
> > as 30 million. I do not think that is related.
> >
> > I do not think it is related to checkpoints but I am considering
> > raising/lowering the checkpoint triggers.
> >
> > On Saturday, December 22, 2012, Joep Rottinghuis <[EMAIL PROTECTED]>
> > wrote:
> >> Do your OOMs correlate with the secondary checkpointing?
> >>
> >> Joep
> >>
> >> Sent from my iPhone
> >>
> >> On Dec 22, 2012, at 7:42 AM, Michael Segel <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> Hey Silly question...
> >>>
> >>> How long have you had 27 million files?
> >>>
> >>> I mean, can you correlate the number of files to the spate of OOMs?
> >>>
> >>> Even without problems... I'd say it would be a good idea to upgrade, due
> >>> to the probability of a lot of code fixes...
> >>>
> >>> If you're running anything pre 1.x, going to Java 1.7 wouldn't be a good
> >>> idea. Having said that... outside of MapR, have any of the distros
> >>> certified themselves on 1.7 yet?
> >>>
> >>> On Dec 22, 2012, at 6:54 AM, Edward Capriolo <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>>> I will give this a go. I have actually gone into JMX and manually
> >>>> triggered GC; no memory is returned. So I assumed something was leaking.
> >>>>
> >>>> On Fri, Dec 21, 2012 at 11:59 PM, Adam Faris <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>
> >>>>> I know this will sound odd, but try reducing your heap size. We had an
> >>>>> issue like this where GC kept falling behind and we either ran out of
> >>>>> heap or would be in full GC. By reducing heap, we were forcing
> >>>>> concurrent mark sweep to occur and avoided both full GC and running out
> >>>>> of heap space, as the JVM would collect objects more frequently.
> >>>>>
> >>>>> On Dec 21, 2012, at 8:24 PM, Edward Capriolo <[EMAIL PROTECTED]>
> >>>>> wrote:
> >>>>>
> >>>>>> I have an old Hadoop 0.20.2 cluster. Have not had any issues for a
> >>>>>> while (which is why I never bothered with an upgrade).
> >>>>>>
> >>>>>> Suddenly it OOMed last week. Now the OOMs happen periodically. We
> >>>>>> have a fairly large NameNode heap (Xmx 17GB). It is a fairly large
> >>>>>> FS, about 27,000,000 files.
> >>>>>>
> >>>>>> So the strangest thing is that every 1 and 1/2 hours the NN memory
> >>>>>> usage increases until the heap is full.
> >>>>>>
> >>>>>> http://imagebin.org/240287
> >>>>>>
> >>>>>> We tried failing over the NN to another machine. We changed the Java
> >>>>>> version from 1.6_23 -> 1.7.0.
> >>>>>>
> >>>>>> I have set the NameNode logs to debug and ALL, and I have done the
> >>>>>> same with the data nodes.
> >>>>>> The secondary NN is running, shipping edits, and making new images.
> >>>>>>
> >>>>>> I am thinking something has corrupted the NN MetaData and after
Edward Capriolo 2012-12-23, 01:59
Suresh Srinivas 2012-12-23, 03:23
Edward Capriolo 2012-12-23, 18:34
Joep Rottinghuis 2012-12-23, 19:00
Suresh Srinivas 2012-12-24, 02:40
Edward Capriolo 2012-12-27, 21:48
Suresh Srinivas 2012-12-27, 22:08
Edward Capriolo 2012-12-27, 22:22
Suresh Srinivas 2012-12-27, 22:41
Edward Capriolo 2012-12-27, 22:58
Suresh Srinivas 2012-12-27, 23:12