NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-22, 04:24
I have an old Hadoop 0.20.2 cluster. I have not had any issues for a while (which is why I never bothered with an upgrade). Suddenly it OOMed last week, and now the OOMs happen periodically. We have a fairly large NameNode heap, Xmx 17GB, and a fairly large FS, about 27,000,000 files. The strangest thing is that every 1 1/2 hours the NN memory usage increases, until the heap is full. http://imagebin.org/240287 We tried failing over the NN to another machine. We changed the Java version from 1.6_23 -> 1.7.0. I have set the NameNode logs to debug and ALL, and I have done the same with the data nodes. The Secondary NN is running, shipping edits and making new images. I am thinking something has corrupted the NN metadata and after enough time it becomes a time bomb, but this is just a total shot in the dark. Does anyone have any interesting troubleshooting ideas?
Re: NN Memory Jumps every 1 1/2 hours
Adam Faris 2012-12-22, 04:59
I know this will sound odd, but try reducing your heap size. We had an issue like this where GC kept falling behind and we either ran out of heap or were stuck in full GC. By reducing the heap, we forced concurrent mark sweep to occur and avoided both full GC and running out of heap space, as the JVM would collect objects more frequently.
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-22, 12:54
I will give this a go. I actually went into JMX and manually triggered a GC, and no memory was returned, so I assumed something was leaking.
Re: NN Memory Jumps every 1 1/2 hours
Michael Segel 2012-12-22, 15:42
Hey, silly question... How long have you had 27 million files? I mean, can you correlate the number of files with the spate of OOMs? Even without problems, I'd say it would be a good idea to upgrade because of the probability of a lot of code fixes. If you're running anything pre-1.x, going to Java 1.7 wouldn't be a good idea. Having said that, outside of MapR, have any of the distros certified themselves on 1.7 yet?
Re: NN Memory Jumps every 1 1/2 hours
Joep Rottinghuis 2012-12-22, 17:17
Do your OOMs correlate with the secondary checkpointing? Joep
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-22, 17:51
Newer 1.6 releases are getting close to 1.7, so I am not going to fear a number and fight the future. I have been at around 27 million files for a while, and have been as high as 30 million, so I do not think that is related. I do not think it is related to checkpoints either, but I am considering raising/lowering the checkpoint triggers.
Re: NN Memory Jumps every 1 1/2 hours
Suresh Srinivas 2012-12-22, 18:32
Please take a histo live dump when the memory is full. Note that this causes a full GC. http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html What is the number of blocks you have on the system? Send the JVM options you are using. Compared with earlier Java versions, which used 1/8 of the total heap for the young gen, it has gone up to 1/3 of the total heap. This could also be the reason. Do you collect GC logs? Send those as well.
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-23, 00:03
Blocks is ~26,000,000; files is a bit higher, ~27,000,000.
Currently running:
[root@hnn217 ~]# java -version
java version "1.7.0_09"
Was running 1.6.0_23.
export JVM_OPTIONS="-XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"
I will grab the GC logs and the heap dump in a follow-up.
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-23, 01:59
Ok so here is the latest. http://imagebin.org/240392
I took a jmap on startup and one an hour after. http://pastebin.com/xEkWid4f
I think the biggest deal is [B, which may not be very helpful.

 num     #instances         #bytes  class name
----------------------------------------------
   1:      25094067     2319943656  [B
   2:      23720125     1518088000  org.apache.hadoop.hdfs.server.namenode.INodeFile
   3:      24460244     1174091712  org.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo
   4:      25671649     1134707328  [Ljava.lang.Object;
   5:      31106937      995421984  java.util.HashMap$Entry
   6:      23725233      570829968  [Lorg.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo;
   7:          2934      322685952  [Ljava.util.HashMap$Entry;

 num     #instances         #bytes  class name
----------------------------------------------
   1:      24739690     3727511000  [B
   2:      23280668     1489962752  org.apache.hadoop.hdfs.server.namenode.INodeFile
   3:      24850044     1192802112  org.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo
   4:      26124258     1157073272  [Ljava.lang.Object;
   5:      32142057     1028545824  java.util.HashMap$Entry
   6:      23307473      560625432  [Lorg.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo;

GC starts like this:
3.204: [GC 102656K->9625K(372032K), 0.0150300 secs]
3.519: [GC 112281K->21180K(372032K), 0.0741210 secs]
3.883: [GC 123836K->30729K(372032K), 0.0208900 secs]
4.194: [GC 132724K->45785K(372032K), 0.0293860 secs]
4.522: [GC 148441K->59282K(372032K), 0.0341330 secs]
4.844: [GC 161938K->70071K(372032K), 0.0284850 secs]
5.139: [GC 172727K->80624K(372032K), 0.0171910 secs]
5.338: [GC 183280K->90661K(372032K), 0.0184200 secs]
5.549: [GC 193317K->103126K(372032K), 0.0430080 secs]
5.775: [GC 205782K->113534K(372032K), 0.0359480 secs]
5.995: [GC 216190K->122832K(372032K), 0.0192900 secs]
6.192: [GC 225488K->131777K(372032K), 0.0183870 secs]

Then it steadily increases:
453.808: [GC 7482139K->7384396K(11240624K), 0.0208170 secs]
455.605: [GC 7487052K->7384177K(11240624K), 0.0206360 secs]
457.942: [GC 7486831K->7384131K(11240624K), 0.0189600 secs]
459.924: [GC 7486787K->7384141K(11240624K), 0.0193560 secs]
462.887: [GC 7486797K->7384151K(11240624K), 0.0189290 secs]

Until I triggered this full GC a moment ago:
6266.988: [GC 11255823K->10373641K(17194656K), 0.0331910 secs]
6280.083: [GC 11259721K->10373499K(17194656K), 0.0324870 secs]
6293.706: [GC 11259579K->10376656K(17194656K), 0.0324330 secs]
6309.781: [GC 11262736K->10376110K(17194656K), 0.0310330 secs]
6333.790: [GC 11262190K->10374348K(17194656K), 0.0297670 secs]
6333.934: [Full GC 10391746K->9722532K(17194656K), 63.9812940 secs]
6418.466: [GC 10608612K->9725743K(17201024K), 0.0339610 secs]
6421.420: [GC 10611823K->9760611K(17201024K), 0.1501610 secs]
6428.221: [GC 10646691K->9767236K(17201024K), 0.1503170 secs]
6437.431: [GC 10653316K->9734750K(17201024K), 0.0344960 secs]

Essentially, GC sometimes clears some memory but not all, and then the line keeps rising. Delta is about 10-17 hours until the heap is exhausted.
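To see which classes account for the growth between the two jmap histograms above, the rows can be diffed by class name. This is only a rough sketch: the rows are the top four entries copied from the post, and the helper names `parse_histo` and `byte_deltas` are made up for illustration, not part of any Hadoop or JDK tool.

```python
import re

# Top four rows copied from the two histograms posted above.
FIRST = """\
1: 25094067 2319943656 [B
2: 23720125 1518088000 org.apache.hadoop.hdfs.server.namenode.INodeFile
3: 24460244 1174091712 org.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo
4: 25671649 1134707328 [Ljava.lang.Object;
"""
SECOND = """\
1: 24739690 3727511000 [B
2: 23280668 1489962752 org.apache.hadoop.hdfs.server.namenode.INodeFile
3: 24850044 1192802112 org.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo
4: 26124258 1157073272 [Ljava.lang.Object;
"""

# A histogram row is: rank, instance count, byte count, class name.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def parse_histo(text):
    """Map class name -> (instances, bytes) for every histogram row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[match.group(3)] = (int(match.group(1)), int(match.group(2)))
    return rows


def byte_deltas(before, after):
    """Return (class name, byte delta) pairs, largest growth first."""
    names = set(before) | set(after)
    deltas = [
        (name, after.get(name, (0, 0))[1] - before.get(name, (0, 0))[1])
        for name in names
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


deltas = byte_deltas(parse_histo(FIRST), parse_histo(SECOND))
for name, delta in deltas:
    print(f"{delta / 2**20:+10.1f} MiB  {name}")
```

On these rows nearly all of the growth sits in `[B` (byte arrays), while the INodeFile footprint shrinks slightly, which matches the observation in the post that `[B` is the biggest mover.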
Re: NN Memory Jumps every 1 1/2 hours
Suresh Srinivas 2012-12-23, 03:23
This looks to me like it is caused by the larger default young generation size in newer Java releases; see http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html#heap_size. Looking at your GC logs, I can see around 6G of space being used for the young generation (though I do not see logs related to minor collection). That means that for the same number of objects you have a smaller old generation space, and hence old generation collection can no longer perform well. It is unfortunate that such changes are made in Java and cause previously working applications to fail. My suggestion is to not depend on default young generation sizes any more. At large JVM sizes, the defaults chosen by the JDK no longer work well. So I suggest protecting yourself from such changes by explicitly specifying the young generation size. Given my experience of tuning GC on Yahoo clusters, for the number of objects you have and the total heap size you are allocating, I suggest setting the young generation to 1G. You can do that by adding:
-XX:NewSize=1G -XX:MaxNewSize=1G
Let me know how it goes.
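The sizing argument above comes down to simple subtraction, sketched here under stated assumptions: a 17G heap (the Xmx from the first post), a young generation of roughly 6G as read from the GC logs versus the suggested 1G, and the CMSInitiatingOccupancyFraction=75 from the JVM options posted earlier, treated as a percentage of the old generation. "G" is taken as GiB, and the function names are illustrative only.

```python
GIB = 2**30


def old_gen_bytes(heap_bytes, young_bytes):
    """The old generation is what the young generation leaves of the heap."""
    return heap_bytes - young_bytes


def cms_trigger_bytes(old_bytes, occupancy_percent):
    """Old-gen occupancy at which a CMS cycle would be started."""
    return old_bytes * occupancy_percent // 100


heap = 17 * GIB  # assumption: Xmx 17GB from the first post, read as GiB
for label, young in (("default young gen (~6G)", 6 * GIB),
                     ("explicit young gen (1G)", 1 * GIB)):
    old = old_gen_bytes(heap, young)
    trigger = cms_trigger_bytes(old, 75)
    print(f"{label}: old gen {old / GIB:.2f} GiB, "
          f"CMS starts at {trigger / GIB:.2f} GiB")
```

Under these assumptions the explicit 1G setting hands about 5G back to the old generation and raises the CMS start threshold from roughly 8.25G to 12G of old-gen occupancy.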
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-23, 18:34
Tried this (-XX:NewSize=1G -XX:MaxNewSize=1G). The NameNode is still ruining my Xmas on its slow death march to OOM. http://imagebin.org/240453
Re: NN Memory Jumps every 1 1/2 hours
Joep Rottinghuis 2012-12-23, 19:00
Do you have audit logs from before and after to compare? Are there some surprising access patterns you can discern? Joep
Re: NN Memory Jumps every 1 1/2 hours
Suresh Srinivas 2012-12-24, 02:40
I do not have access to my computer. Based on reading the previous email, I do not see anything suspicious in the list of objects in the histo live dump. I would like to hear from you whether it continued to grow. One instance of this I had seen in the past was related to weak references to socket objects. I do not see that happening here, though.
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-27, 21:48
So it turns out the issue was just the size of the filesystem.
2012-12-27 16:37:22,390 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4,354,340,042
Basically, if the NN image size hits ~5,000,000,000 you get f'ed. So you need about 3x your FSImage size in RAM. If you do not have enough, you die a slow death.
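As a back-of-the-envelope check of that 3x figure, the arithmetic can be written out as below. This is only the rule of thumb observed on this one cluster, not a general Hadoop formula; the factor of 3 and the helper name `heap_needed_bytes` are assumptions taken from the post, and the 17GB heap is read as GiB.

```python
GIB = 2**30


def heap_needed_bytes(fsimage_bytes, factor=3):
    """Rule-of-thumb heap estimate: a fixed multiple of the fsimage size."""
    return fsimage_bytes * factor


current_image = 4_354_340_042   # "New Image Size" from the checkpoint log line
danger_image = 5_000_000_000    # image size at which trouble was reported
configured_heap = 17 * GIB      # assumption: Xmx 17GB, read as GiB

for label, image in (("current image", current_image),
                     ("~5,000,000,000 image", danger_image)):
    needed = heap_needed_bytes(image)
    headroom = configured_heap - needed
    print(f"{label}: needs about {needed / GIB:.2f} GiB, "
          f"headroom {headroom / GIB:.2f} GiB")
```

By this estimate the current image still fits in the configured heap, while an image near 5,000,000,000 bytes leaves only about 3 GiB of headroom before any young generation or GC overhead is counted.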
Re: NN Memory Jumps every 1 1/2 hours
Suresh Srinivas 2012-12-27, 22:08
You did free up a lot of old generation by reducing the young generation, right? The extra 5G of RAM for the old generation should have helped. Based on my calculation, for the current number of objects you have, you need roughly 12G of total heap with a young generation size of 1G. This assumes the average file name size is 32 bytes. In later releases (>= 0.20.204), several memory optimizations and startup optimizations have been made. They should help you as well.
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-27, 22:22
I am not sure GC was a factor. Even when I forced a GC it cleared 0% memory. One would think that since the entire NameNode image is stored in memory, the heap would not need to grow beyond that, but that sure does not seem to be the case. A 5GB image starts off using 10GB of memory, and after "burn in" it seems to use about 15GB of memory. So really we say "the name node data has to fit in memory", but what we really mean is "the name node data must fit in memory 3x".
-
Re: NN Memory Jumps every 1 1/2 hours
Suresh Srinivas 2012-12-27, 22:41
I do not follow what you mean here.

> Even when I forced a GC it cleared 0% memory.

Is this with the new younggen setting? Because earlier, based on the calculation I posted, you need ~11G in the old generation. With 6G as the default younggen size, you actually had just enough memory to fit the namespace in oldgen. Hence you might not have seen a Full GC freeing up enough memory.

Have you tried a Full GC with the 1G younggen size? I suspect you would see a lot more memory freeing up.

> One would think that since the entire NameNode image is stored in memory that the heap would not need to grow beyond that

The namenode image that you see during checkpointing is the size of the file written after serializing the in-memory file system namespace. This is not what is directly stored in namenode memory. The namenode stores data structures that correspond to the file system directory tree and block locations. Of these, only the file system directory tree is serialized and written to the fsimage. Block locations are not.

--
http://hortonworks.com/download/
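The old-generation arithmetic in this reply can be sketched as follows. This is a deliberate simplification that treats the old generation as total heap minus young generation and ignores survivor spaces and other JVM details. The 17G heap is the -Xmx value given at the start of the thread, and the function names are invented for illustration.

```python
# Simplified model: old generation = total heap - young generation.
# ASSUMPTION: ignores survivor spaces and other JVM sizing details.


def old_gen_gb(heap_gb: float, young_gen_gb: float) -> float:
    """Old-generation capacity under the simplified model."""
    if young_gen_gb > heap_gb:
        raise ValueError("young generation cannot exceed total heap")
    return heap_gb - young_gen_gb


def headroom_gb(heap_gb: float, young_gen_gb: float, namespace_gb: float) -> float:
    """Old-generation space left over once the namespace is resident."""
    return old_gen_gb(heap_gb, young_gen_gb) - namespace_gb


if __name__ == "__main__":
    heap, namespace = 17, 11  # -Xmx17G; ~11G of namespace in old generation
    for young in (6, 1):      # 6G default vs. -XX:NewSize=1G -XX:MaxNewSize=1G
        print(f"young={young}G old={old_gen_gb(heap, young)}G "
              f"headroom={headroom_gb(heap, young, namespace)}G")
```

Under this model a 6G young generation leaves no headroom at all, while a 1G young generation leaves about 5G, which is the "extra 5G" mentioned earlier in the thread.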
-
Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-27, 22:58
I tried your suggested setting and forced a GC from JConsole, and once memory usage crept up nothing was freeing up.
So just food for thought:
You said "average file name size is 32 bytes". Well, most of my data sits in /user/hive/warehouse/.
Then I have tables with partitions.
Does it make sense to just move this to "/u/h/w"?
Will I be saving 400,000,000 bytes of memory if I do?
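A figure of that size is what falls out if every file is assumed to carry its own copy of the full path prefix. The sketch below shows only that back-of-the-envelope calculation. Whether the NameNode actually stores paths this way is exactly the open question, the 27,000,000 file count is taken from the start of the thread, and it is assumed here that all of those files live under the warehouse directory.

```python
# Naive estimate: bytes saved if each file stored the full path prefix.
# ASSUMPTION: one prefix copy per file and ASCII paths; this is the premise
# being questioned, not a statement of how the NameNode works.


def prefix_savings_per_file(old_prefix: str, new_prefix: str) -> int:
    """Bytes saved per file by shortening the prefix."""
    return len(old_prefix.encode("utf-8")) - len(new_prefix.encode("utf-8"))


def naive_total_savings(old_prefix: str, new_prefix: str, num_files: int) -> int:
    """Total bytes saved if every file repeated the prefix."""
    return prefix_savings_per_file(old_prefix, new_prefix) * num_files


if __name__ == "__main__":
    saved = naive_total_savings("/user/hive/warehouse", "/u/h/w", 27_000_000)
    print(f"{saved:,} bytes")  # 14 bytes per file across 27,000,000 files
```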
-
Re: NN Memory Jumps every 1 1/2 hours
Suresh Srinivas 2012-12-27, 23:12
> I tried your suggested setting and forced GC from Jconsole and once it crept up nothing was freeing up.

That is very surprising. If possible, take a live dump when the namenode starts up (when memory used is low) and again when namenode memory consumption has gone up considerably, closer to the heap limit.

BTW, are you running with that configuration, with the younggen size set to the smaller size?

> So just food for thought:
> You said "average file name size is 32 bytes". Well most of my data sits in /user/hive/warehouse/
> Then I have a tables with partitions.
> Does it make sense to just move this to "/u/h/w"?

In the directory structure in namenode memory, there is one inode each for user, hive and warehouse, so it would save only a couple of bytes. However, in the fsimage in older releases, /user/hive/warehouse is repeated for every file. This has been optimized in later releases. But these optimizations affect only the fsimage and not the memory consumed on the namenode.

--
http://hortonworks.com/download/
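The difference between the two layouts described in this reply can be illustrated with a toy model. This is not the NameNode's actual data structure. It is a sketch, with made-up names (`flat_name_bytes`, `tree_name_bytes`, `make_paths`) and hypothetical file names, that counts name bytes when the full path is repeated for every file versus when each path component is stored once at its node in a directory tree.

```python
# Toy comparison of two ways to store file paths.
# ASSUMPTION: purely illustrative; real inodes carry far more than names.
from typing import List, Set, Tuple


def flat_name_bytes(paths: List[str]) -> int:
    """Full path repeated per file (like the older fsimage layout described)."""
    return sum(len(p.encode("utf-8")) for p in paths)


def tree_name_bytes(paths: List[str]) -> int:
    """Each distinct directory or file name stored once, at its tree node."""
    seen: Set[Tuple[str, ...]] = set()
    total = 0
    for path in paths:
        parts = tuple(part for part in path.split("/") if part)
        for depth in range(1, len(parts) + 1):
            node = parts[:depth]  # a node is identified by its full ancestry
            if node not in seen:
                seen.add(node)
                total += len(parts[depth - 1].encode("utf-8"))
    return total


def make_paths(prefix: str, count: int) -> List[str]:
    """Hypothetical files named part-00000, part-00001, ... under prefix."""
    return [f"{prefix}/part-{i:05d}" for i in range(count)]


if __name__ == "__main__":
    long_paths = make_paths("/user/hive/warehouse", 1000)
    short_paths = make_paths("/u/h/w", 1000)
    print("flat saving:", flat_name_bytes(long_paths) - flat_name_bytes(short_paths))
    print("tree saving:", tree_name_bytes(long_paths) - tree_name_bytes(short_paths))
```

In this toy model the flat layout saves 14 bytes per file (14,000 bytes for 1,000 files), while the tree layout saves 14 bytes in total no matter how many files there are.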