
Hadoop >> mail # user >> NN Memory Jumps every 1 1/2 hours

Re: NN Memory Jumps every 1 1/2 hours
Please take a histo:live dump when the memory is full. Note that this triggers a full GC.
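On a HotSpot JVM of that era, the live histogram is taken with jmap, which ships with the JDK. A command sketch (the pid placeholder is illustrative; find the real one with `jps`):

```shell
# Replace <nn-pid> with the NameNode's process id (e.g. from `jps`).
# -histo:live forces a full GC first, so expect a pause on a 17GB heap.
jmap -histo:live <nn-pid> > nn-histo.txt

# The top of the histogram shows which classes hold the most heap.
head -n 20 nn-histo.txt
```

Comparing two histograms taken an hour apart should show which object counts are growing.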

How many blocks do you have on the system?

Send the JVM options you are using. In earlier Java versions the young generation defaulted to 1/8 of the total heap; it has since gone up to 1/3 of the total heap. This could also be the reason.

Do you collect GC logs? Send those as well.
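On 0.20, GC logging for the NameNode is normally switched on through HADOOP_NAMENODE_OPTS in hadoop-env.sh. A sketch, assuming HotSpot 1.6/1.7 (the flag names are standard; the log path is illustrative):

```shell
# In conf/hadoop-env.sh -- illustrative log path, adjust for your site.
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hadoop/nn-gc.log"
```

The resulting log shows each collection's before/after heap sizes, which is enough to tell a slow leak from GC falling behind.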

Sent from a mobile device

On Dec 22, 2012, at 9:51 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> Newer 1.6 releases are getting close to 1.7, so I am not going to fear a
> number and fight the future.
> I have been at around 27 million files for a while, and have been as high
> as 30 million; I do not think that is related.
> I do not think it is related to checkpoints, but I am considering
> raising/lowering the checkpoint triggers.
> On Saturday, December 22, 2012, Joep Rottinghuis <[EMAIL PROTECTED]> wrote:
>> Do your OOMs correlate with the secondary checkpointing?
>> Joep
>> Sent from my iPhone
>> On Dec 22, 2012, at 7:42 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>> Hey, silly question...
>>> How long have you had 27 million files?
>>> I mean, can you correlate the number of files to the spate of OOMs?
>>> Even without problems, I'd say it would be a good idea to upgrade, given
>>> the likelihood of a lot of code fixes...
>>> If you're running anything pre 1.x, going to Java 1.7 wouldn't be a good
>>> idea. Having said that... outside of MapR, have any of the distros
>>> certified themselves on 1.7 yet?
>>> On Dec 22, 2012, at 6:54 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>>> I will give this a go. I have actually gone into JMX and manually
>>>> triggered GC; no memory is returned. So I assumed something was leaking.
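The GC trigger Edward used goes through the java.lang:type=Memory MBean; locally the same operation is reachable via MemoryMXBean. A minimal standalone sketch (not Hadoop-specific) of what "trigger GC, check whether memory comes back" looks like:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class ForceGc {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long before = mem.getHeapMemoryUsage().getUsed();
        // Same operation the java.lang:type=Memory MBean exposes over JMX;
        // equivalent to System.gc(), i.e. a full collection on most collectors.
        mem.gc();
        long after = mem.getHeapMemoryUsage().getUsed();
        System.out.println("heap used before=" + before + " after=" + after);
        // If "after" stays near "before" run after run, the objects are still
        // strongly reachable -- consistent with a leak rather than GC lag.
    }
}
```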
>>>> On Fri, Dec 21, 2012 at 11:59 PM, Adam Faris <[EMAIL PROTECTED]> wrote:
>>>>> I know this will sound odd, but try reducing your heap size. We had an
>>>>> issue like this where GC kept falling behind and we either ran out of
>>>>> heap or were stuck in full GC. By reducing the heap, we forced
>>>>> concurrent mark-sweep to occur, and avoided both full GCs and running
>>>>> out of heap space, as the JVM collected objects more frequently.
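The effect Adam describes (making concurrent collections start sooner rather than shrinking the heap) can also be approximated with explicit CMS tuning on 1.6/1.7 HotSpot. A sketch: the flag names are standard HotSpot flags, but the 70% threshold is only an illustrative starting point, not a recommendation from this thread:

```shell
# Start a CMS cycle once the old generation is 70% full, and always honor
# that threshold instead of letting the JVM adapt it (value illustrative).
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```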
>>>>> On Dec 21, 2012, at 8:24 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>>>>> I have an old Hadoop 0.20.2 cluster. We have not had any issues for a
>>>>>> while (which is why I never bothered with an upgrade).
>>>>>> Suddenly it OOMed last week. Now the OOMs happen periodically. We have
>>>>>> a fairly large NameNode heap (-Xmx 17GB). It is a fairly large FS,
>>>>>> about 27,000,000 files.
>>>>>> So the strangest thing is that every 1 1/2 hours the NN memory usage
>>>>>> increases until the heap is full.
>>>>>> http://imagebin.org/240287
>>>>>> We tried failing over the NN to another machine. We changed the Java
>>>>>> version from 1.6_23 -> 1.7.0.
>>>>>> I have set the NameNode logs to DEBUG and ALL, and I have done the
>>>>>> same with the datanodes.
>>>>>> The secondary NN is running, shipping edits, and making new images.
>>>>>> I am thinking something has corrupted the NN metadata and that after
>>>>>> enough time it becomes a time bomb, but this is just a total shot in
>>>>>> the dark. Does anyone have any interesting troubleshooting ideas?
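One way to probe the metadata-corruption theory is fsck, which walks the namespace from the NameNode's in-memory metadata; the command is standard on 0.20, though on a ~27M-file namespace it is heavy, so run it off-peak:

```shell
# Summarize namespace health (CORRUPT/MISSING blocks, totals) without
# listing every file; the summary is printed at the end of the output.
hadoop fsck / | tail -n 20
```

A healthy fsck doesn't rule out an fsimage/edits problem, but a corrupt one localizes it quickly.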