Hadoop, mail # user - NN Memory Jumps every 1 1/2 hours


Edward Capriolo 2012-12-22, 04:24
Adam Faris 2012-12-22, 04:59
Edward Capriolo 2012-12-22, 12:54
Michael Segel 2012-12-22, 15:42
Joep Rottinghuis 2012-12-22, 17:17

Re: NN Memory Jumps every 1 1/2 hours
Edward Capriolo 2012-12-22, 17:51
Newer 1.6 releases are getting close to 1.7, so I am not going to fear a
version number and fight the future.

I have been at around 27 million files for a while, and have been as high
as 30 million; I do not think that is related.

I do not think it is related to checkpoints, but I am considering
raising/lowering the checkpoint triggers.
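For reference, the checkpoint triggers in a 0.20-era deployment are controlled by `fs.checkpoint.period` and `fs.checkpoint.size`; a sketch showing the stock defaults (these values are from the 0.20 defaults, not from this cluster):

```xml
<!-- core-site.xml: secondary NameNode checkpoint triggers (0.20 defaults shown) -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>   <!-- max seconds between checkpoints -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>   <!-- edit log size in bytes that forces an early checkpoint -->
</property>
```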

On Saturday, December 22, 2012, Joep Rottinghuis <[EMAIL PROTECTED]> wrote:
> Do your OOMs correlate with the secondary checkpointing?
>
> Joep
>
> Sent from my iPhone
>
> On Dec 22, 2012, at 7:42 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> Hey, silly question...
>>
>> How long have you had 27 million files?
>>
>> I mean, can you correlate the number of files to the spate of OOMs?
>>
>> Even without problems... I'd say it would be a good idea to upgrade due to the probability of a lot of code fixes...
>>
>> If you're running anything pre 1.x, going to Java 1.7 wouldn't be a good idea. Having said that... outside of MapR, have any of the distros certified themselves on 1.7 yet?
>>
>> On Dec 22, 2012, at 6:54 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>
>>> I will give this a go. I have actually gone into JMX and manually triggered GC; no memory is returned. So I assumed something was leaking.
>>>
>>> On Fri, Dec 21, 2012 at 11:59 PM, Adam Faris <[EMAIL PROTECTED]> wrote:
>>>
>>>> I know this will sound odd, but try reducing your heap size. We had an issue like this where GC kept falling behind and we either ran out of heap or would be in full GC. By reducing heap, we were forcing concurrent mark sweep to occur and avoided both full GC and running out of heap space, as the JVM would collect objects more frequently.
>>>>
>>>> On Dec 21, 2012, at 8:24 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I have an old Hadoop 0.20.2 cluster. Have not had any issues for a while (which is why I never bothered with an upgrade).
>>>>>
>>>>> Suddenly it OOMed last week. Now the OOMs happen periodically. We have a fairly large NameNode heap (Xmx 17GB). It is a fairly large FS, about 27,000,000 files.
>>>>>
>>>>> So the strangest thing is that every 1 and 1/2 hours the NN memory usage increases until the heap is full.
>>>>>
>>>>> http://imagebin.org/240287
>>>>>
>>>>> We tried failing over the NN to another machine. We changed the Java version from 1.6_23 -> 1.7.0.
>>>>>
>>>>> I have set the NameNode logs to DEBUG and ALL, and I have done the same with the data nodes.
>>>>> Secondary NN is running and shipping edits and making new images.
>>>>>
>>>>> I am thinking something has corrupted the NN metadata and after enough time it becomes a time bomb, but this is just a total shot in the dark. Does anyone have any interesting troubleshooting ideas?
>>
>
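The manual GC trigger Edward describes is the `gc` operation on the `java.lang:type=Memory` MBean (what jconsole exposes remotely). A minimal local sketch of the same check, using the standard `MemoryMXBean` rather than the exact steps he ran:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class ForceGc {
    public static void main(String[] args) {
        // Same operation jconsole invokes as "gc" on java.lang:type=Memory.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage before = mem.getHeapMemoryUsage();
        mem.gc(); // requests a full collection (System.gc() under the hood)
        MemoryUsage after = mem.getHeapMemoryUsage();
        // If used heap barely drops across repeated runs, live objects
        // (or a leak) are pinning the heap -- the symptom described above.
        System.out.println("heap used before: " + before.getUsed()
                + ", after: " + after.getUsed());
    }
}
```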
Suresh Srinivas 2012-12-22, 18:32
Edward Capriolo 2012-12-23, 00:03
Edward Capriolo 2012-12-23, 01:59
Suresh Srinivas 2012-12-23, 03:23
Edward Capriolo 2012-12-23, 18:34
Joep Rottinghuis 2012-12-23, 19:00
Suresh Srinivas 2012-12-24, 02:40
Edward Capriolo 2012-12-27, 21:48
Suresh Srinivas 2012-12-27, 22:08
Edward Capriolo 2012-12-27, 22:22
Suresh Srinivas 2012-12-27, 22:41
Edward Capriolo 2012-12-27, 22:58
Suresh Srinivas 2012-12-27, 23:12
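Adam Faris's heap-reduction suggestion upthread amounts to letting concurrent mark sweep start earlier instead of waiting for a full GC. One way that is commonly expressed on a 0.20-era NameNode is via `HADOOP_NAMENODE_OPTS` in hadoop-env.sh; the flag values below are illustrative assumptions, not settings reported in this thread:

```shell
# hadoop-env.sh -- illustrative CMS settings, not the thread's actual values
export HADOOP_NAMENODE_OPTS="-Xmx12g \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  $HADOOP_NAMENODE_OPTS"
```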