Further info: From the log of secondary NN:
INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted
URL
AAA.xxx.yyy.zzz:50070putimage=1&port=50090&machine=0.0.0.0&token=-32:1245372967:0:1343222881000:1343222711330
ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
Exception in doCheckpoint:
ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.io.FileNotFoundException:
http://AAA.xxx.yyy.zzz:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1245372967:0:1343222881000:1343222711330

Where Primary NN is AAA.xxx.yyy.zzz.
The 0.0.0.0 embedded in the URL looks suspicious, but I have no idea
which file it is telling me is missing.
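
To make the suspicious part easier to see, here is a minimal Python sketch that pulls the query parameters out of the URL from the log above. It is not taken from the Hadoop source; `callback_address` is a hypothetical helper name. The underlying assumption, which should be checked against the 1.0.2 code, is that the NameNode uses the `machine` and `port` values to connect back to the secondary NN and fetch the merged image.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the SecondaryNameNode log above (host name is anonymized).
url = ("http://AAA.xxx.yyy.zzz:50070/getimage?putimage=1&port=50090"
       "&machine=0.0.0.0"
       "&token=-32:1245372967:0:1343222881000:1343222711330")


def callback_address(posted_url):
    """Return the (machine, port) pair embedded in a putimage request.

    Hypothetical helper for illustration; it only parses the query string.
    """
    query = parse_qs(urlsplit(posted_url).query)
    return query["machine"][0], int(query["port"][0])


machine, port = callback_address(url)
print(f"address advertised in the request: {machine}:{port}")

if machine == "0.0.0.0":
    # 0.0.0.0 is a wildcard bind address, not a host another node can reach.
    print("machine is the wildcard address, not the SNN's real host name")
```

If the assumption holds, a `machine` value of 0.0.0.0 would make the NameNode try port 50090 on itself rather than on the secondary, which would be consistent with the "Connection refused" in the NN log. In that case the address the secondary advertises (in 1.x normally set through `dfs.secondary.http.address`) is the thing to check.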
On 08/15/2012 12:39 PM, Terry Healy wrote:
> This situation continues and reports every 10 minutes. I even tried
> moving the secondary NN function to a different node. I have run TCPDump
> but cannot isolate the "Connection Refused" issue. Am I correct in
> assuming that the NN will try to connect to SNN on port 50090?
>
> Any way out? Current partial dump below.
>
> 2012-08-15 12:36:09,422 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
> xxx.yyy.254.254
>
> 2012-08-15 12:36:09,423 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit
> log, edits.new files already exists in all healthy directories:
> /home/thealy/hdfs/name/current/edits.new
>
> 2012-08-15 12:36:09,879 ERROR
> org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:thealy cause:java.net.ConnectException:
> Connection refused
>
> 2012-08-15 12:36:09,879 ERROR
> org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:thealy cause:java.net.ConnectException:
> Connection refused
>
> 2012-08-15 12:36:09,881 WARN org.mortbay.log: /getimage:
> java.io.IOException: GetImage failed. java.net.ConnectException:
> Connection refused
>
>
>
> On 07/09/2012 01:23 PM, Terry Healy wrote:
>> Any suggestions on how to clear the error? stop-all / start-all had no
>> effect.
>>
>> On 07/03/2012 04:22 PM, Brandon Li wrote:
>>> With 1.0.2, only one checkpoint process is executed at a time. When the
>>> namenode gets an overlapping checkpointing request, it checks for edits.new
>>> in its storage directories. If all of them have this file, the namenode
>>> concludes the previous checkpoint process is not done yet and prints the
>>> warning message you've seen.
>>>
>>> Brandon
>>>
>>> On Tue, Jul 3, 2012 at 10:56 AM, Terry Healy <[EMAIL PROTECTED]
>>> <mailto:[EMAIL PROTECTED]>> wrote:
>>>
>>> Running Apache Hadoop 1.0.2.
>>>
>>> The NN log is reporting that it cannot "roll the edit log" from the
>>> secondary NN. The SecondaryNameNode is running on the system referred to
>>> as xxx.yyy.254.238 in the log snippet below.
>>>
>>> From the NN, I can connect to the Secondary via ssh as the user. Any
>>> suggestions as to what I have got wrong here?
>>>
>>> thanks,
>>>
>>> Terry
>>>
>>> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
>>> from xxx.yyy.254.238
>>> WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll
>>> edit log, edits.new files already exists in all healthy directories:
>>> /home/[user]/hdfs/name/current/edits.new
>>>
>>> ERROR org.apache.hadoop.security.UserGroupInformation:
>>> PriviledgedActionException as:[user] cause:java.net.ConnectException:
>>> Connection refused
>>> ERROR org.apache.hadoop.security.UserGroupInformation:
>>> PriviledgedActionException as:[user] cause:java.net.ConnectException:
>>> Connection refused
>>>
>>> WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed.
>>> java.net.ConnectException: Connection refused
>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
>>> ....
>>>
>>>
>>> --
>>> Terry Healy / [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
Terry Healy / [EMAIL PROTECTED]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973