HDFS >> mail # user >> Namenode hangs on startup


Re: Namenode hangs on startup
Hi Harsh,

The NN's web UI shows 0 live nodes.

Looking at each DataNode's log, all nodes but one have been stuck at:

/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = pipeline09.x.y.z/10.2.20.109
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.205.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205 -r 1179940; compiled by 'hortonfo' on Fri Oct  7 06:20:32 UTC 2011
************************************************************/
2012-07-02 11:00:38,891 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-07-02 11:00:38,906 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-07-02 11:00:38,908 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-07-02 11:00:38,908 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-07-02 11:00:39,091 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-07-02 11:00:39,094 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
============================
The one remaining node got past that point and reached:

2012-07-02 12:24:58,450 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-2932980158620691384_129517
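To put a number on how long the nodes have been sitting, one can subtract the leading timestamps of two log entries. The sketch below assumes the log4j layout visible in these logs (`yyyy-MM-dd HH:mm:ss,SSS` as the first 23 characters of each entry); `parse_ts` and `log_gap` are hypothetical helper names. As an example it compares the 11:00:38 DataNode startup entry with the 12:24:58 DataBlockScanner entry above, which assumes both DataNodes were started at about the same time.

```python
from datetime import datetime, timedelta

# Assumed layout: each entry starts with e.g. "2012-07-02 11:00:38,891" (23 chars).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a Hadoop log line (hypothetical helper)."""
    # %f accepts the 3-digit millisecond field and right-pads it to microseconds.
    return datetime.strptime(line[:23], TS_FORMAT)


def log_gap(first_line: str, last_line: str) -> timedelta:
    """Return the time elapsed between two log entries."""
    return parse_ts(last_line) - parse_ts(first_line)


if __name__ == "__main__":
    first = ("2012-07-02 11:00:38,891 INFO org.apache.hadoop.metrics2.impl."
             "MetricsConfig: loaded properties from hadoop-metrics2.properties")
    last = ("2012-07-02 12:24:58,450 INFO org.apache.hadoop.hdfs.server.datanode."
            "DataBlockScanner: Verification succeeded for blk_-2932980158620691384_129517")
    print(log_gap(first, last))  # prints 1:24:19.559000
```

Running the same check against the last line of each DataNode's log shows which nodes have stopped writing anything at all.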

Does that mean it is making progress? It's been 1.5 hours since startup. On the
file system side, the NN reports: 74515 files and directories, 28439 blocks = 102954 total.
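For a sense of scale, the 28439 blocks above can be combined with the safe-mode threshold of 0.9990 quoted further down the thread (likely the default of `dfs.safemode.threshold.pct` in this release). The sketch below assumes safe mode is left once reported/total reaches the threshold; the real check in the NameNode may differ in rounding and also waits an extension period, so treat the result as approximate. `min_blocks_for_threshold` is a hypothetical helper.

```python
import math
from fractions import Fraction


def min_blocks_for_threshold(total_blocks: int, threshold: str) -> int:
    """Smallest reported-block count whose ratio to total_blocks reaches threshold.

    Assumes a plain `reported / total >= threshold` comparison.
    """
    # Fraction keeps 0.9990 exact, avoiding float rounding at the boundary.
    return math.ceil(Fraction(threshold) * total_blocks)


if __name__ == "__main__":
    total = 28439  # block count reported by the NN in this thread
    needed = min_blocks_for_threshold(total, "0.9990")
    print(needed)          # prints 28411
    print(total - needed)  # prints 28 (blocks that may still be missing)
```

In other words, with a ratio of 0.0000 essentially none of those blocks had been reported, which is consistent with 0 live nodes.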

Thanks for helping.
James

On Mon, Jul 2, 2012 at 12:07 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Jianhui,
>
> It is merely waiting for the DNs to start and report their blocks.
> This does not take long once the DNs are up and running. Do you
> see any Live Nodes yet?
>
> On Tue, Jul 3, 2012 at 12:34 AM, Jianhui Zhang <[EMAIL PROTECTED]> wrote:
>> Hi folks,
>>
>> Thanks for helping, especially at such early hours.
>>
>> After I left it overnight, during which nothing new appeared in
>> the log, I restarted it this morning. This time it got past the point
>> where it was previously stuck and reached "IPC Server handler..
>> starting", in safe mode. So it looks more promising now.
>>
>> But it remains in this state:
>>
>> "The ratio of reported blocks 0.0000 has not reached the threshold
>> 0.9990. Safe mode will be turned off automatically."
>>
>> Does that mean the NN is waiting for the DNs to report in? How
>> can I tell whether it's stuck or just slow?
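One rough way to tell "stuck" from "slow" is to sample the safe-mode message above at intervals (for example from the NN log or its web UI) and check whether the reported ratio moves. The sketch below assumes the message keeps the exact wording quoted in this thread; `safemode_ratios` and `is_progressing` are hypothetical helpers, and a flat ratio only means no blocks are arriving, not why.

```python
import re
from typing import Iterable, List, Tuple

# Assumed wording, as quoted in this thread:
# "The ratio of reported blocks 0.0000 has not reached the threshold 0.9990."
SAFEMODE_RE = re.compile(
    r"ratio of reported blocks (\d+\.\d+) has not reached the threshold (\d+\.\d+)"
)


def safemode_ratios(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Extract (reported_ratio, threshold) pairs in the order they appear."""
    pairs = []
    for line in lines:
        m = SAFEMODE_RE.search(line)
        if m:
            pairs.append((float(m.group(1)), float(m.group(2))))
    return pairs


def is_progressing(pairs: List[Tuple[float, float]]) -> bool:
    """Heuristic: True if the ratio rose between the first and last sample."""
    return len(pairs) >= 2 and pairs[-1][0] > pairs[0][0]


if __name__ == "__main__":
    samples = [
        "The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.",
        "The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.",
    ]
    pairs = safemode_ratios(samples)
    print(pairs)                  # prints [(0.0, 0.999), (0.0, 0.999)]
    print(is_progressing(pairs))  # prints False
```

A ratio that stays at 0.0000 while the DataNode logs are also silent points at the DataNodes rather than the NameNode.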
>>
>> The NN log is at: http://pastebin.com/5fvRfRSD
>>
>> The jstack output is at: http://pastebin.com/RnDXWrtc
>>
>> The configurations are really basic:
>>
>> core-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://pipeline-hdnn01-virtual.x.y.z:8020</value>
>>     <final>true</final>
>>   </property>
>>   <property>
>>     <name>io.file.buffer.size</name>
>>     <value>65536</value>
>>   </property>
>> </configuration>
>>
>> It's the same for all nodes.
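To check "the same for all nodes" mechanically rather than by eye, one could parse each node's core-site.xml and diff the property values. This is a sketch under stated assumptions: collecting the files from each node is left out, and `load_hadoop_conf` and `conf_diff` are hypothetical helper names, not Hadoop APIs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def load_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop *-site.xml <configuration> into {name: value}."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf


def conf_diff(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every property that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    core_site = """<configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://pipeline-hdnn01-virtual.x.y.z:8020</value>
        <final>true</final>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>65536</value>
      </property>
    </configuration>"""
    node_a = load_hadoop_conf(core_site)
    node_b = load_hadoop_conf(core_site)  # stand-in for a second node's file
    print(conf_diff(node_a, node_b))      # prints {}
```

An empty diff across every node rules out a mismatched `fs.default.name` as the reason DataNodes never register.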
>>
>> Again, I appreciate your help.
>>
>> Thanks,
>> James
>>
>> On Mon, Jul 2, 2012 at 3:21 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> Jianhui,
>>>
>>> Can you run "jstack <NN PID>" after it hangs, paste the output to
>>> pastebin.com, and send us the link please? It looks to me like it
>>> may just have been merging/saving the image, which can be slow, but
>>> that depends on how long you waited for the NN to resume and start
>>> properly.
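A single jstack dump shows where the NN is; two dumps taken a few minutes apart suggest whether it is moving. The sketch below assumes the usual HotSpot jstack layout (a header line starting with the quoted thread name, then "at ..." frame lines) and that NN startup work such as image loading/saving runs on the "main" thread. `thread_stacks`, `in_fsimage`, and `looks_unchanged` are hypothetical helpers, and the sample excerpt is made up for illustration.

```python
from typing import Dict, List, Optional


def thread_stacks(jstack_text: str) -> Dict[str, List[str]]:
    """Map thread name -> its "at ..." frames, innermost frame first.

    If two threads share a name, the later one wins (fine for a quick check).
    """
    stacks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in jstack_text.splitlines():
        if line.startswith('"'):
            current = line.split('"')[1]
            stacks[current] = []
        elif current is not None and line.strip().startswith("at "):
            stacks[current].append(line.strip()[len("at "):])
    return stacks


def in_fsimage(stacks: Dict[str, List[str]], thread: str = "main") -> bool:
    """True if any frame of `thread` is in a class whose name contains FSImage."""
    return any("FSImage" in frame for frame in stacks.get(thread, []))


def looks_unchanged(before: str, after: str, thread: str = "main") -> bool:
    """True if `thread` has an identical stack in both snapshots.

    Identical stacks minutes apart hint at (but do not prove) a hang.
    """
    a = thread_stacks(before).get(thread)
    b = thread_stacks(after).get(thread)
    return a is not None and a == b


if __name__ == "__main__":
    # Hypothetical, abbreviated jstack excerpt for illustration only.
    snapshot = (
        '"main" prio=10 tid=0x1 nid=0x2 runnable [0x3]\n'
        "   java.lang.Thread.State: RUNNABLE\n"
        "\tat org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java)\n"
    )
    print(in_fsimage(thread_stacks(snapshot)))  # prints True
    print(looks_unchanged(snapshot, snapshot))  # prints True
```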
>>>
>>> On Mon, Jul 2, 2012 at 2:34 PM, Jianhui Zhang <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> Apache Hadoop 0.20.205.
>>>>
>>>> I'm trying to restart the NN and it always hangs at the very beginning.
>>>> The only logs I've got are:
>>>>
>>>> /************************************************************
>>>> STARTUP_MSG: Starting NameNode
>>>> STARTUP_MSG:   host = host/ip
>