Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HDFS >> mail # user >> Namenode hangs on startup


Re: Namenode hangs on startup
Hi Harsh,

The NN web UI shows 0 nodes.

Looking at each DataNode's log, all nodes but one have been stuck at:

/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = pipeline09.x.y.z/10.2.20.109
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.205.0
STARTUP_MSG:   build https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205 -r 1179940; compiled by 'hortonfo' on Fri Oct  7 06:20:32 UTC 2011
************************************************************/
2012-07-02 11:00:38,891 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-07-02 11:00:38,906 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-07-02 11:00:38,908 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-07-02 11:00:38,908 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-07-02 11:00:39,091 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-07-02 11:00:39,094 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
============================
The remaining node got past that point and reached:

2012-07-02 12:24:58,450 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-2932980158620691384_129517

Does that indicate progress? It has been 1.5 hours since startup. On the
file system side: 74515 files and directories, 28439 blocks = 102954 total.

Thanks for helping.
James
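Whether startup is progressing can be read off the safe-mode ratio in the NN log quoted further down: it is the fraction of blocks the DataNodes have reported out of the total the NN knows about, and safe mode is turned off automatically once it reaches the threshold (0.9990 in that message). The Python sketch below only reproduces that arithmetic. It assumes the 28439 blocks listed above is the total the NN is waiting on and that the check is a plain `>=` comparison; this mirrors the log message, not Hadoop's own source, and the function names are hypothetical.

```python
import math


def safe_mode_ratio(reported_blocks: int, total_blocks: int) -> float:
    """Fraction of known blocks that DataNodes have reported so far."""
    # Assumption for this sketch: an empty namespace counts as fully reported.
    if total_blocks == 0:
        return 1.0
    return reported_blocks / total_blocks


def can_leave_safe_mode(reported_blocks: int, total_blocks: int, threshold: float) -> bool:
    """True once the reported ratio has reached the threshold (assumed >= check)."""
    return safe_mode_ratio(reported_blocks, total_blocks) >= threshold


def blocks_needed(total_blocks: int, threshold: float) -> int:
    """Smallest number of reported blocks that satisfies the threshold."""
    return math.ceil(total_blocks * threshold)


if __name__ == "__main__":
    total = 28439      # block count quoted in the thread
    threshold = 0.999  # from "has not reached the threshold 0.9990"
    print(safe_mode_ratio(0, total))        # prints 0.0 (no DataNode has reported)
    print(blocks_needed(total, threshold))  # prints 28411
```

Under these assumptions, with 0 nodes showing in the web UI the ratio stays at 0.0000 no matter how long the NN waits, which is consistent with Harsh's point that the NN is waiting on the DataNodes rather than on its own work.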

On Mon, Jul 2, 2012 at 12:07 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Jianhui,
>
> It is merely waiting for the DNs to start and to report their blocks
> in. This does not take long once the DNs are up and running. Do you
> see any Live Nodes yet?
>
> On Tue, Jul 3, 2012 at 12:34 AM, Jianhui Zhang <[EMAIL PROTECTED]> wrote:
>> Hi folks,
>>
>> Thanks for helping, especially at such early hours.
>>
>> After I left it overnight, during which nothing appeared in
>> the log, I restarted it this morning. This time, it passed the previously
>> stuck point, and reached all the way to "IPC Server handler..
>> starting", in Safe Mode. So it looks more promising now.
>>
>> But it's in a state of:
>>
>> "The ratio of reported blocks 0.0000 has not reached the threshold
>> 0.9990. Safe mode will be turned off automatically."
>>
>> Does that mean the NN is waiting for communications/updates from the DNs? How
>> can I tell whether it's stuck or just slow?
>>
>> The NN log is at: http://pastebin.com/5fvRfRSD
>>
>> The jstack output is at: http://pastebin.com/RnDXWrtc
>>
>> The configurations are really basic:
>>
>> core-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://pipeline-hdnn01-virtual.x.y.z:8020</value>
>>     <final>true</final>
>>   </property>
>>   <property>
>>     <name>io.file.buffer.size</name>
>>     <value>65536</value>
>>   </property>
>> </configuration>
>>
>> It's the same for all nodes.
>>
>> Again, appreciate your help.
>>
>> Thanks,
>> James
>>
>> On Mon, Jul 2, 2012 at 3:21 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> Jianhui,
>>>
>>> Can you pastebin.com the output of your "jstack <NN PID>" command
>>> after it has hung, and pass us the paste link, please? It looks to me like
>>> it may just have been merging/saving the image, which can be slow,
>>> but that depends on how long you had to wait to see the NN
>>> resume and start up properly.
>>>
>>> On Mon, Jul 2, 2012 at 2:34 PM, Jianhui Zhang <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> Apache Hadoop 0.20.205.
>>>>
>>>> I'm trying to restart the NN and it always hangs at the very beginning.
>>>> The only logs I've got are:
>>>>
>>>> /************************************************************
>>>> STARTUP_MSG: Starting NameNode
>>>> STARTUP_MSG:   host = host/ip
>
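One way to approach the "stuck or just slow" question, building on Harsh's request for `jstack <NN PID>` output, is to take two dumps a few seconds apart and compare the top stack frame of each thread: a thread sitting on the same frame in both may be blocked, while a changing frame suggests work is progressing. The Python sketch below is a heuristic under that assumption only. `top_frames` and `unchanged_threads` are hypothetical helpers, and the parser assumes jstack's usual layout of a double-quoted thread name on the header line followed by `at ...` frame lines.

```python
import re
from typing import Dict, List, Optional

# A jstack thread header line starts with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def top_frames(dump: str) -> Dict[str, str]:
    """Map each thread name to the first 'at ...' frame listed under it."""
    frames: Dict[str, str] = {}
    current: Optional[str] = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        stripped = line.strip()
        # Only the first frame after a header is the top of that thread's stack.
        if current is not None and stripped.startswith("at ") and current not in frames:
            frames[current] = stripped[len("at "):]
    return frames


def unchanged_threads(first_dump: str, second_dump: str) -> List[str]:
    """Threads whose top frame is identical in both dumps (possible stalls)."""
    before, after = top_frames(first_dump), top_frames(second_dump)
    return sorted(name for name, frame in before.items() if after.get(name) == frame)
```

In practice one would save two dumps (for example `jstack <NN PID> > dump1.txt`, then again a little later) and pass their contents to `unchanged_threads`. Idle threads, such as IPC handlers waiting for calls, will also show up as unchanged, so the result is a hint about where to look (for a NameNode still starting up, typically the `main` thread) rather than proof of a hang.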