HDFS >> mail # user >> Namenode hangs on startup


David Rosenstrauch 2012-07-02, 19:56
Re: Namenode hangs on startup
Thanks for helping.

I do have a secondary namenode.

And I'm starting one datanode at a time now. Previously I was using
start-dfs.sh to start the NN and slaves. I observed some pretty strange
behavior:

1. If I use start-dfs.sh, almost none of the DNs join. All seem stuck.

2. If I start one DN at a time, SOME join instantly, but some appear
to hang... The funny thing is: if I start over again, the behavior
changes for each node....
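
To compare nodes between attempts, it may help to check how far each DataNode log got. The following is only a sketch: it assumes each entry sits on a single line in the log file itself (the lines quoted further down are wrapped by the mail client), and the helper names `last_log_entry` and `looks_stuck_at_startup` are hypothetical, not part of Hadoop. The "stuck" test is a heuristic based on the logs in this thread, where the hanging nodes stop at the metrics2 messages.

```python
import re

# Matches one log4j-style entry as it would appear on a single line, e.g.
# "2012-07-02 11:00:39,094 WARN org.apache...MetricsSystemImpl: Source name ugi already exists!"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) ([\w.$]+): (.*)$"
)


def last_log_entry(log_text):
    """Return (timestamp, level, short_class_name, message) for the last
    parseable line in log_text, or None if no line matches."""
    last = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            timestamp, level, logger, message = match.groups()
            # Keep only the class name, dropping the package prefix.
            last = (timestamp, level, logger.rsplit(".", 1)[-1], message)
    return last


def looks_stuck_at_startup(log_text):
    """Heuristic (an assumption, not a Hadoop rule): if the last entry
    still comes from the metrics2 setup classes, or nothing was logged,
    the node has probably not got past startup."""
    entry = last_log_entry(log_text)
    return entry is None or entry[2].startswith("Metrics")
```

Running this over each node's log after every restart attempt would show whether the same nodes stall at the same point or whether the stall really moves around.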

Any idea?

Thanks,
James
On Mon, Jul 2, 2012 at 12:56 PM, David Rosenstrauch <[EMAIL PROTECTED]> wrote:
> A couple of thoughts:
>
> 1) It shouldn't take more than a couple of minutes for the data nodes to
> re-register after the name node comes back up.  If the data nodes aren't
> registering, you might try restarting the data node daemons on each machine.
> They should all register soon after.  (If not, then you have other problems
> involved.)
>
> 2) Re: the original symptom you reported about the namenode "hanging" at
> startup:  you should read up on the "secondary namenode" component in Hadoop
> (you might start at
> http://hadoop.apache.org/common/docs/r0.17.2/hdfs_user_guide.html#Secondary+Namenode
> and
> http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/).
> If you don't have a secondary namenode running in your cluster, that can
> have a significantly negative impact on namenode startup time.  (The
> namenode needs to replay the log at startup - which can take a long time if
> the log has gotten large.)
>
> HTH,
>
> DR
>
>
> On 07/02/2012 03:32 PM, Jianhui Zhang wrote:
>>
>> Hi Harsh,
>>
>> The web portal of the NN shows 0 nodes.
>>
>> Looking into each node's log, all nodes but one have been staying at:
>>
>> /************************************************************
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG:   host = pipeline09.x.y.z/10.2.20.109
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.205.0
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205
>> -r 1179940; compiled by 'hortonfo' on Fri Oct  7 06:20:32 UTC 2011
>> ************************************************************/
>> 2012-07-02 11:00:38,891 INFO
>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>> hadoop-metrics2.properties
>> 2012-07-02 11:00:38,906 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> MetricsSystem,sub=Stats registered.
>> 2012-07-02 11:00:38,908 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>> period at 10 second(s).
>> 2012-07-02 11:00:38,908 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics
>> system started
>> 2012-07-02 11:00:39,091 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> ugi registered.
>> 2012-07-02 11:00:39,094 WARN
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
>> already exists!
>> ============================
>>
>> The one node passed that and reached:
>>
>> 2012-07-02 12:24:58,450 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
>> succeeded for blk_-2932980158620691384_129517
>>
>> Does that mean progress? It's been 1.5 hours since the start. And the
>> file system size is: 74515 files and directories, 28439 blocks = 102954 total
>>
>> Thanks for helping.
>> James
>>
>> On Mon, Jul 2, 2012 at 12:07 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>> Jianhui,
>>>
>>> It is merely waiting for the DNs to start and to report their blocks
>>> in. This does not take long once the DNs are up and running. Do you
>>> see any Live Nodes yet?
>>>
>>> On Tue, Jul 3, 2012 at 12:34 AM, Jianhui Zhang <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> Thanks for helping, especially at such early hours.
>>>>
>>>> After leaving it overnight, during which period nothing happened in
>>>> the log, I restarted this morning. This time, it passed the previously
>>>> stuck point, and reached all the way to "IPC Server handler..