HDFS >> mail # user >> Namenode hangs on startup


Re: Namenode hangs on startup
James,

You may be suffering from this old issue involving system entropy:
http://search-hadoop.com/m/7Giae6vLWR1

You'll know for sure if you jstack your DNs and compare the output
against the comments in the above thread.
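
For what it's worth, here is a rough Python sketch of that check. It
assumes a Linux host (the /proc path below) and that an entropy-starved
thread shows a SecureRandom or seed-generator frame in its stack. The
frame patterns and helper names are my own guesses for illustration, not
something taken from the linked thread, so adjust them to whatever that
thread actually describes.

```python
import re
import sys
from pathlib import Path

# Assumption: Linux exposes the kernel entropy estimate at this path.
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")

# Assumption: illustrative frame names only; tune to the real symptom.
SUSPECT_FRAMES = re.compile(r"SecureRandom|SeedGenerator|NativePRNG")


def read_entropy(path=ENTROPY_FILE):
    """Return the kernel's available-entropy estimate, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def threads_waiting_on_entropy(jstack_text):
    """Return names of threads whose stack mentions a suspect frame."""
    hits = []
    # jstack separates thread stacks with blank lines; each stack
    # starts with the thread name in double quotes.
    for block in re.split(r"\n\s*\n", jstack_text):
        name = re.match(r'\s*"([^"]+)"', block)
        if name and SUSPECT_FRAMES.search(block):
            hits.append(name.group(1))
    return hits


if __name__ == "__main__":
    print("entropy_avail:", read_entropy())
    if len(sys.argv) > 1:
        # Argument: a file holding saved jstack output for one DataNode.
        print(threads_waiting_on_entropy(Path(sys.argv[1]).read_text()))
```

Save the jstack output of a stuck DN to a file and pass that file as the
only argument. A persistently low entropy_avail together with a listed
thread would point toward the entropy issue; an empty list suggests
looking elsewhere.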

On Tue, Jul 3, 2012 at 4:00 AM, Jianhui Zhang <[EMAIL PROTECTED]> wrote:
> Thanks for helping.
>
> I do have a secondary namenode.
>
> And I'm starting one datanode at a time now. Previously I was using
> start-dfs.sh to start NN and slaves. I observed pretty strange
> behaviors:
>
> 1. If I use start-dfs.sh, almost none of the DNs joined. All seemed stuck.
>
> 2. If I start one DN at a time, SOME joined instantly, but some appeared
> to hang... The funny thing is: if I start over again, the behavior
> changes for each node.
>
> Any idea?
>
> Thanks,
> James
>
>
> On Mon, Jul 2, 2012 at 12:56 PM, David Rosenstrauch <[EMAIL PROTECTED]> wrote:
>> A couple of thoughts:
>>
>> 1) It shouldn't take more than a couple of minutes for the data nodes to
>> re-register after the name node comes back up.  If the data nodes aren't
>> registering, you might try restarting the data node daemons on each machine.
>> They should all register soon after.  (If not, then you have other problems
>> involved.)
>>
>> 2) Re: the original symptom you reported about the namenode "hanging" at
>> startup:  you should read up on the "secondary namenode" component in Hadoop
>> (you might start at
>> http://hadoop.apache.org/common/docs/r0.17.2/hdfs_user_guide.html#Secondary+Namenode
>> and
>> http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/).
>> If you don't have a secondary namenode running in your cluster, that can
>> have a significantly negative impact on namenode startup time.  (The
>> namenode needs to replay the log at startup - which can take a long time if
>> the log has gotten large.)
>>
>> HTH,
>>
>> DR
>>
>>
>> On 07/02/2012 03:32 PM, Jianhui Zhang wrote:
>>>
>>> Hi Harsh,
>>>
>>> The web portal of the NN shows 0 nodes.
>>>
>>> Looking into each node's log, all nodes but one have been staying at:
>>>
>>> /************************************************************
>>> STARTUP_MSG: Starting DataNode
>>> STARTUP_MSG:   host = pipeline09.x.y.z/10.2.20.109
>>> STARTUP_MSG:   args = []
>>> STARTUP_MSG:   version = 0.20.205.0
>>> STARTUP_MSG:   build =
>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205
>>> -r 1179940; compiled by 'hortonfo' on Fri Oct  7 06:20:32 UTC 2011
>>> ************************************************************/
>>> 2012-07-02 11:00:38,891 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>> hadoop-metrics2.properties
>>> 2012-07-02 11:00:38,906 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>> MetricsSystem,sub=Stats registered.
>>> 2012-07-02 11:00:38,908 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>> period at 10 second(s).
>>> 2012-07-02 11:00:38,908 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics
>>> system started
>>> 2012-07-02 11:00:39,091 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>> ugi registered.
>>> 2012-07-02 11:00:39,094 WARN
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
>>> already exists!
>>> ============================
>>>
>>> The one node passed that and reached:
>>>
>>> 2012-07-02 12:24:58,450 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
>>> succeeded for blk_-2932980158620691384_129517
>>>
>>> Does that mean progress? It's been 1.5 hours since the start. And the
>>> file system side is: 74515 files and directories, 28439 blocks = 102954 total
>>>
>>> Thanks for helping.
>>> James
>>>
>>> On Mon, Jul 2, 2012 at 12:07 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Jianhui,
>>>>
>>>> It is merely waiting for the DNs to start, and to report their blocks
>>>> in. This does not take long once the DNs are up and running. Do you

Harsh J