MapReduce, mail # user - DataNodes fail to send heartbeat to HA-enabled NameNode


Re: DataNodes fail to send heartbeat to HA-enabled NameNode
Harsh J 2012-10-30, 18:14
Moving to [EMAIL PROTECTED]
(https://groups.google.com/a/cloudera.org/forum/#!forum/cdh-user), as
it may be a CDH4-specific problem.

Could you share your whole DN log (from startup until the heartbeat
errors), please? I suspect it's a problem with DN registration, which
the log will help confirm.
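
If the DN log is large, a small script can cut out just the part asked
for above. This is only a rough sketch, not a CDH tool. It assumes that
each DataNode start writes a line containing "STARTUP_MSG: Starting
DataNode" and that the first failure is the "Exception in
BPOfferService" line quoted below; the function name and the
context_after parameter are made up for illustration.

```python
import sys

# Assumption: a DataNode start is marked by this text in the log.
STARTUP_MARKER = "STARTUP_MSG: Starting DataNode"
# Taken from the error quoted in the original message.
ERROR_MARKER = "Exception in BPOfferService"


def slice_dn_log(lines, context_after=10):
    """Return the lines from the most recent startup marker up to the
    first BPOfferService error after it, plus a few trailing lines so
    that the stack trace is included."""
    start = 0
    for i, line in enumerate(lines):
        if STARTUP_MARKER in line:
            start = i  # keep the last startup seen
    for i in range(start, len(lines)):
        if ERROR_MARKER in lines[i]:
            return lines[start:i + 1 + context_after]
    # No error found after the last startup: return everything from it.
    return lines[start:]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slice_dn_log.py <path-to-datanode-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        sys.stdout.writelines(slice_dn_log(f.readlines()))
```

If no startup marker is found, the slice starts at the top of the file,
so nothing is lost when the assumption about the marker is wrong.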

On Tue, Oct 30, 2012 at 4:40 PM, Takahiko Kawasaki <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am having trouble with quorum-based HDFS HA in CDH 4.1.1.
>
> The NameNode Web UI in Cloudera Manager reports the NameNode status.
> It has a "Cluster Summary" section, and my cluster is summarized
> there as shown below.
>
> --- Cluster Summary ---
> Configured Capacity   : 0 KB
> DFS Used              : 0 KB
> Non DFS Used          : 0 KB
> DFS Remaining         : 0 KB
> DFS Used%             : 100 %
> DFS Remaining%        : 0 %
> Block Pool Used       : 0 KB
> Block Pool Used%      : 100 %
> DataNodes usages      : Min %  Median %  Max %  stdev %
>                           0 %       0 %    0 %      0 %
> Live Nodes            : 0 (Decommissioned: 0)
> Dead Nodes            : 5 (Decommissioned: 0)
> Decommissioning Nodes : 0
> --------------------
>
> As you can see, all the DataNodes are regarded as dead.
>
> I found that the DataNodes keep logging failures to send
> heartbeats to the NameNode.
>
> ---- DataNode Log (host names were manually edited) ---
> 2012-10-30 19:28:16,817 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
> node02.example.com/192.168.62.232:8020 using DELETEREPORT_INTERVAL of
> 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec Initial delay:
> 0msec; heartBeatInterval=3000
> 2012-10-30 19:28:16,817 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> BPOfferService for Block pool
> BP-2063217961-192.168.62.231-1351263110470 (storage id
> DS-2090122187-192.168.62.233-50010-1338981658216) service to
> node02.example.com/192.168.62.232:8020
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:674)
>         at java.lang.Thread.run(Thread.java:662)
> --------------------
>
> So, I guess that the DataNodes are failing to locate the name service
> for some reason, but I have no clue how to solve the problem.
>
> I confirmed that
> /var/run/cloudera-scm-agent/process/???-hdfs-DATANODE/core-site.xml
> of a DataNode contains
>
> --- core-site.xml ---
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://nameservice1</value>
>   </property>
> --------------------
>
> and hdfs-site.xml contains
>
> --- hdfs-site.xml ---
>   <property>
>     <name>dfs.nameservices</name>
>     <value>nameservice1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.nameservice1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.nameservice1</name>
>     <value>namenode38,namenode90</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.nameservice1.namenode38</name>
>     <value>node01.example.com:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.nameservice1.namenode38</name>
>     <value>node01.example.com:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.https-address.nameservice1.namenode38</name>
>     <value>node01.example.com:50470</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.nameservice1.namenode90</name>
>     <value>node02.example.com:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.nameservice1.namenode90</name>
>     <value>node02.example.com:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.https-address.nameservice1.namenode90</name>

Harsh J
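
To rule out a plain configuration mismatch in the quoted hdfs-site.xml,
the HA keys can be cross-checked with a few lines of Python. This is
only a sketch that verifies each NameNode ID listed under
dfs.ha.namenodes.<nameservice> has a matching
dfs.namenode.rpc-address.<nameservice>.<id> entry; it does not explain
the NullPointerException by itself. The function names are made up for
illustration, and the sample values are the ones quoted in the message.

```python
import xml.etree.ElementTree as ET


def load_props(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> XML into a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def namenode_rpc_addresses(props):
    """Map (nameservice, namenode id) to its configured RPC address.
    A value of None means the rpc-address key is missing."""
    result = {}
    for ns in (props.get("dfs.nameservices") or "").split(","):
        ns = ns.strip()
        if not ns:
            continue
        for nn in (props.get("dfs.ha.namenodes." + ns) or "").split(","):
            nn = nn.strip()
            if not nn:
                continue
            key = "dfs.namenode.rpc-address.{}.{}".format(ns, nn)
            result[(ns, nn)] = props.get(key)
    return result


# Sample built from the properties quoted in the original message.
SAMPLE = """<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>namenode38,namenode90</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.namenode38</name><value>node01.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.namenode90</name><value>node02.example.com:8020</value></property>
</configuration>"""

if __name__ == "__main__":
    for (ns, nn), addr in namenode_rpc_addresses(load_props(SAMPLE)).items():
        print(ns, nn, addr if addr else "MISSING rpc-address")
```

Running the same check against the hdfs-site.xml in the DataNode's
process directory would show whether every listed NameNode ID has an
RPC address.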