MapReduce >> mail # user >> DataNodes fail to send heartbeat to HA-enabled NameNode


Takahiko Kawasaki 2012-10-30, 11:10
Re: DataNodes fail to send heartbeat to HA-enabled NameNode
Moving to [EMAIL PROTECTED]
(https://groups.google.com/a/cloudera.org/forum/#!forum/cdh-user), as
this may be a CDH4-specific problem.

Could you please share your whole DN log (from startup until the
heartbeat errors)? I suspect it's a problem with DN registration, which
the log will help confirm.
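
For anyone collecting the same information, here is a minimal sketch of cutting a DataNode log down to the span requested above. It assumes the daemon writes its usual "STARTUP_MSG" banner at startup (an assumption about the log on your system) and reuses the "Exception in BPOfferService" text from the quoted log as the error marker. The function name and the `trailing` parameter are hypothetical helpers, not part of Hadoop.

```python
from typing import Iterable, List

# Assumption: the DataNode logs this banner when it starts.
STARTUP_MARKER = "STARTUP_MSG: Starting DataNode"
# Taken from the quoted log; in the raw log file it sits on one line.
ERROR_MARKER = "Exception in BPOfferService"


def slice_dn_log(lines: Iterable[str], trailing: int = 10) -> List[str]:
    """Return the lines from the most recent startup banner through the
    first heartbeat error after it, plus `trailing` lines of stack trace."""
    buf = list(lines)

    # Find the last startup banner; fall back to the top of the file.
    start = 0
    for i, line in enumerate(buf):
        if STARTUP_MARKER in line:
            start = i

    # Stop at the first error after that banner.
    for i in range(start, len(buf)):
        if ERROR_MARKER in buf[i]:
            return buf[start:i + 1 + trailing]

    # No error found: return everything since startup.
    return buf[start:]
```

A possible use is `slice_dn_log(open(path))`, where `path` is wherever your DataNode log lives; the result can be joined and attached to the reply.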

On Tue, Oct 30, 2012 at 4:40 PM, Takahiko Kawasaki <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am having trouble with quorum-based HDFS HA on CDH 4.1.1.
>
> The NameNode Web UI in Cloudera Manager reports the NameNode status.
> It has a "Cluster Summary" section, and my cluster is summarized
> there as shown below.
>
> --- Cluster Summary ---
> Configured Capacity   : 0 KB
> DFS Used              : 0 KB
> Non DFS Used          : 0 KB
> DFS Remaining         : 0 KB
> DFS Used%             : 100 %
> DFS Remaining%        : 0 %
> Block Pool Used       : 0 KB
> Block Pool Used%      : 100 %
> DataNodes usages      : Min %  Median %  Max %  stdev %
>                           0 %       0 %    0 %      0 %
> Live Nodes            : 0 (Decommissioned: 0)
> Dead Nodes            : 5 (Decommissioned: 0)
> Decommissioning Nodes : 0
> --------------------
>
> As you can see, all the DataNodes are regarded as dead.
>
> I found that the DataNodes keep emitting log messages about failing
> to send heartbeats to the NameNode.
>
> ---- DataNode Log (host names were manually edited) ---
> 2012-10-30 19:28:16,817 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
> node02.example.com/192.168.62.232:8020 using DELETEREPORT_INTERVAL of
> 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec Initial delay:
> 0msec; heartBeatInterval=3000
> 2012-10-30 19:28:16,817 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> BPOfferService for Block pool
> BP-2063217961-192.168.62.231-1351263110470 (storage id
> DS-2090122187-192.168.62.233-50010-1338981658216) service to
> node02.example.com/192.168.62.232:8020
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:674)
>         at java.lang.Thread.run(Thread.java:662)
> --------------------
>
> So, I guess that the DataNodes are failing to locate the name service
> for some reason, but I don't have any clue how to solve the problem.
>
> I confirmed that
> /var/run/cloudera-scm-agent/process/???-hdfs-DATANODE/core-site.xml
> of a DataNode contains
>
> --- core-site.xml ---
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://nameservice1</value>
>   </property>
> --------------------
>
> and hdfs-site.xml contains
>
> --- hdfs-site.xml ---
>   <property>
>     <name>dfs.nameservices</name>
>     <value>nameservice1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.nameservice1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.nameservice1</name>
>     <value>namenode38,namenode90</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.nameservice1.namenode38</name>
>     <value>node01.example.com:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.nameservice1.namenode38</name>
>     <value>node01.example.com:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.https-address.nameservice1.namenode38</name>
>     <value>node01.example.com:50470</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.nameservice1.namenode90</name>
>     <value>node02.example.com:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.nameservice1.namenode90</name>
>     <value>node02.example.com:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.https-address.nameservice1.namenode90</name>

Harsh J
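
The HA keys in the quoted hdfs-site.xml follow a visible pattern: `dfs.nameservices` lists the nameservice, `dfs.ha.namenodes.<nameservice>` lists the NameNode IDs, and `dfs.namenode.rpc-address.<nameservice>.<id>` gives each RPC address. As a rough sanity check, the sketch below walks that chain and reports any NameNode ID without an RPC address. It is only an illustration of the key naming, not how the DataNode resolves addresses internally, and a clean result does not rule out the registration problem suspected above. The function names and `SAMPLE` are hypothetical; `SAMPLE` copies a subset of the quoted properties and assumes the usual `<configuration>` root element.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def load_properties(xml_text: str) -> Dict[str, str]:
    """Read <property><name>/<value> pairs from a Hadoop-style config."""
    props: Dict[str, str] = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def namenode_rpc_addresses(props: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map '<nameservice>.<namenode id>' to its RPC address (None if unset)."""
    result: Dict[str, Optional[str]] = {}
    for ns in filter(None, (s.strip() for s in props.get("dfs.nameservices", "").split(","))):
        ids = props.get(f"dfs.ha.namenodes.{ns}", "")
        for nn in filter(None, (s.strip() for s in ids.split(","))):
            # None here flags an ID with no matching rpc-address key.
            result[f"{ns}.{nn}"] = props.get(f"dfs.namenode.rpc-address.{ns}.{nn}")
    return result


# Subset of the properties quoted in the thread (host names already edited).
SAMPLE = """
<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>namenode38,namenode90</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.namenode38</name><value>node01.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.namenode90</name><value>node02.example.com:8020</value></property>
</configuration>
"""

if __name__ == "__main__":
    for key, addr in namenode_rpc_addresses(load_properties(SAMPLE)).items():
        print(key, "->", addr if addr else "MISSING rpc-address")
```

Running the same two functions against the actual hdfs-site.xml on a DataNode would show whether both NameNode IDs resolve to an address on that host.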
Steve Loughran 2012-10-30, 17:06
Todd Lipcon 2012-10-30, 18:23
Todd Lipcon 2012-10-30, 20:16