Re: adding new datanode into cluster needs restarting the namenode?
Hi,

Topologies are unfortunately cached in Hadoop today and not refreshed
until restart.

So if you add a DN before properly configuring its topology, its
improper default will stick and you'll need to restart the NN to make
it lose that cache. Hence, your second process of configuring first,
starting last, makes more sense.
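
A sketch of that order, using the paths from the steps quoted below (file
locations are from that message; adjust them to your layout). The key point is
that the rack mapping must exist before the DataNode's first registration,
because the NameNode caches the resolution:

```shell
# Sketch only -- paths taken from the quoted steps, not a general recipe.

# 1. include files and slaves
echo 'qa-str-ms02.p-qa' >> /etc/hadoop/conf/hosts.include
echo 'qa-str-ms02.p-qa' >> /etc/hadoop/conf/slaves

# 2. rack mapping FIRST, then verify the topology script resolves it
echo 'qa-str-ms02.p-qa /dc1/switch1/rack1/node5' >> /etc/hadoop/topology.data
echo '192.168.159.52 /dc1/switch1/rack1/node5'   >> /etc/hadoop/topology.data
./topology.sh qa-str-ms02.p-qa    # expect /dc1/switch1/rack1/node5

# 3. let the NameNode re-read the hosts files
sudo -u hdfs hdfs dfsadmin -refreshNodes

# 4. only now start the new DataNode
sudo /etc/init.d/hadoop-hdfs-datanode start
```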

You may also want to configure a better default, one that matches the
depth of your regular topology levels, so that even if you make a
mistake, the node still starts up.
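
For illustration, a minimal sketch of such a script (the function name, file
paths, and the default path are assumptions, not from this thread): the
fallback for unknown hosts has the same number of levels as the real entries,
so a DataNode that registers before its mapping exists still fits the cached
topology instead of landing at /default-rack and triggering
InvalidTopologyException:

```shell
# topo_lookup: map each host/IP argument to a rack path.
# Assumed topology.data format: "<host-or-ip> <rack-path>" per line.
topo_lookup() {
  data="${TOPOLOGY_DATA:-/etc/hadoop/topology.data}"
  # Fallback depth matches the real entries (/dc1/switch1/rackN/nodeN),
  # unlike the stock /default-rack, which is only one level deep.
  default="/dc1/switch1/rack0/default"
  for host in "$@"; do
    path=$(awk -v h="$host" '$1 == h { print $2; exit }' "$data")
    echo "${path:-$default}"
  done
}
```

A NameNode whose topology script property (net.topology.script.file.name in
Hadoop 2 / topology.script.file.name in Hadoop 1) points at such a script
would place unmapped nodes under the same-depth default path, keeping all
leaves at the same level.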

On Thu, Nov 22, 2012 at 7:07 AM, Maoke <[EMAIL PROTECTED]> wrote:
> hi all,
>
> does anyone have experience with adding a new datanode to a
> rack-aware cluster without restarting the namenode, in the cdh4
> distribution? it is said that adding a new datanode is a hot operation
> that can be done while the cluster is online.
>
> i tried that, but it did not seem to work until i restarted the
> namenode. what i did is:
>
> (the cluster has had 4 data nodes and i am adding the 5th)
> 1. add the new node (qa-str-ms02.p-qa) into /etc/hadoop/conf/hosts.include,
> and into /etc/hadoop/conf/slaves
> 2. add the rack entries for qa-str-ms02.p-qa (192.168.159.52) into
> /etc/hadoop/topology.data, the file that topology.sh (the topology
> script) reads, and confirm that ./topology.sh qa-str-ms02.p-qa works
> well. the rack entries look like:
>
>    qa-str-ms02.p-qa                 /dc1/switch1/rack1/node5
>    192.168.159.52                  /dc1/switch1/rack1/node5
>
> 3. on the namenode: sudo -u hdfs hdfs dfsadmin -refreshNodes
> 4. on the new datanode: sudo /etc/init.d/hadoop-hdfs-datanode start
>
> however, the datanode failed to handshake with the namenode and it soon
> exited. the namenode log said:
>
> 2012-11-21 18:06:11,946 INFO org.apache.hadoop.net.NetworkTopology: Removing
> a node: /default-rack/192.168.159.52:50010
> 2012-11-21 18:06:11,946 INFO org.apache.hadoop.net.NetworkTopology: Adding a
> new node: /default-rack/192.168.159.52:50010
>
> 2012-11-21 18:06:11,946 ERROR org.apache.hadoop.net.NetworkTopology: Error:
> can't add leaf node at depth 2 to topology:
> Number of racks: 3
> Expected number of leaves:3
> /dc1/switch1/rack1/node1/192.168.159.101:50010
> /dc1/switch1/rack1/node2/192.168.159.102:50010
> /dc1/switch1/rack1/node3/192.168.159.103:50010
>
> 2012-11-21 18:06:11,946 WARN org.apache.hadoop.ipc.Server: IPC Server
> handler 4 on 8020, call
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.registerDatanode
> from 192.168.159.52:53968: error:
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid
> network topology. You cannot have a rack and a non-rack node at the same
> level of the network topology.
>
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid
> network topology. You cannot have a rack and a non-rack node at the same
> level of the network topology.
>         at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:365)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:619)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3358)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:854)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)
>         at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20018)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>         at java.security.AccessController.doPrivileged(Native Method)

Harsh J