Hadoop, mail # user - Balancing a cluster when a new node is added


Balancing a cluster when a new node is added
Saptarshi Guha 2010-01-09, 17:44
Hello,
I'm using Hadoop 0.20.1. I just added a new node to a 5-node
cluster (for a total of 6); there is already about 500GB across 5
nodes.
In order to distribute the data across the entire cluster (including
the new node), I ran

hadoop balancer
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
The cluster is balanced. Exiting...
Balancing took 356.0 milliseconds

Clearly the cluster is not balanced, but how do I force it to be so?
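
One possible explanation, offered as a sketch rather than a diagnosis: the balancer treats a datanode as balanced when its utilization (used space as a percentage of its capacity) is within a threshold of the overall cluster utilization, and the default threshold is 10 percent. If the disks are large relative to 500GB, every node, including the empty one, may already fall inside that band. The Python below only illustrates that criterion; the 2000GB per-node capacity and the function names are assumptions made for the example, not values from this cluster.

```python
def utilization(used_gb, capacity_gb):
    """Used space as a percentage of capacity."""
    return 100.0 * used_gb / capacity_gb


def is_balanced(nodes, threshold_pct=10.0):
    """Return True if every node is within threshold_pct of cluster utilization.

    nodes is a list of (used_gb, capacity_gb) pairs. This mirrors the
    documented balancer criterion; it is not the balancer's actual code.
    """
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    cluster_util = utilization(total_used, total_capacity)
    return all(
        abs(utilization(used, capacity) - cluster_util) <= threshold_pct
        for used, capacity in nodes
    )


# Hypothetical layout: 5 nodes holding 100GB each, plus 1 empty new node,
# all with an ASSUMED capacity of 2000GB.
nodes = [(100, 2000)] * 5 + [(0, 2000)]

print(is_balanced(nodes))        # default 10% threshold
print(is_balanced(nodes, 1.0))   # tighter 1% threshold
```

Under these assumed capacities the cluster counts as balanced at 10 percent but not at 1 percent, so if this is what is happening, running the balancer with a smaller value via its `-threshold` option (for example `hadoop balancer -threshold 1`) may cause it to move blocks.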

Q2. On the DFS web UI, when I click on the existing nodes to see
their data, I can, but when I click on the new node, I can't connect.
Does this happen when there are no files? The datanode log for this
machine does not show any errors. I have managed to copy a small file
to this new machine (from the new machine, so the file is stored on this
machine's section of the DFS).
2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50075
webServer.getConnectors()[0].getLocalPort() returned 50075
2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer: Jetty
bound to port 50075
2010-01-09 12:20:57,681 INFO org.mortbay.log: jetty-6.1.14
2010-01-09 12:21:02,148 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50075
2010-01-09 12:21:02,152 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=DataNode, sessionId=null
2010-01-09 12:21:02,165 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=DataNode, port=50020
2010-01-09 12:21:02,167 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 50020: starting
2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 50020: starting
2010-01-09 12:21:02,168 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration DatanodeRegistration(altair.stat.purdue.edu:50010, storageID=,
infoPort=50075, ipcPort=50020)
2010-01-09 12:21:02,169 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50020: starting
2010-01-09 12:21:02,170 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 50020: starting
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id
DS-1908504044-127.0.0.1-50010-1263057662169 is assigned to data-node
128.210.141.105:50010
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(X.X.X.X:50010,
storageID=DS-1908504044-127.0.0.1-50010-1263057662169, infoPort=50075,
ipcPort=50020)In DataNode.run, data FSDataset{dirpath='/ln/meraki/hdfs/dfs/data/current'}
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: using
BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2010-01-09 12:21:02,187 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
blocks got processed in 2 msecs
2010-01-09 12:21:02,188 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
block scanner.