Balancing a cluster when a new node is added
Hello,
I'm using Hadoop 0.20.1. I just added a new node to a 5-node
cluster (for a total of 6); there is already about 500 GB across the
original 5 nodes.
To distribute the data across the entire cluster (including
the new node), I ran

hadoop balancer
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
The cluster is balanced. Exiting...
Balancing took 356.0 milliseconds

Clearly the cluster is not balanced, but how do I force it to be so?
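
For context on the output above: as far as I understand it, the HDFS balancer compares each datanode's utilization (used space as a percentage of that node's capacity) with the utilization of the cluster as a whole, and reports "balanced" when every node is within a threshold (10 percent by default, adjustable with the -threshold option). A minimal sketch of that check follows. The per-node capacities are hypothetical, since the post does not give them, and the function name is illustrative only.

```python
def is_balanced(nodes, threshold_pct=10.0):
    """Return True if every node is within threshold_pct of cluster utilization.

    nodes: list of (used_bytes, capacity_bytes) tuples, one per datanode.
    """
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    cluster_util = 100.0 * total_used / total_capacity
    return all(
        abs(100.0 * used / capacity - cluster_util) <= threshold_pct
        for used, capacity in nodes
    )


GB = 1024 ** 3
# Hypothetical layout: five nodes holding 100 GB each on 2000 GB disks,
# plus one empty 2000 GB node (the capacities are an assumption).
nodes = [(100 * GB, 2000 * GB)] * 5 + [(0, 2000 * GB)]

print(is_balanced(nodes))                     # default 10 percent threshold
print(is_balanced(nodes, threshold_pct=1.0))  # much tighter threshold
```

Under these assumed numbers the empty node is only about 4 percentage points below the cluster average, so the default threshold already counts it as balanced. A tighter setting such as `hadoop balancer -threshold 1` would be needed before any blocks move.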

Q2. On the DFS web UI, I can click on the existing nodes to browse
their data, but when I click on the new node, I can't connect.
Does this happen when there are no files? The datanode log for this
machine does not show any errors. I have managed to copy a small file
to this new machine (from the new machine, so the file is stored on
this machine's section of the DFS).
2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50075
webServer.getConnectors()[0].getLocalPort() returned 50075
2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer: Jetty
bound to port 50075
2010-01-09 12:20:57,681 INFO org.mortbay.log: jetty-6.1.14
2010-01-09 12:21:02,148 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50075
2010-01-09 12:21:02,152 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=DataNode, sessionId=null
2010-01-09 12:21:02,165 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=DataNode, port=50020
2010-01-09 12:21:02,167 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 50020: starting
2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 50020: starting
2010-01-09 12:21:02,168 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration DatanodeRegistration(altair.stat.purdue.edu:50010, storageID=,
infoPort=50075, ipcPort=50020)
2010-01-09 12:21:02,169 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50020: starting
2010-01-09 12:21:02,170 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 50020: starting
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id
DS-1908504044-127.0.0.1-50010-1263057662169 is assigned to data-node
128.210.141.105:50010
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(X.X.X.X:50010,
storageID=DS-1908504044-127.0.0.1-50010-1263057662169, infoPort=50075,
ipcPort=50020)In DataNode.run, data FSDataset{dirpath='/ln/meraki/hdfs/dfs/data/current'}
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: using
BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2010-01-09 12:21:02,187 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
blocks got processed in 2 msecs
2010-01-09 12:21:02,188 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
block scanner.
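
The log above shows the datanode web server bound to port 50075, so one way to narrow down the connection problem is to check, from the machine running the browser, what the new node's hostname resolves to and whether that port accepts TCP connections. The storage ID in the log contains 127.0.0.1, which might hint at a hostname resolving to loopback, but that is only a guess. A minimal sketch, where the helper name `check_port` is hypothetical:

```python
import socket


def check_port(host, port, timeout=3.0):
    """Resolve host and try a TCP connection; return (resolved_ip, reachable)."""
    ip = socket.gethostbyname(host)
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False


# Example call (hostname and port taken from the log above):
# print(check_port("altair.stat.purdue.edu", 50075))
```

If the resolved address is a loopback address, or the connection fails while the datanode is running, the problem is more likely name resolution or a firewall than the absence of files.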