-
Balancing a cluster when a new node is added
Saptarshi Guha 2010-01-09, 17:44
Hello, I'm using Hadoop 0.20.1. I just added a new node to a 5-node cluster (for a total of 6); there is already about 500 GB spread across the 5 original nodes. To distribute the data across the entire cluster (including the new node), I ran:
    hadoop balancer
    Time Stamp  Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
    The cluster is balanced. Exiting...
    Balancing took 356.0 milliseconds
Clearly the cluster is not balanced. How do I force it to balance?
Q2. In the DFS web UI, I can click on any of the existing nodes and browse their data, but when I click on the new node, I can't connect. Does this happen when a node holds no files? The datanode log for this machine shows no errors. I have managed to copy a small file to this new machine (from the new machine itself, so the file is stored in this machine's portion of the DFS):

    2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
    2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
    2010-01-09 12:20:57,681 INFO org.mortbay.log: jetty-6.1.14
    2010-01-09 12:21:02,148 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
    2010-01-09 12:21:02,152 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
    2010-01-09 12:21:02,165 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020
    2010-01-09 12:21:02,167 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
    2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
    2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
    2010-01-09 12:21:02,168 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration DatanodeRegistration(altair.stat.purdue.edu:50010, storageID=, infoPort=50075, ipcPort=50020)
    2010-01-09 12:21:02,169 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
    2010-01-09 12:21:02,170 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
    2010-01-09 12:21:02,173 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id DS-1908504044-127.0.0.1-50010-1263057662169 is assigned to data-node 128.210.141.105:50010
    2010-01-09 12:21:02,173 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(X.X.X.X:50010, storageID=DS-1908504044-127.0.0.1-50010-1263057662169, infoPort=50075, ipcPort=50020)In DataNode.run, data FSDataset{dirpath='/ln/meraki/hdfs/dfs/data/current'}
    2010-01-09 12:21:02,173 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
    2010-01-09 12:21:02,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 2 msecs
    2010-01-09 12:21:02,188 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
-
Re: Balancing a cluster when a new node is added
Eli Collins 2010-01-10, 09:11
Have you verified that this new DN's Hadoop configuration files are the same as the others'? Do you see any errors in the NN log when restarting HDFS on this new node?
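One way to check the configuration question mechanically is to hash every file in each node's configuration directory and compare the results. This is a minimal sketch, assuming you have first copied the new node's configuration directory to a local path; `config_digests` and `differing_files` are hypothetical helpers, not part of Hadoop.

```python
import hashlib
from pathlib import Path


def config_digests(conf_dir):
    """Map each regular file in conf_dir to the SHA-256 hex digest of its bytes."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(conf_dir).iterdir()
        if p.is_file()
    }


def differing_files(reference, candidate):
    """Return file names that exist in only one digest map, or in both with
    different contents (compared byte for byte, so whitespace changes count)."""
    names = set(reference) | set(candidate)
    return sorted(n for n in names if reference.get(n) != candidate.get(n))
```

For example, `differing_files(config_digests("conf-existing"), config_digests("conf-altair"))`, with hypothetical directory names, returns an empty list when the two directories are byte-for-byte identical.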
Thanks, Eli
-
Re: Balancing a cluster when a new node is added
Saptarshi Guha 2010-01-10, 17:36
Hi, yes, the config files are the same. I checked the namenode log; for each of the 5 pre-existing nodes I see something like:
2010-01-10 12:32:33,921 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from X.Y.Z.D:50010 storage DS-1908504044-127.0.0.1-50010-1263057662169
but not for the newly added node. I just added the machine to the slaves file and restarted the cluster. Is there something else I should do to the new node?
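Checking the namenode log for a missing registration can be scripted. This is a minimal sketch, assuming the registration line format quoted above (taken from this thread, not from a format specification) and that you list each datanode by the same IP:port form that appears in the log; `registered_datanodes` and `missing_datanodes` are hypothetical helpers, and the log's location depends on your HADOOP_LOG_DIR setting.

```python
import re

# Matches namenode log lines such as:
#   ... BLOCK* NameSystem.registerDatanode: node registration from X.Y.Z.D:50010 storage DS-...
REGISTRATION = re.compile(
    r"NameSystem\.registerDatanode: node registration from (\S+) storage (\S+)"
)


def registered_datanodes(log_lines):
    """Return {datanode address: storage ID} for every registration in the log."""
    nodes = {}
    for line in log_lines:
        match = REGISTRATION.search(line)
        if match:
            nodes[match.group(1)] = match.group(2)
    return nodes


def missing_datanodes(log_lines, expected):
    """Return the expected addresses (IP:port) that never registered."""
    seen = registered_datanodes(log_lines)
    return sorted(addr for addr in expected if addr not in seen)
```

Passing the lines of the namenode log (for example, from `open(path)`) together with the addresses of all six datanodes would list any node, such as the new one, that never registered.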
Regards Saptarshi
-
Re: Balancing a cluster when a new node is added
Allen Wittenauer 2010-01-11, 21:27
On 1/9/10 9:44 AM, "Saptarshi Guha" <[EMAIL PROTECTED]> wrote:
> hadoop balancer
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
> The cluster is balanced. Exiting...
> Balancing took 356.0 milliseconds
>
> Clearly the cluster is not balanced, but how do I force it to be so?
What happens if you give the balancer command a threshold?
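Why would the balancer call a cluster with an empty node "balanced"? The Balancer's documentation describes the rule roughly as follows: the cluster is balanced when every datanode's utilization (used space divided by capacity) is within `threshold` percentage points of the whole cluster's utilization, and the default threshold is 10. The sketch below applies that rule to hypothetical numbers. It assumes that the five existing nodes have the same capacity that `dfsadmin -report` shows for altair later in this thread and that each holds 100 GB; the thread does not state either. `unbalanced_nodes` is a hypothetical helper, not the Balancer's own code.

```python
GB = 1024 ** 3


def utilization(used, capacity):
    """Percentage of a node's (or the cluster's) capacity that is in use."""
    return 100.0 * used / capacity


def unbalanced_nodes(nodes, threshold=10.0):
    """Return the names of nodes whose utilization differs from the cluster-wide
    utilization by more than `threshold` percentage points.

    `nodes` maps a node name to a (used_bytes, capacity_bytes) pair.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_util = utilization(total_used, total_capacity)
    return sorted(
        name
        for name, (used, capacity) in nodes.items()
        if abs(utilization(used, capacity) - cluster_util) > threshold
    )


# Hypothetical cluster: every node has altair's configured capacity, and the
# ~500 GB is split evenly over the five original nodes.
CAPACITY = 1056894091264
example_cluster = {f"node{i}": (100 * GB, CAPACITY) for i in range(1, 6)}
example_cluster["altair"] = (524288, CAPACITY)  # 512 KB used on the new node

print(unbalanced_nodes(example_cluster))                 # []
print(unbalanced_nodes(example_cluster, threshold=5.0))  # ['altair']
```

Under these assumptions, the cluster as a whole is about 8.5% full, so the empty node sits roughly 8.5 points below average. That is inside the default 10-point band, so nothing is flagged; a tighter threshold flags it. If the threshold is indeed read as a percentage, a value of 0.20 would mean 0.2 points rather than 20%, which makes it far stricter than the default.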
> Q2. On the DFS UI website, when I click on the existing nodes to see data, I can, but when I click on the new node, i can't connect. Does this happen when there are no files? The datanode log for this machine does not show any errors. I have managed to copy a small file this new machine (from the new machine, so the file is stored on this machines section of the DFS)

Does the namenode actually recognize the new node? What does dfsadmin -report tell you? Are you using a dfs.hosts (aka include) file? Is it listed? Are you using a dfs.hosts.exclude file? Is it listed there on accident?
-
Re: Balancing a cluster when a new node is added
Saptarshi Guha 2010-01-12, 00:46
> What happens if you give the balancer command a threshold?

So I gave it a threshold (0.20), and it started to run, but I got several errors like this:
10/01/11 19:43:56 WARN balancer.Balancer: Error moving block 795170313073485718 from spica:50010 to altair:50010 through 128.210.141.89:50010: No route to host
(altair is the node I added.) I don't know why there is no route to the host, since I can start the node automatically (via ssh), and, as shown below, the report lists it. Can "No route to host" occur if that particular port, 50010, is closed?
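To test the port question directly, you can attempt a plain TCP connection from one of the existing datanodes (for example spica, the source node in the error) to each port that the new datanode listens on. "No route to host" is the message for the EHOSTUNREACH error. A host firewall that rejects packets with an ICMP "host prohibited" reply commonly produces it, even when the machine is up and reachable over ssh. This is a minimal sketch; `probe` and `probe_datanode` are hypothetical helpers.

```python
import errno
import socket

# Ports that appear in the datanode log in this thread:
# 50010 (block transfer), 50020 (IPC), 50075 (HTTP, used by the web UI).
DATANODE_PORTS = (50010, 50020, 50075)


def probe(host, port, timeout=3.0):
    """Attempt a TCP connection and return "open", "timeout", or the errno name.

    "EHOSTUNREACH" corresponds to the balancer's "No route to host";
    "ECONNREFUSED" usually means the host answered but nothing accepted
    the connection on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # errno may be None or a resolver code; fall back to the message text.
        return errno.errorcode.get(exc.errno, str(exc))


def probe_datanode(host, ports=DATANODE_PORTS, timeout=3.0):
    """Probe each datanode port on `host` and return {port: result}."""
    return {port: probe(host, port, timeout) for port in ports}
```

Running `probe_datanode("altair")` on spica would show which of the three ports are reachable. The same check applies to Q2: the web UI's link to a datanode presumably points at its HTTP port (50075 in the log), so a blocked 50075 would explain the failed click.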
>> Q2. On the DFS UI website, when I click on the existing nodes to see data, I can, but when I click on the new node, i can't connect. Does this happen when there are no files? The datanode log for this machine does not show any errors. I have managed to copy a small file this new machine (from the new machine, so the file is stored on this machines section of the DFS)
>
> Does the namenode actually recognize the new node? What does dfsadmin -report tell you?
The report shows it to be present:

    Name: A.B.C.D:50010
    Decommission Status : Normal
    Configured Capacity: 1056894091264 (984.31 GB)
    DFS Used: 524288 (512 KB)
    Non DFS Used: 55336439808 (51.54 GB)
    DFS Remaining: 1001557127168(932.77 GB)
    DFS Used%: 0%
    DFS Remaining%: 94.76%
    Last contact: Mon Jan 11 19:40:35 EST 2010
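The same report can also be read programmatically, for example to feed each node's `DFS Used%` into a balance check. This is a minimal sketch, assuming the `Field: value` layout shown above, with each datanode's section beginning at a `Name:` line; lines before the first `Name:` (assumed to be a cluster-wide summary) are skipped. `parse_report` and `dfs_used_percent` are hypothetical helpers, and other Hadoop versions may print different fields.

```python
def parse_report(text):
    """Split dfsadmin -report output into one dict per datanode, keyed by field name."""
    nodes = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Name":
            current = {"Name": value}  # a new datanode section starts here
            nodes.append(current)
        elif current is not None:
            current[key] = value
    return nodes


def dfs_used_percent(node):
    """Return the node's 'DFS Used%' field as a float (e.g. '0%' -> 0.0)."""
    return float(node["DFS Used%"].rstrip("%"))
```

Splitting at the first colon keeps values such as `A.B.C.D:50010` and the `Last contact` timestamp intact.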
> Are you using a dfs.hosts (aka include) file? Is it listed? Are you using a dfs.hosts.exclude file? Is it listed there on accident?

No dfs.hosts, and no excludes file either. I stopped the cluster (stop-dfs.sh), added the machine (called altair) to the cluster (in the slaves file), and brought it back up.
-
Re: Balancing a cluster when a new node is added
Saptarshi Guha 2010-01-12, 03:08
I think that port itself is blocked. I'll contact the sysadmins. Thanks