Thanks,
We think the problem is this:
We have an unbalanced HDFS cluster: some of the data nodes are more than 90% full and some are less than 30% full. This happened because the nodes with free space are newer.
We think that when a task tracker gets a task, it tries to write its map output to its local data node first, and since many of the nodes are full, the task tracker fails.
Does this diagnosis sound logical?
Are there workarounds?
We are running the balancer, but it takes a lot of time, and in the meantime the cluster is not working.
We are using Cloudera's CDH2.
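To make the imbalance concrete, here is a rough Python sketch of the rule the HDFS balancer applies: a node counts as balanced when its utilization is within a threshold (10% by default) of the overall cluster utilization. The node names and sizes below are made up for illustration and are not figures from our cluster; real numbers would come from something like `hadoop dfsadmin -report`.

```python
# Hypothetical per-node figures as (used_gb, capacity_gb); illustration only.
nodes = {
    "old-node-1": (920, 1000),
    "old-node-2": (950, 1000),
    "new-node-1": (250, 1000),
    "new-node-2": (300, 1000),
}


def classify(nodes, threshold_pct=10.0):
    """Group nodes as over-utilized, under-utilized or balanced.

    A node is balanced when its utilization differs from the
    cluster-wide utilization by no more than threshold_pct.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_cap

    groups = {"over": [], "under": [], "balanced": []}
    for name, (used, cap) in sorted(nodes.items()):
        node_pct = 100.0 * used / cap
        if node_pct > cluster_pct + threshold_pct:
            groups["over"].append(name)
        elif node_pct < cluster_pct - threshold_pct:
            groups["under"].append(name)
        else:
            groups["balanced"].append(name)
    return cluster_pct, groups


cluster_pct, groups = classify(nodes)
print("cluster utilization: %.1f%%" % cluster_pct)
print(groups)
```

With numbers like these every node falls outside the threshold, so the balancer has a lot of blocks to move, which is consistent with it taking a long time.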
Thanks
-----Original Message-----
From: elton sky [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 05, 2011 10:18 AM
To: [EMAIL PROTECTED]
Subject: Re: We are looking to the root of the problem that caused us IOException
check the FAQ (
http://wiki.apache.org/hadoop/FAQ#What_does_.22file_could_only_be_replicated_to_0_nodes.2C_instead_of_1.22_mean.3F)
On Tue, Apr 5, 2011 at 4:53 PM, Guy Doulberg <[EMAIL PROTECTED]> wrote:
> Hey guys,
>
> We are trying to figure out why many of our Map/Reduce jobs on the cluster
> are failing.
> In the logs of the failing jobs we are getting this message:
>
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File **a filename*** could only be replicated to 0 nodes, instead of 1
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1282)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
>     at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:818)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>     at $Proxy1.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy1.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2932)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2807)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2087)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2274)
>
>
>
> Where should we look?
> What are the candidates to be the root of this message?
>
> Thanks, Guy
>
>
>
>