Re: Problem in copyFromLocal
Jeff Zhang 2010-09-10, 01:08
Check the DataNode's log to see whether it started correctly.
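The exception in the quoted message says the block could be placed on 0 DataNodes, which usually means the NameNode did not see a usable DataNode at the moment the client wrote. A minimal sketch of the log check follows. The function name `scan_datanode_log` is hypothetical, and the pattern assumes the log4j line layout visible in the quoted NameNode log (timestamp, then level, then class).

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop: list ERROR/FATAL lines in a log.
# Assumes log lines look like "2010-09-09 05:54:04,216 INFO some.Class: msg".
scan_datanode_log() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # Keep only the 20 most recent matches, prefixed with their line numbers.
    matches=$(grep -n -E ' (ERROR|FATAL) ' "$log_file" | tail -n 20) || true
    if [ -z "$matches" ]; then
        echo "no ERROR/FATAL lines in $log_file"
        return 0
    fi
    printf '%s\n' "$matches"
    return 1
}
```

To use it, pass the path of the DataNode log on each slave, for example `scan_datanode_log "$HADOOP_HOME/logs/hadoop-hadoop-datanode-$(hostname).log"`. That path is an assumption based on a default install run as user `hadoop`, so adjust it to your setup. Running `bin/hadoop dfsadmin -report` on the master should also show how many DataNodes the NameNode currently considers available.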
On Thu, Sep 9, 2010 at 8:51 AM, Medha Atre <[EMAIL PROTECTED]> wrote:
> Sorry for the typo in the earlier message:
> --------------------------------------------------------
>
> Hi,
>
> I am a new Hadoop user. I followed the tutorial by Michael Noll at
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
> (as well as the one for a single node) with Hadoop-0.20 and Hadoop-0.21.
> I keep facing one problem intermittently:
>
> My NameNode, JobTracker, DataNode, and TaskTrackers start without any
> problem, and "jps" shows them running too. I can format the DFS space
> without any problems. But when I try to use the -copyFromLocal command,
> it fails with the following exception:
>
> 2010-09-09 05:54:04,216 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310, call addBlock(/user/hadoop/multinode/advsh12.txt, DFSClient_2010062748, null, null) from 9.59.225.190:53125: error: java.io.IOException: File /user/hadoop/multinode/advsh12.txt could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File /user/hadoop/multinode/advsh12.txt could only be replicated to 0 nodes, instead of 1
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)
>
> One notable thing: if I wait long enough between a failure of the
> command and its repeat execution, it succeeds the next time.
>
> But if I try to execute the same command without much time in between,
> it fails with the same exception. (I do shut down all servers/java
> processes, delete the DFS space manually with "rm -rf", and reformat it
> with "namenode -format" between repeat executions of the -copyFromLocal
> command.)
>
> I checked the mailing list archives for this problem. One thread,
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg00851.html,
> suggested checking and increasing the allowed number of open file
> descriptors. So I checked that on my system.
>
> $ cat /proc/sys/fs/file-max
> 1977900
> $
>
> This is a pretty large number.
>
> I checked and updated the shell's open file limit too, through
> /etc/security/limits.conf. Now it looks like this:
>
> $ ulimit -a
> <snip>
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 172032
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) *65535*
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 172032
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> So I was wondering what might be the root cause of the problem and how
> I can fix it (either in Hadoop or in my system)?
>
> Could someone please help me?
>
> Thanks.
>

--
Best Regards

Jeff Zhang