Re: Could not get additional block while writing hundreds of files
Hi Manuel,
2013-07-03 15:03:16,427 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
enough replicas, still in need of 3
2013-07-03 15:03:16,427 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:root cause:java.io.IOException: File /log/1372863795616 could only be
replicated to 0 nodes, instead of 1
This indicates there is not enough space on HDFS to place the block
replicas. Can you check how much of the cluster capacity is used?
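For example, running the following on the namenode host should print the
configured capacity, DFS used and DFS remaining for every live datanode
(assuming the hadoop command is on the PATH and picks up your cluster
configuration):

  hadoop dfsadmin -report

Also check in that report whether any datanode is listed as dead or has
almost no DFS remaining.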
On Thu, Jul 4, 2013 at 12:14 AM, Manuel de Ferran <[EMAIL PROTECTED]
> wrote:

> Greetings all,
>
> we are trying to import data into an HDFS cluster, but we hit a seemingly
> random exception. We are trying to figure out the root cause
> (misconfiguration, too much load, ...) and how to solve it.
>
> The client writes hundreds of files with a replication factor of 3. It
> crashes sometimes at the beginning, sometimes close to the end, and in rare
> cases it succeeds.
>
> On failure, we see the following on the client side:
>  DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: File /log/1372863795616 could only be replicated to 0
> nodes, instead of 1
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>          ....
>
> which seems to be a well-known error. We have followed the hints from the
> Troubleshooting page, but we're still stuck: plenty of disk space available
> on the datanodes, free inodes, file-descriptor usage far below the
> open-files limit, and all datanodes are up and running.
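> (For reference, the checks looked roughly like this; exact paths and
> commands may differ on your setup:
>
>   df -h /data/hdfs    # free disk space on each datanode data directory
>   df -i /data/hdfs    # free inodes on the same partitions
>   ulimit -n           # open-files limit for the user running the datanode
>
> plus hadoop dfsadmin -report on the namenode to confirm that all datanodes
> are live.)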
>
> Note that other HDFS clients are still able to write files while the
> import is running.
>
> Here is the corresponding extract of the namenode log file:
>
> 2013-07-03 15:03:15,951 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> transactions: 46009 Total time for transactions(ms): 153Number of
> transactions batched in Syncs: 5428 Number of syncs: 32889 SyncTimes(ms):
> 139555
> 2013-07-03 15:03:16,427 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 3
> 2013-07-03 15:03:16,427 ERROR
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:root cause:java.io.IOException: File /log/1372863795616 could only be
> replicated to 0 nodes, instead of 1
> 2013-07-03 15:03:16,427 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 9002, call addBlock(/log/1372863795616, DFSClient_1875494617,
> null) from 192.168.1.141:41376: error: java.io.IOException: File
> /log/1372863795616 could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File /log/1372863795616 could only be replicated to 0
> nodes, instead of 1
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>         at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>
>
> During the import, fsck reports about 300 open files. The cluster is
> running hadoop-1.0.3.
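> (That figure comes from running fsck with the -openforwrite option on the
> namenode, roughly:
>
>   hadoop fsck /log -openforwrite
>
> with /log being the directory the import writes to.)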
>
> Any advice about the configuration? We tried lowering
> dfs.heartbeat.interval and raised dfs.datanode.max.xcievers to 4k. Should
> we also raise dfs.datanode.handler.count?
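> (For reference, these live in conf/hdfs-site.xml on the datanodes; the
> xcievers value below is what we currently run with, the handler count is
> still the default we are wondering about:
>
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>4096</value>
>   </property>
>   <property>
>     <name>dfs.datanode.handler.count</name>
>     <value>3</value>  <!-- default; would raising this help? -->
>   </property>
>
> )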
>
>
> Thanks for your help
>