Re: Application errors with one disk on datanode getting filled up to 100%
Mayank 2013-06-10, 09:48
No, it's not a MapReduce job. We have a Java app running on around 80
machines which writes to HDFS. The error I mentioned is being thrown
by the application, and yes, we have the replication factor set to 3. Following
is the status of HDFS:
Configured Capacity               : 16.15 TB
DFS Used                          : 11.84 TB
Non DFS Used                      : 872.66 GB
DFS Remaining                     : 3.46 TB
DFS Used%                         : 73.3 %
DFS Remaining%                    : 21.42 %
Live Nodes                        : 10
Dead Nodes                        : 0
Decommissioning Nodes             : 0
Number of Under-Replicated Blocks : 0
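As a rough cross-check of these numbers (assuming the report's TB and GB are binary units, 1 TB = 1024 GB), the two percentages follow directly from the totals, and spreading the remaining 3.46 TB evenly over 10 nodes x 4 disks would leave roughly 88 GB free per disk. A single volume sitting at 100% therefore suggests an uneven spread across disks rather than a cluster that is full overall. The class and method names below are illustrative only.

```java
// Back-of-the-envelope arithmetic on the HDFS status report above.
// Assumption: TB/GB in the report are binary units (1 TB = 1024 GB).
class ClusterStatusMath {
    // Percentage that `part` represents of `whole`.
    static double percentOf(double part, double whole) {
        return 100.0 * part / whole;
    }

    // Average DFS free space per volume, in GB, if the remaining space
    // were spread evenly over every disk in the cluster.
    static double avgFreeGbPerVolume(double remainingTb, int nodes, int volumesPerNode) {
        return remainingTb * 1024.0 / (nodes * volumesPerNode);
    }

    public static void main(String[] args) {
        double configuredTb = 16.15, usedTb = 11.84, remainingTb = 3.46;
        System.out.printf("DFS Used%%: %.1f%n", percentOf(usedTb, configuredTb));
        System.out.printf("DFS Remaining%%: %.2f%n", percentOf(remainingTb, configuredTb));
        System.out.printf("Avg free per volume: %.1f GB%n", avgFreeGbPerVolume(remainingTb, 10, 4));
    }
}
```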
On Mon, Jun 10, 2013 at 3:11 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
> When you say the application errors out, does that mean your MapReduce job
> is failing? In that case, apart from HDFS space, you will need to look at
> the mapred tmp directory space as well.
> You have 400 GB * 4 * 10 = 16 TB of disk, and let's assume you have a
> replication factor of 3, so at most you can store about 5 TB of data.
> I am also assuming you are not scheduling your program to run on the entire
> 5 TB with just 10 nodes.
> I suspect your cluster's mapred tmp space is getting filled up while the
> job is running.
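Nitin's capacity estimate above works out as follows. This is a sketch of the arithmetic only: it uses his rounded 400 GB per disk and decimal units (1 TB = 1000 GB), and it ignores non-DFS usage and any reserved space, so real usable capacity is somewhat lower.

```java
// Back-of-the-envelope HDFS capacity arithmetic (decimal units, 1 TB = 1000 GB).
class UsableCapacity {
    // Raw disk capacity across all datanodes, in GB.
    static long rawGb(long gbPerDisk, int disksPerNode, int nodes) {
        return gbPerDisk * disksPerNode * nodes;
    }

    // Logical data (in TB) that fits when every block is stored `replication` times.
    // Ignores non-DFS usage and reserved space.
    static double logicalTb(long rawGb, int replication) {
        return rawGb / 1000.0 / replication;
    }

    public static void main(String[] args) {
        long raw = rawGb(400, 4, 10); // Nitin's rounded figure: 16000 GB = 16 TB
        System.out.printf("Raw: %d GB, usable at 3x replication: %.2f TB%n",
                raw, logicalTb(raw, 3));
    }
}
```

Plugging in the 414 GB per disk from the original post instead gives 16560 GB raw, which changes the usable figure only slightly.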
> On Mon, Jun 10, 2013 at 3:06 PM, Mayank <[EMAIL PROTECTED]> wrote:
>> We are running a Hadoop cluster with 10 datanodes and a namenode. Each
>> datanode is set up with 4 disks (/data1, /data2, /data3, /data4), with each
>> disk having a capacity of 414 GB.
>> hdfs-site.xml has the following property set:
>> <description>Data dirs for DFS.</description>
>> Now we are facing an issue wherein /data1 fills up
>> quickly, and many times we see its usage running at 100% with just a few
>> megabytes of free space. This issue is visible on 7 out of 10 datanodes at
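To confirm which volumes are filling up, a small check like the following could be run locally on each datanode. It is a hypothetical helper, not part of Hadoop: it uses the standard java.io.File space methods, takes the four mount points from the post as defaults, and flags anything above an arbitrary 95% threshold. Note that getUsableSpace() accounts for permissions and other OS restrictions where possible, so it may report less than the raw free space.

```java
import java.io.File;

// Hypothetical per-volume usage check; the default paths and the 95% threshold are assumptions.
class VolumeUsage {
    // Percentage of a filesystem in use, from its total and usable byte counts.
    static double usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 0.0; // path missing or size unknown (File reports 0 in that case)
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    // Usage of the filesystem that contains `dir`.
    static double usedPercent(File dir) {
        return usedPercent(dir.getTotalSpace(), dir.getUsableSpace());
    }

    public static void main(String[] args) {
        String[] dirs = args.length > 0 ? args : new String[] {"/data1", "/data2", "/data3", "/data4"};
        for (String d : dirs) {
            double pct = usedPercent(new File(d));
            String flag = pct > 95.0 ? "  <-- nearly full" : "";
            System.out.printf("%s: %.1f%% used%s%n", d, pct, flag);
        }
    }
}
```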
>> We have some Java applications which are writing to HDFS, and many times
>> we see the following errors in our application logs:
>> java.io.IOException: All datanodes xxx.xxx.xxx.xxx:50010 are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3093)
>> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
>> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
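For context on the exception itself: roughly speaking, when a datanode in the client's write pipeline fails, DFSClient drops that node and continues with the remaining ones, and it aborts with this message once no healthy node is left. A datanode whose chosen volume has run out of space is one plausible way for a node to drop out. The model below is a deliberately simplified sketch of that behaviour with hypothetical names; it is not the actual DFSClient code, which does considerably more.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified model of write-pipeline error handling; not the real DFSClient logic.
class PipelineModel {
    private final List<String> pipeline;

    PipelineModel(List<String> datanodes) {
        this.pipeline = new ArrayList<>(datanodes);
    }

    // Called when the datanode at badIndex fails (e.g. its volume is full).
    // Drops it from the pipeline, or aborts if it was the last node left.
    void onDatanodeError(int badIndex) throws IOException {
        if (pipeline.size() == 1) {
            throw new IOException("All datanodes " + pipeline.get(0) + " are bad. Aborting...");
        }
        pipeline.remove(badIndex); // remove by index, keep writing to the rest
    }

    List<String> nodes() {
        return pipeline;
    }
}
```

With replication 3 the pipeline starts with three nodes, so the single address in the logged message means the pipeline had already shrunk to one node before that last node failed too.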
>> I went through some old discussions, and it looks like manual rebalancing is
>> what is required in this case, and we should also have
>> dfs.datanode.du.reserved set up.
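A sketch of why dfs.datanode.du.reserved matters: that property reserves a number of bytes on each volume for non-DFS use, and, as far as I know, the datanode by default picks volumes for new blocks round-robin rather than by free space. The simplified model below (a hypothetical class, not Hadoop's implementation) skips any volume whose capacity minus used minus reserved space cannot hold the block, so with a reserve configured DFS stops short of filling a disk completely.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of round-robin volume choice with a per-volume reserve.
// Not Hadoop's actual implementation; names are illustrative.
class RoundRobinVolumes {
    static final class Volume {
        final String path;
        final long capacityBytes;
        final long usedBytes; // everything already on the disk, DFS and non-DFS

        Volume(String path, long capacityBytes, long usedBytes) {
            this.path = path;
            this.capacityBytes = capacityBytes;
            this.usedBytes = usedBytes;
        }

        // Space DFS may still use once the reserve is set aside.
        long availableBytes(long reservedBytes) {
            return Math.max(0L, capacityBytes - usedBytes - reservedBytes);
        }
    }

    private final List<Volume> volumes;
    private final long reservedBytes; // plays the role of dfs.datanode.du.reserved
    private int next = 0;             // index where the next search starts

    RoundRobinVolumes(List<Volume> volumes, long reservedBytes) {
        this.volumes = new ArrayList<>(volumes);
        this.reservedBytes = reservedBytes;
    }

    // Returns the next volume, in round-robin order, that can hold blockSize bytes,
    // or null if none can (the real datanode reports an out-of-space error instead).
    Volume choose(long blockSize) {
        int n = volumes.size();
        for (int i = 0; i < n; i++) {
            int idx = (next + i) % n;
            Volume v = volumes.get(idx);
            if (v.availableBytes(reservedBytes) >= blockSize) {
                next = (idx + 1) % n;
                return v;
            }
        }
        return null;
    }
}
```

In this model a reserve of zero lets a nearly full volume keep receiving blocks until less than one block of space is left, while a non-zero reserve makes it skip that volume earlier. The model does not update usedBytes after a choice, so it illustrates selection only, not how usage accumulates.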
>> However, I'd like to understand whether this issue, with one disk getting
>> filled up to 100%, can result in the issue we are seeing in our
>> Also, are there any other performance implications of some of the
>> disks running at 100% usage on a datanode?
>> Mayank Joshi
>> Skype: mail2mayank
>> Mb.: +91 8690625808
>> Blog: http://www.techynfreesouls.co.nr
>> PhotoStream: http://picasaweb.google.com/mail2mayank
>> Today is tomorrow I was so worried about yesterday ...
> Nitin Pawar