HDFS, mail # user - Re: Application errors with one disk on datanode getting filled up to 100%


Mayank 2013-06-13, 06:17
So we did a manual rebalance (following the instructions at:
http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F),
reserved 30 GB of space for non-DFS usage via
dfs.datanode.du.reserved, and restarted our apps.
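For reference, the reservation mentioned above goes in hdfs-site.xml; dfs.datanode.du.reserved takes a value in bytes, reserved per volume. A minimal sketch (the exact byte figure is an assumption about how the 30 GB was expressed):

```xml
<property>
        <name>dfs.datanode.du.reserved</name>
        <!-- 30 GB in bytes, reserved on each volume for non-DFS use -->
        <value>32212254720</value>
</property>
```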

Things have been going fine till now.

Keeping fingers crossed :)
On Wed, Jun 12, 2013 at 12:58 PM, Rahul Bhattacharjee <
[EMAIL PROTECTED]> wrote:

> I have a few points to make; these may not be very helpful for the said
> problem.
>
> + The "All datanodes are bad" exception doesn't usually point to a
> disk-space-full problem.
> + hadoop.tmp.dir acts as the base location for other Hadoop-related
> properties; I'm not sure if any particular directory is created
> specifically.
> + Only one disk getting filled looks strange; the other disks were part
> of the setup when the NN was formatted.
>
> Would be interesting to know the reason for this.
> Please keep posted.
>
> Thanks,
> Rahul
>
>
> On Mon, Jun 10, 2013 at 3:39 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>
>> From the snapshot, you have around 3TB left for writing data.
>>
>> Can you check each individual datanode's storage health?
>> As you said, you have 80 servers writing to HDFS in parallel; I am not
>> sure whether that could be an issue.
>> As suggested in past threads, you can do a rebalance of the blocks, but
>> that will take some time to finish and will not solve your issue right
>> away.
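For completeness, the cluster-wide rebalance referred to here is the HDFS balancer; note that it balances blocks across datanodes, not across the disks within one node (which is why the manual per-disk procedure from the FAQ is needed for this case). A sketch of the usual invocation; the threshold value is only an example:

```shell
# Run the balancer until each datanode's utilization is within 10%
# of the cluster-wide average utilization (Hadoop 1.x style invocation).
hadoop balancer -threshold 10
```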
>>
>> You can wait for others to reply. I am sure there will be far better
>> solutions from experts for this.
>>
>>
>> On Mon, Jun 10, 2013 at 3:18 PM, Mayank <[EMAIL PROTECTED]> wrote:
>>
>>> No, it's not a map-reduce job. We have a Java app running on around 80
>>> machines which writes to HDFS. The error that I mentioned is being
>>> thrown by the application, and yes, we have the replication factor set
>>> to 3. Following is the status of HDFS:
>>>
>>> Configured Capacity : 16.15 TB
>>> DFS Used : 11.84 TB
>>> Non DFS Used : 872.66 GB
>>> DFS Remaining : 3.46 TB
>>> DFS Used% : 73.3 %
>>> DFS Remaining% : 21.42 %
>>> Live Nodes : 10
>>> Dead Nodes : 0
>>> Decommissioning Nodes : 0
>>> Number of Under-Replicated Blocks : 0
>>>
>>>
>>> On Mon, Jun 10, 2013 at 3:11 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>>>
>>>> When you say the application errors out, does that mean your mapreduce
>>>> job is erroring? In that case, apart from HDFS space, you will need to
>>>> look at the mapred tmp directory space as well.
>>>>
>>>> You've got 400GB * 4 * 10 = 16TB of disk, and let's assume you have a
>>>> replication factor of 3, so at most you will have a data size of about
>>>> 5TB. I am also assuming you are not scheduling your program to run on
>>>> the entire 5TB with just 10 nodes.
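The capacity arithmetic above can be checked quickly. Note the 400GB-per-disk figure is Nitin's round number; the original poster reports 414GB per disk:

```python
# Rough HDFS capacity estimate from the numbers in this thread.
disk_gb = 400        # approximate capacity per disk (poster reports 414 GB)
disks_per_node = 4   # /data1 .. /data4
nodes = 10
replication = 3

raw_gb = disk_gb * disks_per_node * nodes   # total raw disk across the cluster
usable_gb = raw_gb / replication            # unique data size after 3x replication

print(raw_gb)            # 16000 GB, i.e. ~16 TB raw
print(round(usable_gb))  # 5333 GB, i.e. ~5.3 TB of unique data
```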
>>>>
>>>> I suspect your cluster's mapred tmp space is getting filled while the
>>>> job is running.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 10, 2013 at 3:06 PM, Mayank <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> We are running a Hadoop cluster with 10 datanodes and a namenode.
>>>>> Each datanode is set up with 4 disks (/data1, /data2, /data3,
>>>>> /data4), with each disk having a capacity of 414GB.
>>>>>
>>>>>
>>>>> hdfs-site.xml has following property set:
>>>>>
>>>>> <property>
>>>>>         <name>dfs.data.dir</name>
>>>>>         <value>/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs,/data4/hadoopfs</value>
>>>>>         <description>Data dirs for DFS.</description>
>>>>> </property>
>>>>>
>>>>> Now we are facing an issue where we find /data1 getting filled up
>>>>> quickly, and many times we see its usage at 100% with just a few
>>>>> megabytes of free space. This issue is visible on 7 out of 10
>>>>> datanodes at present.
>>>>>
>>>>> We have some Java applications which write to HDFS, and many times we
>>>>> see the following errors in our application logs:
Mayank Joshi

Skype: mail2mayank
Mb.:  +91 8690625808

Blog: http://www.techynfreesouls.co.nr
PhotoStream: http://picasaweb.google.com/mail2mayank

Today is tomorrow I was so worried about yesterday ...