HDFS >> mail # user >> Re: Application errors with one disk on datanode getting filled up to 100%


Rahul Bhattacharjee 2013-06-12, 07:28
Re: Application errors with one disk on datanode getting filled up to 100%
So we did a manual rebalance (following the instructions at
http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F),
reserved 30 GB of space for non-DFS usage via
dfs.datanode.du.reserved, and restarted our apps.

Things have been going fine till now.

Keeping fingers crossed :)
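
For reference, a minimal sketch of what the reservation described above
might look like in hdfs-site.xml. The exact value is an assumption: the
message only says "30 GB", and dfs.datanode.du.reserved is specified in
bytes per volume, so the figure below assumes 30 GB was meant for each
data disk rather than for the node as a whole.

```xml
<!-- hdfs-site.xml: illustrative sketch, not the poster's actual setting -->
<property>
        <name>dfs.datanode.du.reserved</name>
        <!-- Assumed value: 30 GB in bytes, 30 * 1024^3 = 32212254720, applied per volume -->
        <value>32212254720</value>
        <description>Reserved space in bytes per volume for non-DFS use.</description>
</property>
```

The datanodes need a restart to pick up a change to this property.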
On Wed, Jun 12, 2013 at 12:58 PM, Rahul Bhattacharjee <[EMAIL PROTECTED]> wrote:

> I have a few points to make, though they may not be very helpful for the
> problem at hand.
>
> + The "all data nodes are bad" exception does not really point to a
> problem related to a full disk.
> + hadoop.tmp.dir acts as the base location for other Hadoop-related
> properties; I am not sure whether any particular directory is created
> specifically for it.
> + Only one disk getting filled looks strange. The other disks were also
> part of the setup when the NN was formatted.
>
> Would be interesting to know the reason for this.
> Please keep posted.
>
> Thanks,
> Rahul
>
>
> On Mon, Jun 10, 2013 at 3:39 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>
>> From the snapshot, you have around 3 TB left for writing data.
>>
>> Can you check each individual datanode's storage health?
>> As you said, you have 80 servers writing to HDFS in parallel; I am not
>> sure whether that can be an issue.
>> As suggested in past threads, you can do a rebalance of the blocks but
>> that will take some time to finish and will not solve your issue right
>> away.
>>
>> You can wait for others to reply. I am sure there will be far better
>> solutions from experts for this.
>>
>>
>> On Mon, Jun 10, 2013 at 3:18 PM, Mayank <[EMAIL PROTECTED]> wrote:
>>
>>> No, it's not a MapReduce job. We have a Java app running on around 80
>>> machines which writes to HDFS. The error that I mentioned is being thrown
>>> by the application, and yes, we have the replication factor set to 3. The
>>> following is the status of HDFS:
>>>
>>> Configured Capacity : 16.15 TB
>>> DFS Used : 11.84 TB
>>> Non DFS Used : 872.66 GB
>>> DFS Remaining : 3.46 TB
>>> DFS Used% : 73.3 %
>>> DFS Remaining% : 21.42 %
>>> Live Nodes<http://hmaster.production.indix.tv:50070/dfsnodelist.jsp?whatNodes=LIVE> : 10
>>> Dead Nodes<http://hmaster.production.indix.tv:50070/dfsnodelist.jsp?whatNodes=DEAD> : 0
>>> Decommissioning Nodes<http://hmaster.production.indix.tv:50070/dfsnodelist.jsp?whatNodes=DECOMMISSIONING> : 0
>>> Number of Under-Replicated Blocks : 0
>>>
>>>
>>> On Mon, Jun 10, 2013 at 3:11 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>
>>>> When you say the application errors out, does that mean your MapReduce
>>>> job is erroring? In that case, apart from HDFS space, you will need to
>>>> look at the mapred tmp directory space as well.
>>>>
>>>> You have 400 GB * 4 * 10 = 16 TB of disk. Assuming a replication factor
>>>> of 3, you can hold at most about 5 TB of data. I am also assuming you
>>>> are not scheduling your program to run on the entire 5 TB with just
>>>> 10 nodes.
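
The estimate above can be written out as a short worked calculation. It is
only a rough sketch: it uses the rounded 400 GB per disk figure from this
message (the disks were reported as 414 GB each), assumes every block is
stored exactly three times, and ignores non-DFS usage and reserved space.

```latex
\begin{align*}
\text{raw capacity} &= 400\,\text{GB} \times 4 \times 10 = 16{,}000\,\text{GB} \approx 16\,\text{TB} \\
\text{data at replication factor 3} &\approx \frac{16\,\text{TB}}{3} \approx 5.3\,\text{TB}
\end{align*}
```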
>>>>
>>>> I suspect your cluster's mapred tmp space is getting filled while the
>>>> job is running.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 10, 2013 at 3:06 PM, Mayank <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> We are running a Hadoop cluster with 10 datanodes and a namenode. Each
>>>>> datanode is set up with 4 disks (/data1, /data2, /data3, /data4), with
>>>>> each disk having a capacity of 414 GB.
>>>>>
>>>>>
>>>>> hdfs-site.xml has the following property set:
>>>>>
>>>>> <property>
>>>>>         <name>dfs.data.dir</name>
>>>>>         <value>/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs,/data4/hadoopfs</value>
>>>>>         <description>Data dirs for DFS.</description>
>>>>> </property>
>>>>>
>>>>> Now we are facing an issue wherein /data1 fills up quickly, and many
>>>>> times we see its usage at 100% with just a few megabytes of free
>>>>> space. This issue is currently visible on 7 out of 10 datanodes.
>>>>>
>>>>> We have some Java applications writing to HDFS, and we often see the
>>>>> following errors in our application logs:
Mayank Joshi

Skype: mail2mayank
Mb.:  +91 8690625808

Blog: http://www.techynfreesouls.co.nr
PhotoStream: http://picasaweb.google.com/mail2mayank

Today is the tomorrow I was so worried about yesterday ...