Rapid growth in Non DFS Used disk space
Kester, Scott 2011-05-13, 17:40
We have an 11 node Hadoop cluster running 0.20.2 that has been in production for 15 months now. The system is used to process log files that are ingested daily, and the oldest files in the HDFS are deleted to free up space as needed, typically when the free space is less than 10% (the delete is done using 'hadoop fs -rmr' on the parent directory of the files to be deleted). When the HDFS was originally built it had 1TB of 'Non DFS' space out of the 20TB total. This 1TB stayed constant for at least the first year the system was in use.
However, over the last few weeks I have seen the 'Non DFS Used' as reported by the NameNode dfshealth.jsp page grow to 2TB and rising. The total number of files/directories and blocks in use has remained fairly constant over this time. I am concerned that the Non DFS Used is going to consume more and more of the HDFS if left unchecked. Running fsck gave "The filesystem under path '/' is HEALTHY".
Questions:
1) What exactly is hadoop reporting as 'Non DFS Used', and how is it calculated? Are these files on the same partition(s) as the HDFS files, but are not actually part of the HDFS?
2) Any ideas on what is driving the growth in Non DFS Used space? I looked for things like growing log files on the datanodes but didn't find anything.
Thanks, Scott
Re: Rapid growth in Non DFS Used disk space
Todd Lipcon 2011-05-13, 17:48
On Fri, May 13, 2011 at 10:40 AM, Kester, Scott <[EMAIL PROTECTED]> wrote:
> We have an 11 node Hadoop cluster running 0.20.2 that has been in
> production for 15 months now. The system is used to process log files that
> are ingested daily, and the oldest files in the HDFS are deleted to free up
> space as needed, typically when the free space is less than 10% (the delete
> is done using 'hadoop fs -rmr' on the parent directory of the files to be
> deleted). When the HDFS was originally built it had 1TB of 'Non DFS' space
> out of the 20TB total. This 1TB stayed constant for at least the first year
> the system was in use.
>
> However, over the last few weeks I have seen the 'Non DFS Used' as
> reported by the NameNode dfshealth.jsp page grow to 2TB and rising. The
> total number of files/directories and blocks in use has remained fairly
> constant over this time. I am concerned that the Non DFS Used is going to
> consume more and more of the HDFS if left unchecked. Running fsck gave "The
> filesystem under path '/' is HEALTHY".
>
> Questions:
>
> 1) What exactly is hadoop reporting as 'Non DFS Used', and how is it
> calculated? Are these files on the same partition(s) as the HDFS files, but
> are not actually part of the HDFS?

Yes - it's usage reported by "df" that isn't coming from HDFS blocks.

> 2) Any ideas on what is driving the growth in Non DFS Used space? I
> looked for things like growing log files on the datanodes but didn't find
> anything.

Logs are one possible culprit. Another is to look for old files that might be orphaned in your mapred.local.dir - there have been bugs in the past where we've leaked files. If you shut down the TaskTrackers, you can safely delete everything from within mapred.local.dirs.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
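
To make the accounting in Todd's answer concrete, here is a minimal sketch. It assumes the reported figure is derived as configured capacity minus DFS used minus remaining space, clamped at zero; the function name and the sample numbers are hypothetical and only loosely modeled on the 20TB cluster described above.

```python
def non_dfs_used(capacity: int, dfs_used: int, remaining: int) -> int:
    """Bytes on the DataNode volumes taken by something other than HDFS blocks.

    Assumption: the value is capacity - dfs_used - remaining, never negative.
    """
    return max(capacity - dfs_used - remaining, 0)


TB = 1024 ** 4

# Hypothetical figures: 20TB of capacity, 17TB of HDFS blocks, 2TB still free.
capacity = 20 * TB
dfs_used = 17 * TB
remaining = 2 * TB

print(non_dfs_used(capacity, dfs_used, remaining) / TB)  # prints 1.0
```

Under that assumption, anything that makes "df" report less free space without adding HDFS blocks (logs, scratch files, or space held by deleted-but-open files) shows up as growth in this number.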
Re: Rapid growth in Non DFS Used disk space
Allen Wittenauer 2011-05-13, 19:12
On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
>> 2) Any ideas on what is driving the growth in Non DFS Used space? I
>> looked for things like growing log files on the datanodes but didn't find
>> anything.
>
> Logs are one possible culprit. Another is to look for old files that might
> be orphaned in your mapred.local.dir - there have been bugs in the past
> where we've leaked files. If you shut down the TaskTrackers, you can safely
> delete everything from within mapred.local.dirs.

Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out. The TT doesn't properly clean up after itself.
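
Before wiping anything, it can help to see how much old data is actually sitting in the local directories. The following is a rough sketch, not a Hadoop tool: `stale_files` is a hypothetical helper that walks a directory and reports regular files not modified within a given number of days. The directory to scan is whatever your mapred.local.dir points at.

```python
import os
import stat
import time


def stale_files(root, max_age_days):
    """Yield (path, size_bytes) for regular files under root whose
    modification time is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file disappeared while we were scanning
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path, st.st_size


def total_stale_bytes(root, max_age_days):
    """Sum the sizes of all stale files under root."""
    return sum(size for _path, size in stale_files(root, max_age_days))
```

For example, `total_stale_bytes("/data/mapred/local", 7)` (the path is a placeholder) would give the number of bytes held by files untouched for a week. Files belonging to running tasks may be recent, which is why actual deletion is only safe with the TaskTrackers shut down, as noted above.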
Re: Rapid growth in Non DFS Used disk space
Kester, Scott 2011-05-13, 20:41
We have a job that cleans up the mapred.local directory, so that's not it. I have done some further looking at data usage on the datanodes and 99% of the space used is under the dfs.data.dir/current directory. What would be under 'current' that wasn't part of HDFS?
Re: Rapid growth in Non DFS Used disk space
suresh srinivas 2011-05-15, 04:20
dfs.data.dir/current is used by datanodes to store blocks. This directory should only have files starting with blk_*
Things to check:
- Are there other files that are not blk related?
- Did you manually copy the content of one storage dir to another? (some folks did this when they added new disks)

--
Regards,
Suresh
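
The first check above can be scripted. This is a minimal sketch under the assumption that block and block-metadata files carry a `blk_` prefix; `unexpected_files` is a hypothetical helper, and the directory you pass in is your own dfs.data.dir/current path. A few small bookkeeping files may legitimately appear in the output, so treat the result as a list of things to inspect rather than things to delete.

```python
import os


def unexpected_files(current_dir):
    """Return (path, size_bytes) for files under current_dir whose names
    do not start with 'blk_', largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(current_dir):
        for name in filenames:
            if name.startswith("blk_"):
                continue  # assumed to be a block or block-metadata file
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file disappeared while we were scanning
            results.append((path, size))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

If this returns nothing sizeable while "df" still shows the partition filling up, the space is probably not held by any file visible in the directory tree.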
Re: Rapid growth in Non DFS Used disk space
Kester, Scott 2011-05-16, 15:50
I was able to track this down this morning. The process that ingests the log files into the HDFS cluster was not closing its file handles after deleting the temp files it creates during ingest. A deleted file's disk space is not freed until the last open handle on it is closed, so df kept counting that space while du, which only sees files still present in the directory tree, did not. Restarting the ingest process released the file handles and the Non DFS space is now back to normal. Thanks for the help guys.
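
For anyone hitting the same symptom, here is a rough, Linux-only sketch of how deleted-but-still-open files can be found for a given process. It relies on the `/proc/<pid>/fd` symlinks, whose targets end in " (deleted)" once the file has been unlinked; `deleted_open_files` is a hypothetical helper, and inspecting another user's process generally requires root.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid):
    """Return (path, size_bytes) for files that process `pid` still holds
    open even though they have been unlinked. Returns [] when /proc is
    unavailable or the process cannot be inspected."""
    fd_dir = "/proc/%d/fd" % pid
    found = []
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return found
    for fd in fds:
        link = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(link)
            if not target.endswith(DELETED_SUFFIX):
                continue
            # stat() through the /proc link reaches the open file itself,
            # so the size is available even though the name is gone.
            size = os.stat(link).st_size
        except OSError:
            continue  # descriptor was closed while we were scanning
        found.append((target[: -len(DELETED_SUFFIX)], size))
    return found
```

Summing the sizes returned for the ingest process would give an estimate of the space that "df" counts but "du" cannot see.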