On Jul 21, 2010, at 12:45 PM, Travis Crawford wrote:
> Does anyone else run into machines with overfull disks?
It was a common problem when I was at Yahoo!. As the drives fill up, the NameNode gets slower and slower, since it has a harder and harder time finding valid block placements.
> Any tips on how to avoid getting into this situation?
What we started to do was two-fold:
a) During every maintenance window, we'd blow away the mapred temp dirs. The TaskTracker does a very bad job of cleaning up after jobs, and there is usually a lot of cruft left behind. If you have a 'flat' disk/fs layout where MapReduce temp space and HDFS share the same filesystems, this is a huge problem.
b) Blowing away /tmp on a regular basis. Here at LI, I've got a perl script that reads the output of ls /tmp, finds files/dirs older than 3 days, and removes them. Since pig is a little piggy and leaves a ton of useless data in /tmp, I often see 15TB or more disappear just by doing this.
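For illustration, both cleanups might be sketched in shell. To be clear, our real tooling is perl and I'm not posting it here; the paths, the listing format, and the 3-day cutoff below are assumptions, not the actual scripts:

```shell
#!/bin/sh
# (a) Maintenance-window wipe of TaskTracker temp space.
# MAPRED_LOCAL_DIRS is hypothetical -- substitute your real
# mapred.local.dir entries from mapred-site.xml.
MAPRED_LOCAL_DIRS="/data/disk000/mapred/local /data/disk001/mapred/local"
for d in $MAPRED_LOCAL_DIRS; do
    # Empty the dir but keep it, so the restarted TaskTracker
    # still finds its configured path.
    if [ -d "$d" ]; then
        rm -rf "$d"/*
    fi
done

# (b) Age-based /tmp reaper. Assumes a 2010-era `hadoop fs -ls`
# style listing:  perms repl owner group size YYYY-MM-DD HH:MM /path
CUTOFF=$(date -d '3 days ago' +%Y-%m-%d)    # GNU date

# Print paths whose listed date sorts before the cutoff.
# (ISO dates compare correctly as strings.)
old_paths() {
    awk -v cutoff="$1" '$6 < cutoff { print $8 }'
}

# Dry-run style usage; swap `echo` for `hadoop fs -rmr` once trusted:
#   hadoop fs -ls /tmp | old_paths "$CUTOFF" | xargs -r -n1 echo
```

The dry-run-first pattern matters here: an off-by-one in the date parsing against a recursive remove is a bad combination.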
> /dev/cciss/c0d0 275G 217G 45G 83% /data/disk000
The bigger problem is that Hadoop just doesn't work well with filesystems this small. You might want to check your filesystem's reserved-block percentage; you might be able to eke out a bit more space that way too.
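To put rough numbers on the reserved-space point: ext3 reserves 5% for root by default, and tune2fs -m can lower it. A back-of-the-envelope sketch of what that's worth on disks this size (the 31-disk count is my guess from the disk000..disk030 numbering in your df output, so adjust for your real layout):

```shell
#!/bin/sh
# Space tied up by the default 5% root reserve on 275G disks,
# and what dropping it to 1% would free. Disk count is assumed.
DISK_GB=275
DISKS=31
DEFAULT_PCT=5
NEW_PCT=1

freed=$(awk -v g=$DISK_GB -v n=$DISKS -v a=$DEFAULT_PCT -v b=$NEW_PCT \
        'BEGIN { printf "%.0f", g * n * (a - b) / 100 }')
echo "Dropping the reserve from ${DEFAULT_PCT}% to ${NEW_PCT}% frees ~${freed}G per node"

# The actual change would be something like:
#   tune2fs -m 1 /dev/cciss/c0d0
# and remember HDFS has its own knob too: dfs.datanode.du.reserved.
```

Over 300G per node back for a one-line change per disk is usually worth the trip through maintenance.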
> /dev/cciss/c0d14 275G 248G 14G 95% /data/disk014
I'd probably shut down this DataNode and manually move blocks off of this drive onto ...
> /dev/cciss/c1d1p1 275G 184G 78G 71% /data/disk025
> /dev/cciss/c1d2p1 275G 176G 86G 68% /data/disk026
> /dev/cciss/c1d3p1 275G 178G 84G 68% /data/disk027
> /dev/cciss/c1d4p1 275G 177G 85G 68% /data/disk028
> /dev/cciss/c1d5p1 275G 179G 83G 69% /data/disk029
> /dev/cciss/c1d6p1 275G 181G 81G 70% /data/disk030
... one of these.
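A hedged sketch of that manual move, with paths assumed from your df listing. Two things matter: the DataNode must be down while you do it, and every blk_N file must travel together with its blk_N_*.meta checksum file or the block will be reported as corrupt:

```shell
#!/bin/sh
# Move up to $3 block files (plus their .meta files) from one HDFS
# data dir to another on the same DataNode. Illustrative only.
move_blocks() {
    src=$1; dst=$2; n=$3
    # Block files look like blk_<id> (possibly negative ids);
    # the grep deliberately excludes the .meta files, which are
    # picked up by the glob alongside each block.
    for blk in $(ls "$src" 2>/dev/null | grep '^blk_[0-9-]*$' | head -n "$n"); do
        mv "$src/$blk" "$src/${blk}_"*.meta "$dst/"
    done
}

# Stop the DataNode, then e.g. (assumed dir layout):
#   move_blocks /data/disk014/hdfs/current /data/disk025/hdfs/current 5000
# ...then restart the DataNode so it rescans its volumes.
```

Crude, but it works when one disk is pegged at 95% and its neighbors are sitting at 70%.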