bcc: [EMAIL PROTECTED]
+ [EMAIL PROTECTED]
Thanks a lot for the bug report. I've added cdh-user@ to this email, which
may be a more appropriate list for this question.
Aaron T. Myers
Software Engineer, Cloudera
On Thu, Mar 24, 2011 at 10:47 AM, Adam Phelps <[EMAIL PROTECTED]> wrote:
> For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.
> - Adam
> On 3/24/11 10:30 AM, Adam Phelps wrote:
>> We have a bad disk on one of our datanode machines. We have
>> dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems
>> while the DataNode process was running, but we hit an error when we
>> needed to restart the DataNode process:
>> 2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker:
>> Incorrect permissions were set on /var/lib/stats/hdfs/4, expected:
>> rwxr-xr-x, while actual: ---------. Fixing...
>> 2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader:
>> Loaded the native-hadoop library
>> 2011-03-24 16:50:20,091 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not
>> In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.
>> It gets that permission error because we have the mount directory set to
>> be immutable:
>> root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
>> ------------------- /var/lib/stats/hdfs/2
>> ----i------------e- /var/lib/stats/hdfs/4
>> ------------------- /var/lib/stats/hdfs/3
>> ------------------- /var/lib/stats/hdfs/1
>> We made it immutable because we'd previously seen HDFS simply write to
>> the local disk when a disk couldn't be mounted.
>> HDFS is supposed to be able to handle a failed disk, but it doesn't seem
>> to be doing the right thing in this case. Is this a known problem, or is
>> there some other way we should be configuring things to allow the
>> DataNode to come up in this situation?
>> (Clearly we can remove the mount point from hdfs-site.xml, but that
>> doesn't feel like the correct solution.)
>> - Adam
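
For reference, the setup described in the report would look roughly like the hdfs-site.xml fragment below. This is only a sketch. The dfs.datanode.failed.volumes.tolerated value comes from the report, but the dfs.data.dir value is an assumption based on the four directories in the lsattr listing, not a copy of the actual file.

```xml
<!-- hdfs-site.xml (sketch; property names as used in Hadoop 0.20.x) -->
<configuration>
  <property>
    <!-- Assumed value: the four directories shown in the lsattr listing -->
    <name>dfs.data.dir</name>
    <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3,/var/lib/stats/hdfs/4</value>
  </property>
  <property>
    <!-- From the report: tolerate up to 2 failed volumes -->
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>2</value>
  </property>
</configuration>
```

The workaround mentioned in the report, removing the mount point from hdfs-site.xml, would amount to dropping /var/lib/stats/hdfs/4 from the dfs.data.dir list above.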