Re: Datanode won't start with bad disk
bcc: [EMAIL PROTECTED]
+ [EMAIL PROTECTED]

Hey Adam,

Thanks a lot for the bug report. I've added cdh-user@ to this email, which
may be a more appropriate list for this question.

Best,
Aaron

--
Aaron T. Myers
Software Engineer, Cloudera

On Thu, Mar 24, 2011 at 10:47 AM, Adam Phelps <[EMAIL PROTECTED]> wrote:

> For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.
>
> - Adam
>
>
> On 3/24/11 10:30 AM, Adam Phelps wrote:
>
>> We have a bad disk on one of our datanode machines. Although we have
>> dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems while
>> the DataNode process was running, we are now hitting a problem when we
>> need to restart the DataNode process:
>>
>> 2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker:
>> Incorrect permissions were set on /var/lib/stats/hdfs/4, expected:
>> rwxr-xr-x, while actual: ---------. Fixing...
>> 2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader:
>> Loaded the native-hadoop library
>> 2011-03-24 16:50:20,091 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not
>> permitted
>>
>> In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.
>> It gets that permission error because we have the mount directory set to
>> be immutable:
>>
>> root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
>> ------------------- /var/lib/stats/hdfs/2
>> ----i------------e- /var/lib/stats/hdfs/4
>> ------------------- /var/lib/stats/hdfs/3
>> ------------------- /var/lib/stats/hdfs/1
>>
>> We made the mount directory immutable because we'd previously seen HDFS
>> just write to the local disk when a disk couldn't be mounted.
>>
>> HDFS is supposed to be able to handle failed disks, but it doesn't seem
>> to be doing the right thing in this case. Is this a known problem, or is
>> there some other way we should be configuring things to allow the
>> DataNode to come up in this situation?
>>
>> (clearly we can remove the mount point from hdfs-site.xml, but that
>> doesn't feel like the correct solution)
>>
>> Thanks
>> - Adam
>>
>>
>
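For context on the setting in play: dfs.datanode.failed.volumes.tolerated is
configured in hdfs-site.xml alongside the data directories. A minimal sketch
matching the setup described in the thread (the dfs.data.dir value is an
assumption reconstructed from the lsattr output; 0.20.2-era deployments use
dfs.data.dir rather than the newer dfs.datanode.data.dir name):

  <!-- hdfs-site.xml (sketch): tolerate up to two failed data volumes
       before the DataNode refuses to start -->
  <property>
    <name>dfs.data.dir</name>
    <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3,/var/lib/stats/hdfs/4</value>
  </property>
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>2</value>
  </property>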
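The EPERM in the log is consistent with the immutable attribute: the ext
filesystem's immutable flag blocks all changes to the inode, including the
chmod that the DiskChecker warning says it is about to perform, and it
applies even to root. A minimal shell sketch of the failure mode (run as
root; the path is the failed mount point from the thread):

  # Make the mount point immutable so nothing can write into the directory
  # while the real disk is not mounted (the protection described above):
  chattr +i /var/lib/stats/hdfs/4

  # The permission "fix" the DataNode attempts at startup now fails with
  # EPERM, even for root:
  chmod 755 /var/lib/stats/hdfs/4
  # chmod: changing permissions of `/var/lib/stats/hdfs/4': Operation not permitted

  # chattr -i /var/lib/stats/hdfs/4 would clear the flag and let the
  # DataNode come up, at the cost of losing the unmounted-disk protection.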