-
Datanode won't start with bad disk
Adam Phelps 2011-03-24, 17:30

We have a bad disk on one of our datanode machines. We have dfs.datanode.failed.volumes.tolerated set to 2 and didn't see any problem while the DataNode process was running, but we hit a problem when we needed to restart the DataNode process:

2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker: Incorrect permissions were set on /var/lib/stats/hdfs/4, expected: rwxr-xr-x, while actual: ---------. Fixing...
2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2011-03-24 16:50:20,091 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not permitted

In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk. It gets that permission error because we have the mount directory set to be immutable:

root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
------------------- /var/lib/stats/hdfs/2
----i------------e- /var/lib/stats/hdfs/4
------------------- /var/lib/stats/hdfs/3
------------------- /var/lib/stats/hdfs/1

We set it up this way because we'd previously seen HDFS just write to the local disk when a disk couldn't be mounted.

HDFS is supposed to be able to handle a failed disk, but it doesn't seem to be doing the right thing in this case. Is this a known problem, or is there some other way we should be configuring things to allow the DataNode to come up in this situation?

(Clearly we can remove the mount point from hdfs-site.xml, but that doesn't feel like the correct solution.)

Thanks
- Adam
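For anyone wanting to check this situation before restarting a DataNode, here is a minimal sketch in Python. It is not part of Hadoop and not something from the original post: the function names are hypothetical, and the reading of dfs.datanode.failed.volumes.tolerated as "stay up while the number of failed volumes does not exceed the value" is an assumption based on how the setting is described above. It simply reports which configured data directories are not mount points, which is the case where the immutable flag comes into play.

```python
import os
from typing import Callable, Iterable, List


def unmounted_data_dirs(
    data_dirs: Iterable[str],
    is_mount: Callable[[str], bool] = os.path.ismount,
) -> List[str]:
    """Return the data directories that are not currently mount points.

    `is_mount` is injectable so the check can be exercised without real
    disks; by default it uses os.path.ismount from the standard library.
    """
    return [d for d in data_dirs if not is_mount(d)]


def within_tolerance(failed: int, tolerated: int) -> bool:
    """Assumed semantics: the DataNode should keep serving as long as the
    number of failed volumes does not exceed the tolerated count."""
    return failed <= tolerated
```

With the four directories from the lsattr listing and a tolerated value of 2, a single unmounted directory would be reported by unmounted_data_dirs and within_tolerance(1, 2) would be true, which matches the expectation in the post that the DataNode should still be able to start.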
-
Re: Datanode won't start with bad disk
Adam Phelps 2011-03-24, 17:47

For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

- Adam
-
Re: Datanode won't start with bad disk
Bharath Mundlapudi 2011-03-24, 23:00

Hi Adam,

I have posted a patch for this problem for Hadoop version 0.20. Please refer to the following Jira:
https://issues.apache.org/jira/browse/HDFS-1592

-Bharath
-
Re: Datanode won't start with bad disk
Bharath Mundlapudi 2011-03-24, 23:08

Also, you will need this patch.
https://issues.apache.org/jira/browse/HADOOP-7040
-
Re: Datanode won't start with bad disk
Adam Phelps 2011-03-25, 00:21

Thanks for the info. We may implement this patch if this continues to be a problem.

- Adam
-
Re: Datanode won't start with bad disk
Allen Wittenauer 2011-03-25, 16:43

On Mar 24, 2011, at 10:47 AM, Adam Phelps wrote:
> For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

Given that this isn't a standard Apache release, you'll likely be better served by asking Cloudera.
-
Re: Datanode won't start with bad disk
Aaron T. Myers 2011-03-25, 16:48

bcc: [EMAIL PROTECTED]
+ [EMAIL PROTECTED]

Hey Adam,

Thanks a lot for the bug report. I've added cdh-user@ to this email, which may be a more appropriate list for this question.

Best,
Aaron

--
Aaron T. Myers
Software Engineer, Cloudera