|
Nathaniel Cook
2012-05-08, 14:46
Todd Lipcon
2012-05-08, 16:28
Nathaniel Cook
2012-05-08, 17:33
Todd Lipcon
2012-05-08, 17:44
Nathaniel Cook
2012-05-08, 18:07
Jeff Whiting
2012-05-08, 19:38
Todd Lipcon
2012-05-08, 20:46
Jeff Whiting
2012-05-08, 23:20
Todd Lipcon
2012-05-09, 00:49
Jeff Whiting
2012-07-24, 19:05
Todd Lipcon
2012-07-25, 18:51
|
-
How to rebuild the shared edits directoryNathaniel Cook 2012-05-08, 14:46
We have be working with an HA hdfs cluster, testing several failover
scenarios. We have a small cluster of 4 machines spun up for testing. We run a namenode on two of the machines and hosted an nfs share on the third for the shared edits directory. The fourth machine is just a datanode. We configured the cluster for automatic failover using ZKFC. We can start and stop the namenodes with no problems, failover happens as expected. Then we tested breaking the shared edits directory. We stopped the nfs share and then reenabled it. This caused the loss of a few edits. This had no effect, as expected, on the namenodes, and the cluster functioned normally. We stopped the standby namenode and tried to start it again, it would not start because of the missing edits. No matter what we tried we could not rebuild the shared edits directory and thus get the second namenode back online. In this state the hdfs cluster continued to function but it was no longer an HA cluster. To get the cluster back in HA mode we had to reformat the namenode data with the shared edits. In this case how do you rebuild the shared edits data so you can get the cluster back to an HA mode? -- -Nathaniel Cook
-
Re: How to rebuild the shared edits directoryTodd Lipcon 2012-05-08, 16:28
On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook <[EMAIL PROTECTED]> wrote:
> We have be working with an HA hdfs cluster, testing several failover > scenarios. We have a small cluster of 4 machines spun up for testing. > We run a namenode on two of the machines and hosted an nfs share on > the third for the shared edits directory. The fourth machine is just a > datanode. We configured the cluster for automatic failover using ZKFC. > We can start and stop the namenodes with no problems, failover happens > as expected. Then we tested breaking the shared edits directory. We > stopped the nfs share and then reenabled it. This caused the loss of a > few edits. Really? What mount options are you using on your NFS mount? The active NN should abort immediately if the shared edits dir disappears. Do you have logs available from your NNs during this time? > This had no effect, as expected, on the namenodes, and the > cluster functioned normally. On the contrary, I'd expect the NN to bail out on the next edit (since it has no place to reliably fsync it) > We stopped the standby namenode and tried > to start it again, it would not start because of the missing edits. No > matter what we tried we could not rebuild the shared edits directory > and thus get the second namenode back online. In this state the hdfs > cluster continued to function but it was no longer an HA cluster. To > get the cluster back in HA mode we had to reformat the namenode data > with the shared edits. In this case how do you rebuild the shared > edits data so you can get the cluster back to an HA mode? It sounds like something went wrong with the facility that's supposed to make the active NN crash if shared edits go away. The logs will help. To answer your question, though, you can run the "initializeSharedEdits" process again to re-initialize that edits dir. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
-
Re: How to rebuild the shared edits directoryNathaniel Cook 2012-05-08, 17:33
We ran the initializeSharedEdits command and it didn't have any
effect, but that my be because of the weird state we got it in. So help me understand: I was under the assumption that if shared edits went away you would lose the ability to failover and that is it. The active namenode would still function but would not failover and all standy namenodes would not try to become active. Is this correct? If it is the case that namenodes quit when they lose connection to the shared edits dir than doesn't the shared edits becomes the new single point of failure? Unfortunately we have cleared the logs from this test but we could try to reproduce it. On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook <[EMAIL PROTECTED]> wrote: >> We have be working with an HA hdfs cluster, testing several failover >> scenarios. We have a small cluster of 4 machines spun up for testing. >> We run a namenode on two of the machines and hosted an nfs share on >> the third for the shared edits directory. The fourth machine is just a >> datanode. We configured the cluster for automatic failover using ZKFC. >> We can start and stop the namenodes with no problems, failover happens >> as expected. Then we tested breaking the shared edits directory. We >> stopped the nfs share and then reenabled it. This caused the loss of a >> few edits. > > Really? What mount options are you using on your NFS mount? > > The active NN should abort immediately if the shared edits dir > disappears. Do you have logs available from your NNs during this time? > >> This had no effect, as expected, on the namenodes, and the >> cluster functioned normally. > > On the contrary, I'd expect the NN to bail out on the next edit (since > it has no place to reliably fsync it) > >> We stopped the standby namenode and tried >> to start it again, it would not start because of the missing edits. No >> matter what we tried we could not rebuild the shared edits directory >> and thus get the second namenode back online. In this state the hdfs >> cluster continued to function but it was no longer an HA cluster. To >> get the cluster back in HA mode we had to reformat the namenode data >> with the shared edits. In this case how do you rebuild the shared >> edits data so you can get the cluster back to an HA mode? > > It sounds like something went wrong with the facility that's supposed > to make the active NN crash if shared edits go away. The logs will > help. > > To answer your question, though, you can run the > "initializeSharedEdits" process again to re-initialize that edits dir. > > Thanks > -Todd > -- > Todd Lipcon > Software Engineer, Cloudera -- -Nathaniel Cook
-
Re: How to rebuild the shared edits directoryTodd Lipcon 2012-05-08, 17:44
On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook
<[EMAIL PROTECTED]> wrote: > We ran the initializeSharedEdits command and it didn't have any > effect, but that my be because of the weird state we got it in. > > So help me understand: I was under the assumption that if shared edits > went away you would lose the ability to failover and that is it. The > active namenode would still function but would not failover and all > standy namenodes would not try to become active. Is this correct? Unfortunately that's not the case. If you lose shared edits, your cluster should shut down. We currently require the NFS direcory to be highly available itself. This is achievable with even pretty inexpensive NAS devices from your vendor of choice. The reason for this behavior is as follows: if the active node loses access to the mount, it's unable to distinguish whether the mount itself died or if the node just had a local issue which broke the mount. Imagine for example that the NFS client had a bug which caused the mount to go away. Then, you'd continue running for quite some time without writing to shared edits. If your NN then crashed, a failover would cause you to revert to an old version of the namespace, and you'd have a case of permanent data loss due to divergence of the image before and after failover. There's work under way to remove this restriction which should be available for general use some time this summer or early fall, if I had to take a guess on timeline. > If > it is the case that namenodes quit when they lose connection to the > shared edits dir than doesn't the shared edits becomes the new single > point of failure? Yes, but it's an easy one to resolve. Most of our customers already have a NAS device in their datacenter, which has dual heads, dual PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the same as you see in most enterprise HA systems which rely on shared storage. > > Unfortunately we have cleared the logs from this test but we could try > to reproduce it. That would be great, thanks! -Todd > > On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote: >> On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook <[EMAIL PROTECTED]> wrote: >>> We have be working with an HA hdfs cluster, testing several failover >>> scenarios. We have a small cluster of 4 machines spun up for testing. >>> We run a namenode on two of the machines and hosted an nfs share on >>> the third for the shared edits directory. The fourth machine is just a >>> datanode. We configured the cluster for automatic failover using ZKFC. >>> We can start and stop the namenodes with no problems, failover happens >>> as expected. Then we tested breaking the shared edits directory. We >>> stopped the nfs share and then reenabled it. This caused the loss of a >>> few edits. >> >> Really? What mount options are you using on your NFS mount? >> >> The active NN should abort immediately if the shared edits dir >> disappears. Do you have logs available from your NNs during this time? >> >>> This had no effect, as expected, on the namenodes, and the >>> cluster functioned normally. >> >> On the contrary, I'd expect the NN to bail out on the next edit (since >> it has no place to reliably fsync it) >> >>> We stopped the standby namenode and tried >>> to start it again, it would not start because of the missing edits. No >>> matter what we tried we could not rebuild the shared edits directory >>> and thus get the second namenode back online. In this state the hdfs >>> cluster continued to function but it was no longer an HA cluster. To >>> get the cluster back in HA mode we had to reformat the namenode data >>> with the shared edits. In this case how do you rebuild the shared >>> edits data so you can get the cluster back to an HA mode? >> >> It sounds like something went wrong with the facility that's supposed >> to make the active NN crash if shared edits go away. The logs will >> help. >> >> To answer your question, though, you can run the Todd Lipcon Software Engineer, Cloudera
-
Re: How to rebuild the shared edits directoryNathaniel Cook 2012-05-08, 18:07
On Tue, May 8, 2012 at 11:44 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook > <[EMAIL PROTECTED]> wrote: >> We ran the initializeSharedEdits command and it didn't have any >> effect, but that my be because of the weird state we got it in. >> >> So help me understand: I was under the assumption that if shared edits >> went away you would lose the ability to failover and that is it. The >> active namenode would still function but would not failover and all >> standy namenodes would not try to become active. Is this correct? > > Unfortunately that's not the case. If you lose shared edits, your > cluster should shut down. We currently require the NFS direcory to be > highly available itself. This is achievable with even pretty > inexpensive NAS devices from your vendor of choice. > > The reason for this behavior is as follows: if the active node loses > access to the mount, it's unable to distinguish whether the mount > itself died or if the node just had a local issue which broke the > mount. Imagine for example that the NFS client had a bug which caused > the mount to go away. Then, you'd continue running for quite some time > without writing to shared edits. If your NN then crashed, a failover > would cause you to revert to an old version of the namespace, and > you'd have a case of permanent data loss due to divergence of the > image before and after failover. > Ok this makes sense but I still have a question. What if when the active namenode looses its connection to the shared edits instead of shutting down the cluster degrades to a "non-failover" cluster. So in the case of the active node losing its access to the mount, it could tell the standby nodes to either shutdown or to not ever attempt to become active using the already in place failover communication systems. Then you could fix the NFS mount while in "non-failover" mode and once its back up sync the missing edits. Then tell the standby nodes to come back online. This way both the active namenode and shared edits have to fail at the same time for the cluster to go down. The worst that can happen when the shared edits goes down is that you loose the ability to failover. It this plausible or are there parts of the system that would prevent this type of behavior? > There's work under way to remove this restriction which should be > available for general use some time this summer or early fall, if I > had to take a guess on timeline. > >> If >> it is the case that namenodes quit when they lose connection to the >> shared edits dir than doesn't the shared edits becomes the new single >> point of failure? > > Yes, but it's an easy one to resolve. Most of our customers already > have a NAS device in their datacenter, which has dual heads, dual > PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the > same as you see in most enterprise HA systems which rely on shared > storage. What about cloud environments where you dont have the ability to create a NAS? AWS for example. > >> >> Unfortunately we have cleared the logs from this test but we could try >> to reproduce it. > > That would be great, thanks! > > -Todd > >> >> On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote: >>> On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook <[EMAIL PROTECTED]> wrote: >>>> We have be working with an HA hdfs cluster, testing several failover >>>> scenarios. We have a small cluster of 4 machines spun up for testing. >>>> We run a namenode on two of the machines and hosted an nfs share on >>>> the third for the shared edits directory. The fourth machine is just a >>>> datanode. We configured the cluster for automatic failover using ZKFC. >>>> We can start and stop the namenodes with no problems, failover happens >>>> as expected. Then we tested breaking the shared edits directory. We >>>> stopped the nfs share and then reenabled it. This caused the loss of a >>>> few edits. >>> >>> Really? What mount options are you using on your NFS mount? -Nathaniel Cook
-
Re: How to rebuild the shared edits directoryJeff Whiting 2012-05-08, 19:38
It seems the NN was originally written with the assumption that disks fail and stuff happens. Hence
the ability to have multiple directories store your NN data even though each directory is mostly likely redundant / HA. [start rant] My opinion is that it is a step backwards that the shared edits wasn't written with the same assumptions. If any one problem can take out your cluster then it isn't HA. So allowing a single nfs failure taking down your cluster and saying make nfs HA, just seems to move the HA problem not solve it. I would expect a true HA solution to be completely self contained within the hadoop ecosystem. All machines fail...eventually and it needs to be planned for. At a minimum a failure of the shared edits should only disable fail over and provide a recovery mechanism; Ideally the NN should have been rewritten to be a cluster (similar to zookeeper or ceph) to enable HA. [end rant] Sorry for the rant. I just really want to see HDFS become complete HA system without caveats. ~Jeff On 5/8/2012 11:44 AM, Todd Lipcon wrote: > On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook > <[EMAIL PROTECTED]> wrote: >> We ran the initializeSharedEdits command and it didn't have any >> effect, but that my be because of the weird state we got it in. >> >> So help me understand: I was under the assumption that if shared edits >> went away you would lose the ability to failover and that is it. The >> active namenode would still function but would not failover and all >> standy namenodes would not try to become active. Is this correct? > Unfortunately that's not the case. If you lose shared edits, your > cluster should shut down. We currently require the NFS direcory to be > highly available itself. This is achievable with even pretty > inexpensive NAS devices from your vendor of choice. > > The reason for this behavior is as follows: if the active node loses > access to the mount, it's unable to distinguish whether the mount > itself died or if the node just had a local issue which broke the > mount. Imagine for example that the NFS client had a bug which caused > the mount to go away. Then, you'd continue running for quite some time > without writing to shared edits. If your NN then crashed, a failover > would cause you to revert to an old version of the namespace, and > you'd have a case of permanent data loss due to divergence of the > image before and after failover. > > There's work under way to remove this restriction which should be > available for general use some time this summer or early fall, if I > had to take a guess on timeline. > >> If >> it is the case that namenodes quit when they lose connection to the >> shared edits dir than doesn't the shared edits becomes the new single >> point of failure? > Yes, but it's an easy one to resolve. Most of our customers already > have a NAS device in their datacenter, which has dual heads, dual > PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the > same as you see in most enterprise HA systems which rely on shared > storage. > >> Unfortunately we have cleared the logs from this test but we could try >> to reproduce it. > That would be great, thanks! > > -Todd > >> On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon<[EMAIL PROTECTED]> wrote: >>> On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook<[EMAIL PROTECTED]> wrote: >>>> We have be working with an HA hdfs cluster, testing several failover >>>> scenarios. We have a small cluster of 4 machines spun up for testing. >>>> We run a namenode on two of the machines and hosted an nfs share on >>>> the third for the shared edits directory. The fourth machine is just a >>>> datanode. We configured the cluster for automatic failover using ZKFC. >>>> We can start and stop the namenodes with no problems, failover happens >>>> as expected. Then we tested breaking the shared edits directory. We >>>> stopped the nfs share and then reenabled it. This caused the loss of a >>>> few edits. >>> Really? What mount options are you using on your NFS mount? Jeff Whiting Qualtrics Senior Software Engineer [EMAIL PROTECTED]
-
Re: How to rebuild the shared edits directoryTodd Lipcon 2012-05-08, 20:46
On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
> It seems the NN was originally written with the assumption that disks fail > and stuff happens. Hence the ability to have multiple directories store > your NN data even though each directory is mostly likely redundant / HA. > > [start rant] > > My opinion is that it is a step backwards that the shared edits wasn't > written with the same assumptions. If any one problem can take out your > cluster then it isn't HA. So allowing a single nfs failure taking down > your cluster and saying make nfs HA, just seems to move the HA problem not > solve it. I would expect a true HA solution to be completely self contained > within the hadoop ecosystem. All machines fail...eventually and it needs to > be planned for. At a minimum a failure of the shared edits should only > disable fail over and provide a recovery mechanism; Ideally the NN should > have been rewritten to be a cluster (similar to zookeeper or ceph) to enable > HA. > > [end rant] Like I said earlier in the thread, work is already under way on this and should be complete within a number of months. In many practical deployments, what we have already can provide complete HA. In others, like the AWS example you mentioned, we need a bit more, and we're working on it. Hang on a bit longer and it will be good to go. -Todd > > Sorry for the rant. I just really want to see HDFS become complete HA > system without caveats. > > ~Jeff > > > On 5/8/2012 11:44 AM, Todd Lipcon wrote: >> >> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook >> <[EMAIL PROTECTED]> wrote: >>> >>> We ran the initializeSharedEdits command and it didn't have any >>> effect, but that my be because of the weird state we got it in. >>> >>> So help me understand: I was under the assumption that if shared edits >>> went away you would lose the ability to failover and that is it. The >>> active namenode would still function but would not failover and all >>> standy namenodes would not try to become active. Is this correct? >> >> Unfortunately that's not the case. If you lose shared edits, your >> cluster should shut down. We currently require the NFS direcory to be >> highly available itself. This is achievable with even pretty >> inexpensive NAS devices from your vendor of choice. >> >> The reason for this behavior is as follows: if the active node loses >> access to the mount, it's unable to distinguish whether the mount >> itself died or if the node just had a local issue which broke the >> mount. Imagine for example that the NFS client had a bug which caused >> the mount to go away. Then, you'd continue running for quite some time >> without writing to shared edits. If your NN then crashed, a failover >> would cause you to revert to an old version of the namespace, and >> you'd have a case of permanent data loss due to divergence of the >> image before and after failover. >> >> There's work under way to remove this restriction which should be >> available for general use some time this summer or early fall, if I >> had to take a guess on timeline. >> >>> If >>> it is the case that namenodes quit when they lose connection to the >>> shared edits dir than doesn't the shared edits becomes the new single >>> point of failure? >> >> Yes, but it's an easy one to resolve. Most of our customers already >> have a NAS device in their datacenter, which has dual heads, dual >> PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the >> same as you see in most enterprise HA systems which rely on shared >> storage. >> >>> Unfortunately we have cleared the logs from this test but we could try >>> to reproduce it. >> >> That would be great, thanks! >> >> -Todd >> >>> On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon<[EMAIL PROTECTED]> wrote: >>>> >>>> On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook<[EMAIL PROTECTED]> >>>> wrote: >>>>> >>>>> We have be working with an HA hdfs cluster, testing several failover >>>>> scenarios. We have a small cluster of 4 machines spun up for testing. Todd Lipcon Software Engineer, Cloudera
-
Re: How to rebuild the shared edits directoryJeff Whiting 2012-05-08, 23:20
Thanks for being patient and listening to my rants. I'm excited to see hdfs continue to move
forward. If the organization I'm working for was willing spend some resources to help speed this process up, where should be start looking? I'm sure there are quite a few jiras on these issues. Most of what we've done with the hadoop eco system has been zookeeper and hbase related. Thanks, ~Jeff On 5/8/2012 2:46 PM, Todd Lipcon wrote: > On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting<[EMAIL PROTECTED]> wrote: >> It seems the NN was originally written with the assumption that disks fail >> and stuff happens. Hence the ability to have multiple directories store >> your NN data even though each directory is mostly likely redundant / HA. >> >> [start rant] >> >> My opinion is that it is a step backwards that the shared edits wasn't >> written with the same assumptions. If any one problem can take out your >> cluster then it isn't HA. So allowing a single nfs failure taking down >> your cluster and saying make nfs HA, just seems to move the HA problem not >> solve it. I would expect a true HA solution to be completely self contained >> within the hadoop ecosystem. All machines fail...eventually and it needs to >> be planned for. At a minimum a failure of the shared edits should only >> disable fail over and provide a recovery mechanism; Ideally the NN should >> have been rewritten to be a cluster (similar to zookeeper or ceph) to enable >> HA. >> >> [end rant] > Like I said earlier in the thread, work is already under way on this > and should be complete within a number of months. > > In many practical deployments, what we have already can provide > complete HA. In others, like the AWS example you mentioned, we need a > bit more, and we're working on it. Hang on a bit longer and it will be > good to go. > > -Todd > >> Sorry for the rant. I just really want to see HDFS become complete HA >> system without caveats. >> >> ~Jeff >> >> >> On 5/8/2012 11:44 AM, Todd Lipcon wrote: >>> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook >>> <[EMAIL PROTECTED]> wrote: >>>> We ran the initializeSharedEdits command and it didn't have any >>>> effect, but that my be because of the weird state we got it in. >>>> >>>> So help me understand: I was under the assumption that if shared edits >>>> went away you would lose the ability to failover and that is it. The >>>> active namenode would still function but would not failover and all >>>> standy namenodes would not try to become active. Is this correct? >>> Unfortunately that's not the case. If you lose shared edits, your >>> cluster should shut down. We currently require the NFS direcory to be >>> highly available itself. This is achievable with even pretty >>> inexpensive NAS devices from your vendor of choice. >>> >>> The reason for this behavior is as follows: if the active node loses >>> access to the mount, it's unable to distinguish whether the mount >>> itself died or if the node just had a local issue which broke the >>> mount. Imagine for example that the NFS client had a bug which caused >>> the mount to go away. Then, you'd continue running for quite some time >>> without writing to shared edits. If your NN then crashed, a failover >>> would cause you to revert to an old version of the namespace, and >>> you'd have a case of permanent data loss due to divergence of the >>> image before and after failover. >>> >>> There's work under way to remove this restriction which should be >>> available for general use some time this summer or early fall, if I >>> had to take a guess on timeline. >>> >>>> If >>>> it is the case that namenodes quit when they lose connection to the >>>> shared edits dir than doesn't the shared edits becomes the new single >>>> point of failure? >>> Yes, but it's an easy one to resolve. Most of our customers already >>> have a NAS device in their datacenter, which has dual heads, dual >>> PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the Jeff Whiting Qualtrics Senior Software Engineer [EMAIL PROTECTED]
-
Re: How to rebuild the shared edits directoryTodd Lipcon 2012-05-09, 00:49
Hi Jeff,
Check out HDFS-3077. We'll probably need the most help when it comes time to do testing. Any testing you can do on the current HA solution, non-ideal as it may be, is also immensely valuable. For example, if you can reproduce the case where it didn't exit upon loss of shared edits, that would also be a bug which would hit the quorum-based solution. Thanks -Todd On Tue, May 8, 2012 at 4:20 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote: > Thanks for being patient and listening to my rants. I'm excited to see hdfs > continue to move forward. If the organization I'm working for was willing > spend some resources to help speed this process up, where should be start > looking? I'm sure there are quite a few jiras on these issues. > > Most of what we've done with the hadoop eco system has been zookeeper and > hbase related. > > Thanks, > ~Jeff > > > On 5/8/2012 2:46 PM, Todd Lipcon wrote: >> >> On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting<[EMAIL PROTECTED]> wrote: >>> >>> It seems the NN was originally written with the assumption that disks >>> fail >>> and stuff happens. Hence the ability to have multiple directories store >>> your NN data even though each directory is mostly likely redundant / HA. >>> >>> [start rant] >>> >>> My opinion is that it is a step backwards that the shared edits wasn't >>> written with the same assumptions. If any one problem can take out your >>> cluster then it isn't HA. So allowing a single nfs failure taking down >>> your cluster and saying make nfs HA, just seems to move the HA problem >>> not >>> solve it. I would expect a true HA solution to be completely self >>> contained >>> within the hadoop ecosystem. All machines fail...eventually and it needs >>> to >>> be planned for. At a minimum a failure of the shared edits should only >>> disable fail over and provide a recovery mechanism; Ideally the NN should >>> have been rewritten to be a cluster (similar to zookeeper or ceph) to >>> enable >>> HA. >>> >>> [end rant] >> >> Like I said earlier in the thread, work is already under way on this >> and should be complete within a number of months. >> >> In many practical deployments, what we have already can provide >> complete HA. In others, like the AWS example you mentioned, we need a >> bit more, and we're working on it. Hang on a bit longer and it will be >> good to go. >> >> -Todd >> >>> Sorry for the rant. I just really want to see HDFS become complete HA >>> system without caveats. >>> >>> ~Jeff >>> >>> >>> On 5/8/2012 11:44 AM, Todd Lipcon wrote: >>>> >>>> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook >>>> <[EMAIL PROTECTED]> wrote: >>>>> >>>>> We ran the initializeSharedEdits command and it didn't have any >>>>> effect, but that my be because of the weird state we got it in. >>>>> >>>>> So help me understand: I was under the assumption that if shared edits >>>>> went away you would lose the ability to failover and that is it. The >>>>> active namenode would still function but would not failover and all >>>>> standy namenodes would not try to become active. Is this correct? >>>> >>>> Unfortunately that's not the case. If you lose shared edits, your >>>> cluster should shut down. We currently require the NFS direcory to be >>>> highly available itself. This is achievable with even pretty >>>> inexpensive NAS devices from your vendor of choice. >>>> >>>> The reason for this behavior is as follows: if the active node loses >>>> access to the mount, it's unable to distinguish whether the mount >>>> itself died or if the node just had a local issue which broke the >>>> mount. Imagine for example that the NFS client had a bug which caused >>>> the mount to go away. Then, you'd continue running for quite some time >>>> without writing to shared edits. If your NN then crashed, a failover >>>> would cause you to revert to an old version of the namespace, and >>>> you'd have a case of permanent data loss due to divergence of the >>>> image before and after failover. > Todd Lipcon Software Engineer, Cloudera
-
Re: How to rebuild the shared edits directoryJeff Whiting 2012-07-24, 19:05
Todd or anyone who knows,
I'm reviving an old thread because we are collocating into a data center rather than just using the cloud. You mentioned "We currently require the NFS direcory to be highly available itself. This is achievable with even pretty inexpensive NAS devices from your vendor of choice." What hardware would you suggest that would give us an HA filer? Specifically we are going all HP in the colo. I've looked around and was unable to find any suggestions. The docs just say "high-quality dedicated NAS appliance." Any suggestions would be great! https://ccp.cloudera.com/display/CDH4DOC/HDFS+High+Availability+Hardware+Configuration http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/ http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419 Thanks, ~Jeff On 5/8/2012 6:49 PM, Todd Lipcon wrote: > Hi Jeff, > > Check out HDFS-3077. We'll probably need the most help when it comes > time to do testing. Any testing you can do on the current HA solution, > non-ideal as it may be, is also immensely valuable. For example, if > you can reproduce the case where it didn't exit upon loss of shared > edits, that would also be a bug which would hit the quorum-based > solution. > > Thanks > -Todd > > On Tue, May 8, 2012 at 4:20 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote: >> Thanks for being patient and listening to my rants. I'm excited to see hdfs >> continue to move forward. If the organization I'm working for was willing >> spend some resources to help speed this process up, where should be start >> looking? I'm sure there are quite a few jiras on these issues. >> >> Most of what we've done with the hadoop eco system has been zookeeper and >> hbase related. >> >> Thanks, >> ~Jeff >> >> >> On 5/8/2012 2:46 PM, Todd Lipcon wrote: >>> On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting<[EMAIL PROTECTED]> wrote: >>>> It seems the NN was originally written with the assumption that disks >>>> fail >>>> and stuff happens. Hence the ability to have multiple directories store >>>> your NN data even though each directory is mostly likely redundant / HA. >>>> >>>> [start rant] >>>> >>>> My opinion is that it is a step backwards that the shared edits wasn't >>>> written with the same assumptions. If any one problem can take out your >>>> cluster then it isn't HA. So allowing a single nfs failure taking down >>>> your cluster and saying make nfs HA, just seems to move the HA problem >>>> not >>>> solve it. I would expect a true HA solution to be completely self >>>> contained >>>> within the hadoop ecosystem. All machines fail...eventually and it needs >>>> to >>>> be planned for. At a minimum a failure of the shared edits should only >>>> disable fail over and provide a recovery mechanism; Ideally the NN should >>>> have been rewritten to be a cluster (similar to zookeeper or ceph) to >>>> enable >>>> HA. >>>> >>>> [end rant] >>> Like I said earlier in the thread, work is already under way on this >>> and should be complete within a number of months. >>> >>> In many practical deployments, what we have already can provide >>> complete HA. In others, like the AWS example you mentioned, we need a >>> bit more, and we're working on it. Hang on a bit longer and it will be >>> good to go. >>> >>> -Todd >>> >>>> Sorry for the rant. I just really want to see HDFS become complete HA >>>> system without caveats. >>>> >>>> ~Jeff >>>> >>>> >>>> On 5/8/2012 11:44 AM, Todd Lipcon wrote: >>>>> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook >>>>> <[EMAIL PROTECTED]> wrote: >>>>>> We ran the initializeSharedEdits command and it didn't have any >>>>>> effect, but that my be because of the weird state we got it in. >>>>>> >>>>>> So help me understand: I was under the assumption that if shared edits >>>>>> went away you would lose the ability to failover and that is it. The >>>>>> active namenode would still function but would not failover and all >>>>>> standy namenodes would not try to become active. Is this correct? Jeff Whiting Qualtrics Senior Software Engineer [EMAIL PROTECTED]
-
Re: How to rebuild the shared edits directoryTodd Lipcon 2012-07-25, 18:51
Hi Jeff,
I don't know the HP offerings very well myself, but I know some of our customers are successfully using lower end NetApp devices. You should also be aware that work on the NAS-less shared storage is well under way: HDFS-3077. So if your timeline is more than a few months out to production, you may consider waiting for it to get your HA setup running. -Todd On Tue, Jul 24, 2012 at 12:05 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote: > Todd or anyone who knows, > > I'm reviving an old thread because we are collocating into a data center > rather than just using the cloud. You mentioned "We currently require the > NFS direcory to be highly available itself. This is achievable with even > pretty inexpensive NAS devices from your vendor of choice." What hardware > would you suggest that would give us an HA filer? Specifically we are going > all HP in the colo. > > I've looked around and was unable to find any suggestions. The docs just > say "high-quality dedicated NAS appliance." Any suggestions would be great! > > https://ccp.cloudera.com/display/CDH4DOC/HDFS+High+Availability+Hardware+Configuration > http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/ > http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419 > > Thanks, > ~Jeff > > > On 5/8/2012 6:49 PM, Todd Lipcon wrote: >> >> Hi Jeff, >> >> Check out HDFS-3077. We'll probably need the most help when it comes >> time to do testing. Any testing you can do on the current HA solution, >> non-ideal as it may be, is also immensely valuable. For example, if >> you can reproduce the case where it didn't exit upon loss of shared >> edits, that would also be a bug which would hit the quorum-based >> solution. >> >> Thanks >> -Todd >> >> On Tue, May 8, 2012 at 4:20 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote: >>> >>> Thanks for being patient and listening to my rants. I'm excited to see >>> hdfs >>> continue to move forward. If the organization I'm working for was >>> willing >>> spend some resources to help speed this process up, where should be start >>> looking? I'm sure there are quite a few jiras on these issues. >>> >>> Most of what we've done with the hadoop eco system has been zookeeper and >>> hbase related. >>> >>> Thanks, >>> ~Jeff >>> >>> >>> On 5/8/2012 2:46 PM, Todd Lipcon wrote: >>>> >>>> On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting<[EMAIL PROTECTED]> >>>> wrote: >>>>> >>>>> It seems the NN was originally written with the assumption that disks >>>>> fail >>>>> and stuff happens. Hence the ability to have multiple directories >>>>> store >>>>> your NN data even though each directory is mostly likely redundant / >>>>> HA. >>>>> >>>>> [start rant] >>>>> >>>>> My opinion is that it is a step backwards that the shared edits wasn't >>>>> written with the same assumptions. If any one problem can take out >>>>> your >>>>> cluster then it isn't HA. So allowing a single nfs failure taking >>>>> down >>>>> your cluster and saying make nfs HA, just seems to move the HA problem >>>>> not >>>>> solve it. I would expect a true HA solution to be completely self >>>>> contained >>>>> within the hadoop ecosystem. All machines fail...eventually and it >>>>> needs >>>>> to >>>>> be planned for. At a minimum a failure of the shared edits should only >>>>> disable fail over and provide a recovery mechanism; Ideally the NN >>>>> should >>>>> have been rewritten to be a cluster (similar to zookeeper or ceph) to >>>>> enable >>>>> HA. >>>>> >>>>> [end rant] >>>> >>>> Like I said earlier in the thread, work is already under way on this >>>> and should be complete within a number of months. >>>> >>>> In many practical deployments, what we have already can provide >>>> complete HA. In others, like the AWS example you mentioned, we need a >>>> bit more, and we're working on it. Hang on a bit longer and it will be >>>> good to go. >>>> >>>> -Todd >>>> >>>>> Sorry for the rant. I just really want to see HDFS become complete HA Todd Lipcon Software Engineer, Cloudera |