-
Dealing with single point of failure
Mark 2011-10-29, 18:46
How does one deal with the fact that HBase has a single point of failure, namely the namenode? What steps can be taken to eliminate and/or minimize the impact of a namenode failure? In a situation where reliability is of utmost importance, should one choose an alternative technology, e.g. Cassandra?
Thanks
-
Re: Dealing with single point of failure
Stuart Smith 2011-10-29, 20:28
I was under the impression you could use HBase with a different distributed filesystem (other than HDFS). That would fix your SPOF.
Quite frankly, HBase has other issues (and I use it, and plan to keep using it), mainly because it's under quite heavy development, but I don't think Cassandra would help with that.
If reliability is of utmost importance, I would start with hardware, not software. Good hardware, colo, etc.
Then assume you made all the wrong hardware choices, and that it will fail, and look at software :)
Take care,
-stu
-
Re: Dealing with single point of failure
lars hofhansl 2011-10-29, 20:34
This is more of a "theoretical problem" really. Yahoo and others claim they lost far more data due to human error than to any HDFS problems (including Namenode failures).
You can prevent data loss by having the namenode write the metadata to another machine (via NFS or DRBD, or if you have a SAN). You'll still have an outage while switching over to a different machine, but at least you won't lose any data.
Facebook has a partial solution (Avatarnode) and the HDFS folks are working on a solution (which, like Avatarnode, mainly involves keeping a hot copy of the Namenode so that failover is "instantaneous" - 1 or 2 minutes at most).
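A minimal sketch of the metadata-mirroring idea above, assuming a Hadoop release of that era in which the namenode's storage directories are set with dfs.name.dir in hdfs-site.xml (later releases call the property dfs.namenode.name.dir). When the value is a comma-delimited list, the namenode writes its name table to every listed directory. Both paths below are hypothetical; the second is assumed to be an NFS mount exported by another machine.

```xml
<!-- hdfs-site.xml (fragment); paths are illustrative assumptions -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- first entry: local disk; second entry: hypothetical NFS mount on another host -->
    <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
  </property>
</configuration>
```

This only protects the metadata. As noted above, there is still an outage while a replacement namenode is brought up from the remote copy.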
-
Re: Dealing with single point of failure
Mark 2011-10-29, 20:37
I was unaware of Avatarnode and the future plans to integrate such a solution. This makes me feel a lot more at ease in choosing HBase.
Do you happen to have a JIRA ticket that references this work so I can monitor it and, if possible, contribute?
Thanks
-
Re: Dealing with single point of failure
Harsh J 2011-10-29, 21:26
Mark,
This is the parent issue of the HA NameNode work: https://issues.apache.org/jira/browse/HDFS-1623
-
Re: Dealing with single point of failure
M. C. Srivas 2011-12-14, 07:28
On Sat, Oct 29, 2011 at 1:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> This is more of "theoretical problem" really.
> Yahoo and others claim they lost far more data due to human error than any HDFS problems (including Namenode failures).
Actually it is not theoretical at all.
SPOF != data-loss.
Data-loss can occur even if you don't have any SPOFs. Conversely, many SPOF systems do not have data-loss (e.g., a single Netapp).
SPOF == lack of high-availability.
Which is indeed the case with HDFS, even at Y! For example, when a cluster is upgraded it becomes unavailable.
@Mark: the Avatar-node is not for the faint-hearted. AFAIK, only FB runs it. Konstantin Shvachko and co at eBay have a much better NN-SPOF solution in 0.22 that was just released. I recommend you try that.
-
Re: Dealing with single point of failure
Michel Segel 2011-12-15, 02:23
Sorry to join late... SPoF is a real problem if you're planning to serve data in real time from your cluster. (Yes, you can do this with HBase...)
Then, regardless of data loss, you have to bring up the cluster. Downtime can be significant enough to kill your business, depending on your use case. Sure, there are ways to make the NN more fault tolerant, but then you increase the complexity of your solution and still have to worry about automatic failover.
MapR did a nice little trick that I would expect to show up in some fashion in Apache some time down the road.
(My bet is that someone will be clever enough to reverse engineer this.)
Sent from a remote device. Please excuse any typos...
Mike Segel
-
Re: Dealing with single point of failure
Li Pi 2011-10-29, 20:51
You can do a variety of things, including having the namenode run on an incredibly resilient piece of hardware (RAID 1, redundant PSUs, etc.).
From the software perspective, you can look at the Avatar node: http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
or something like this: http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/