HBase user mailing list: Dealing with single point of failure


Thread:
- Mark 2011-10-29, 18:46
- Stuart Smith 2011-10-29, 20:28
- lars hofhansl 2011-10-29, 20:34
- Mark 2011-10-29, 20:37
- Harsh J 2011-10-29, 21:26
- M. C. Srivas 2011-12-14, 07:28

Re: Dealing with single point of failure
Sorry to join late...
The SPOF is a real problem if you're planning to serve data in real time from your cluster.
(Yes, you can do this with HBase...)

Then, when the NameNode fails, regardless of data loss, you have to bring the cluster back up.
Downtime can be significant enough to kill your business, depending on your use case.
Sure, there are ways to make the NN more fault tolerant, but then you increase the complexity of your solution and still have to worry about automatic failover.

MapR did a nice little trick that I would expect to show up in some fashion in Apache some time down the road.

(My bet is that someone will be clever enough to reverse-engineer this.)

Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 14, 2011, at 1:28 AM, "M. C. Srivas" <[EMAIL PROTECTED]> wrote:

> On Sat, Oct 29, 2011 at 1:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> This is more of a "theoretical problem," really.
>> Yahoo and others claim they lost far more data due to human error than any
>> HDFS problems (including Namenode failures).
>>
>
> Actually it is not theoretical at all.
>
> SPOF  !=  data-loss.
>
> Data-loss can occur even if you don't have any SPOF's.  Vice versa, many
> SPOF systems do not have data-loss (eg, a single Netapp).
>
> SPOF == lack of high-availability.
>
> Which is indeed the case with HDFS, even at Y!  For example, when a cluster
> is upgraded it becomes unavailable.
>
> @Mark:
> the Avatar-node is not for the faint-hearted. AFAIK, only FB runs it.
> Konstantin Shvachko and co at eBay have a much better NN-SPOF solution in
> 0.22 that was just released. I recommend you try that.
>
>> You can prevent data loss by having the namenode write the metadata to
>> another machine (via NFS or DRBD or if you have a SAN).
>> You'll still have an outage while switching over to a different machine,
>> but at least you won't lose any data.
>>
>>
>> Facebook has a partial solution (Avatarnode) and the HDFS folks are
>> working on a solution (which like Avatarnode mainly involves keeping
>> a hot copy of the Namenode so that failover is "instantaneous" - 1 or 2
>> minutes at most).
>>
>>
>> ----- Original Message -----
>> From: Mark <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc:
>> Sent: Saturday, October 29, 2011 11:46 AM
>> Subject: Dealing with single point of failure
>>
>> How does one deal with the fact that HBase has a single point of failure,
>> namely the NameNode? What steps can be taken to eliminate and/or minimize
>> the impact of a NameNode failure? In a situation where reliability is of
>> utmost importance, should one instead choose an alternative technology, i.e.
>> Cassandra?
>>
>> Thanks
>>
>>
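The metadata-mirroring approach lars describes (writing the NameNode metadata to a second machine over NFS) was typically done by listing multiple directories in `dfs.name.dir`; a minimal hdfs-site.xml sketch, assuming `/mnt/nn-nfs` is a hypothetical NFS mount exported from another machine:

```xml
<!-- hdfs-site.xml: the NameNode writes its fsimage and edit log to
     every directory in this comma-separated list. /mnt/nn-nfs is a
     hypothetical NFS mount from a second machine, so the metadata
     survives loss of the NameNode host. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name,/mnt/nn-nfs/dfs/name</value>
</property>
```

If the NameNode host dies, the copy under the NFS mount can seed a replacement NameNode; failover is still manual, which is the availability gap Srivas and Mike describe.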
- Li Pi 2011-10-29, 20:51