We are running HBase 0.94.6.1 and are planning high-availability testing. While the overall scheme for making the cluster highly available is clear, I would like to get an idea of the HBase service lifetime in terms of mean time to failure (MTTF) and time to recover after a failure. Any historical evidence would also help, as it is vital for us to calculate the actual availability of the system over a year.
I understand that HBase provides seamless high availability for Masters in an active/passive mode, but any failure will still impact performance to some extent. This calculation will help us derive the number of nodes we should provision so that performance is not compromised while the system remains available.
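To make the calculation concrete, here is a minimal sketch of what I have in mind, assuming the standard steady-state formula availability = MTTF / (MTTF + MTTR). The function names and the sample figures are placeholders of my own for illustration, not measured HBase numbers.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_hours_per_year(mttf_hours, mttr_hours):
    """Expected hours of unavailability over one year."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Placeholder inputs: one failure every 999 hours, 1 hour to recover.
    mttf, mttr = 999.0, 1.0
    print("availability:", availability(mttf, mttr))
    print("downtime h/year:", expected_downtime_hours_per_year(mttf, mttr))
```

What I am missing are realistic MTTF and recovery-time inputs for HBase to plug into this.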
The article below from the HBase wiki indicates that the Master switch takes a couple of seconds, but I think the volume of data, log replay, and region availability will also play a key role in how long the switch takes to complete. I would therefore appreciate guidance on the complete mechanism and the recovery time.
Any ideas or facts would be very helpful.
Thanks & Regards