HBase >> mail # dev >> [Shadow Regions / Read Replicas ]


Re: [Shadow Regions / Read Replicas ]
On Tue, Dec 3, 2013 at 2:48 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> >MTTR and this work are orthogonal. In a distributed system, you cannot
> >differentiate between
> >a process not responding because it is down or it is busy or network is
> >down, or whatnot. Having
> >a couple of seconds detection time is unrealistic. You will end up in a
> >very unstable state where
> >you will be failing servers all over the place. An external beacon also
> >cannot differentiate between
> >the main process not responding because it is busy, or it is down. What
> >happens when there is a temporary
> >network partition?
>
> Be pro-active, predict node failure (slow requests recently), detect
> possible router/network issues (syslog on each node). Temporary network
> partitions are bad, but they usually affect multiple servers - not just
> one. Pro-activity means that Master can disable RS before RS goes down.
> But, you are right - it's totally orthogonal to what you are proposing here.
I think this is a separate daemon management system.
>

> I am just wondering: if FB claims 99.99% availability for their HBase
> (HBaseCon 2013), maybe it is worth borrowing some of their ideas? How did
> they achieve this?
>
>
Here's the deck http://www.slideshare.net/cloudera/operations-session-2

Here's a quick tl;dr:
- focus on rack switch failures
- lower timeouts
- improvements in the regionserver (HBASE-6638 in 0.94.2 / HBASE-6508 not in
yet).
- locality based stuff (we have a version ported to 0.96 but it only really
works in constrained HBase deployments like FB's -- it doesn't work with
balancing or splitting at the moment)
- HDFS read from other replica (not in upstream hdfs yet)
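
As a rough illustration of why "lower timeouts" is a trade-off (a sketch only, not anything from the deck; the function name and the heartbeat trace are made up for illustration): a timeout-based detector cannot tell a paused server from a dead one, so a very short timeout declares a server dead even though it resumes a few seconds later.

```python
def suspected_dead(heartbeats, timeout, now):
    """Return the times at which a timeout-based detector would declare
    the server dead, given the times at which heartbeats were received.

    The detector only sees gaps between heartbeats; it has no way to know
    whether a gap means "down", "busy", or "network partition".
    """
    declared = []
    events = list(heartbeats) + [now]
    for prev, nxt in zip(events, events[1:]):
        if nxt - prev > timeout:
            # The timeout expired before the next heartbeat arrived.
            declared.append(prev + timeout)
    return declared


# Hypothetical trace (seconds): one heartbeat per second, with a 5 s pause
# (for example a long GC) between t=3 and t=8. The server never went down.
paused_but_alive = [0, 1, 2, 3, 8, 9, 10]

print(suspected_dead(paused_but_alive, timeout=2, now=10))   # short timeout
print(suspected_dead(paused_but_alive, timeout=30, now=10))  # long timeout
```

With the 2 s timeout the live server is failed at t=5; with the 30 s timeout the pause is tolerated, at the cost of slower detection when a server really is down.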

Facebook's master is based off the hbase 0.20/0.89 master, which is
significantly different from the hbase master in 0.94/0.96/trunk
today.

Jon.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]