HBase dev mailing list: [Shadow Regions / Read Replicas]


Re: [Shadow Regions / Read Replicas ]
On Tue, Dec 3, 2013 at 2:48 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> > MTTR and this work are orthogonal. In a distributed system, you cannot
> > differentiate between a process that is not responding because it is
> > down, because it is busy, or because the network is down, or whatnot.
> > A detection time of a couple of seconds is unrealistic. You will end up
> > in a very unstable state where you are failing servers all over the
> > place. An external beacon also cannot differentiate between the main
> > process not responding because it is busy or because it is down. What
> > happens when there is a temporary network partition?
>
> Be proactive: predict node failures (e.g., from recent slow requests) and
> detect possible router/network issues (syslog on each node). Temporary
> network partitions are bad, but they usually affect multiple servers, not
> just one. Proactivity means that the Master can disable an RS before the
> RS goes down. But you are right, it's totally orthogonal to what you are
> proposing here.
I think this belongs in a separate daemon management system.
>
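The detection trade-off in the exchange above can be made concrete with a toy heartbeat monitor. This is a minimal Python sketch, not HBase code: `ServerHealth`, `classify`, `suspect_racks`, the timeouts and the slow-request threshold are all hypothetical names and values chosen for illustration. It shows why a timeout alone cannot tell a crashed RegionServer from one stuck in a long pause, how the proactive idea (flagging servers whose recent requests are slow) could sit alongside it as a softer "suspect" state, and how servers going silent together on one rack point at the switch rather than at each server.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import median


@dataclass
class ServerHealth:
    # Hypothetical per-RegionServer record kept by a master-like monitor.
    last_heartbeat: float  # seconds, on the monitor's clock
    recent_latencies_ms: list[float] = field(default_factory=list)


def classify(h: ServerHealth, now: float, timeout_s: float, slow_ms: float) -> str:
    """Return 'dead', 'suspect' or 'ok' for one server."""
    if now - h.last_heartbeat > timeout_s:
        # Silence longer than the timeout: a crash, a long GC pause and a
        # network partition all look identical from here.
        return "dead"
    if h.recent_latencies_ms and median(h.recent_latencies_ms) > slow_ms:
        # Proactive signal: still heartbeating, but recent requests are slow.
        return "suspect"
    return "ok"


def suspect_racks(dead: set[str], rack_of: dict[str, str], min_fraction: float = 0.5) -> set[str]:
    """Racks where at least min_fraction of the servers went silent together,
    which points at the rack switch rather than at each server."""
    servers_per_rack = Counter(rack_of.values())
    dead_per_rack = Counter(rack_of[s] for s in dead)
    return {r for r, n in dead_per_rack.items() if n / servers_per_rack[r] >= min_fraction}


# A server stuck in a 5 s pause is "dead" under a 3 s timeout but "ok" under
# 30 s; the longer timeout also delays noticing a real crash by up to 30 s.
paused = ServerHealth(last_heartbeat=100.0)
print(classify(paused, now=105.0, timeout_s=3.0, slow_ms=500.0))   # dead
print(classify(paused, now=105.0, timeout_s=30.0, slow_ms=500.0))  # ok

busy = ServerHealth(last_heartbeat=104.5, recent_latencies_ms=[800, 950, 1200])
print(classify(busy, now=105.0, timeout_s=30.0, slow_ms=500.0))    # suspect

rack_of = {"rs1": "r1", "rs2": "r1", "rs3": "r2", "rs4": "r2", "rs5": "r2"}
print(suspect_racks({"rs1", "rs2", "rs3"}, rack_of))               # {'r1'}
```

Picking the timeout is the whole argument in miniature: no single value both tolerates long pauses and detects real failures within a couple of seconds.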

> I am just wondering: if FB claims 99.99% HBase availability (HBaseCon
> 2013), maybe it is worth borrowing some of their ideas? How did they
> achieve this?
>
>
Here's the deck http://www.slideshare.net/cloudera/operations-session-2

Here's a quick tl;dr:
- focus on rack switch failures
- lower timeouts
- improvements in the RegionServer (HBASE-6638 is in 0.94.2; HBASE-6508 is
  not in yet)
- locality-based work (we have a version ported to 0.96, but it only really
  works in constrained HBase deployments like FB's; it doesn't work with
  balancing or splitting at the moment)
- HDFS reads from another replica (not in upstream HDFS yet)
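The last bullet, reading from another HDFS replica when the first one is slow, is essentially a hedged read. Below is a minimal Python sketch of the idea using only the standard library; `hedged_read`, `read_fn`, `fake_read`, the replica names and the 50 ms hedge delay are hypothetical, and the code is not taken from HDFS or from Facebook's patch.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def hedged_read(replicas, read_fn, hedge_delay_s):
    """Ask replicas[0]; if it has not answered within hedge_delay_s, also ask
    replicas[1] and return whichever answer arrives first.

    read_fn(replica) is a hypothetical blocking read of one block replica.
    If the first request to finish raised, its exception is re-raised here.
    """
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        pending = {pool.submit(read_fn, replicas[0])}
        done, _ = wait(pending, timeout=hedge_delay_s)
        if not done:
            # Primary is slow (busy disk, GC, failing node): hedge to at most
            # one more replica, then take the first answer from either.
            for replica in replicas[1:2]:
                pending.add(pool.submit(read_fn, replica))
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the losing request; it finishes in the background.
        pool.shutdown(wait=False)


def fake_read(replica):
    # Hypothetical replicas: "dn1" is stuck for 0.5 s, "dn2" answers quickly.
    time.sleep(0.5 if replica == "dn1" else 0.01)
    return f"block data from {replica}"


# dn1 has not answered after 50 ms, so dn2 is asked as well and normally wins.
print(hedged_read(["dn1", "dn2"], fake_read, hedge_delay_s=0.05))
```

The hedge delay is the knob: a shorter delay cuts tail latency when a DataNode is sick, at the cost of sending more duplicate reads to healthy ones.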

Facebook's master is based on the HBase 0.20/0.89 master, which is
significantly different from the HBase master in 0.94/0.96/trunk today.

Jon.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]