-Re: [hbase-5487] Master design discussion.
Sergey Shelukhin 2013-10-22, 21:40
Sorry, I was busy, let me respond to Jonathan's comment now.
Lars: both mine and Jonathan's spec try to address it as one of the main
Both specs are similar actually.
On Sat, Oct 19, 2013 at 3:10 PM, Lars Hofhansl <[EMAIL PROTECTED]> wrote:
> Didn't read the spec, yet.
> My main gripe currently is that there are too many places holding the
> state: the fs, .meta., zk, copies in ram at both master and region servers,
> etc. Some of the problems we've seen were due to the various copy of this
> state getting out of sync.
> Now I'll shut up and read the speed.
> -- Lars
> Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >Responses inline. I'm going to wait for next doc before I ask more
> >questions about the master5 specifics. :)
> >I suggest we start a thread about the hang/double assign/fencing scenario.
> >On Sat, Oct 19, 2013 at 7:17 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >> Here's sergey's replies from jira (modulo some reformatting for email.)
> >> ----
> >> Answers lifted from email also (some fixes + one answer was modified due
> >> to clarification here ).
> >> What is a failure and how do you react to failures? I think the master5
> >>> design needs to spend more effort to considering failure and recovery
> >>> cases. I claim there are 4 types of responses from a networked IO
> >>> - two states we normally deal with ack successful, ack failed (nack)
> >>> unknown due to timeout that succeeded (timeout success) and unknown
> due to
> >>> timeout that failed (timeout failed). We have historically missed the
> >>> two cases and they aren't considered in the master5 design.
> >> There are a few considerations. Let me examine if there are other cases
> >> than these.
> >> I am assuming the collocated table, which should reduce such cases for
> >> state (probably, if collocated table cannot be written reliably, master
> >> must stop-the-world and fail over).
> >For the master, I agree. In the case of nack it knows it failed and
> >failover, in the case of timeouts it should try verify and retry and if it
> >cannot it should abdicate. This seems equivalent to the 9x-master's "if
> >can't get to zk we abort" behavior. We need to guarantee that the old
> >master and new master are fenced off from each other to make sure they
> >don't split brain.
> >Off list, you suggested wal fencing and hbck-master sketched out a lease
> >timeouts mechanism that provides isolation. Correctness in the face of
> >hangs and partitions are my biggest concern, and a lot of the questions
> >asking focus on these scenarios.
> >Can you point me to where I read and consider the wal fencing? Do you
> >thoughts or a comparison of these approaches? Are there others
> >> When RS contacts master to do state update, it errs on the side of
> >> - no state update, no open region (or split).
> >not sure what "it" in the second part of the sentence means here -- is
> >the master or the rs? If it is the master I this is what hbck-master
> >an "actual state update".
> >If "it" mean RS's, I'm confused.
> >> Thus, except for the case of multiple masters running, we can always
> >> assume RS didn't online the region if we don't know about it.
> >I don't think that is a safe assumption -- I still think double assignment
> >can and will happen. If an RS hangs long enough for the master to think
> >is dead, the master would reassign. If that RS came back, it could still
> >be open handling clients who cached the old location.
> > Then, for messages to RS, see "Note on messages"; they are idempotent so
> >> they can always be resent.
> >> 1) State update coordination. What is a "state updates from the outside"
> >>> Do RS's initiate splitting on their own? Maybe a picture would help so
> >>> can figure out if it is similar or different from hbck-master's?
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.