Re: HBase Assignments in trunk.
Jonathan Hsieh 2012-09-06, 10:16
On Wed, Sep 5, 2012 at 4:08 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 5, 2012 at 12:38 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > We've also talked about defining design and code invariants -- here's the
> > one that I've gotten so far: (We can pull up more from discussion)
> > * ZK state should be transient (treat it like memory). If deleted, hbase
> > should be able to recover and essentially be in the same state (a few
> > exceptions -- enabled/disabled state)
> We should post these invariants somewhere? In dev section of refguide?
We should definitely put this in the javadoc. Maybe we should have a
dev-guide section of the ref-guide where some of these things are also
documented.
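The following minimal Python sketch (HBase itself is Java) makes the first invariant concrete by showing what "treat ZK like memory" implies. It assumes a durable catalog standing in for META and a plain dict standing in for ZK. `Cluster`, `recover_after_zk_loss`, and the `NEEDS_ASSIGN` label are hypothetical names, not HBase APIs. The enabled/disabled exception is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cluster:
    # Durable catalog, a hypothetical stand-in for META: region -> hosting
    # server, or None if the region is not currently assigned.
    meta: Dict[str, Optional[str]] = field(default_factory=dict)
    # Transient coordination state, a hypothetical stand-in for ZK:
    # region -> name of an in-flight transition. Treated like memory.
    zk: Dict[str, str] = field(default_factory=dict)

    def assignment_view(self) -> Dict[str, Optional[str]]:
        # Visible state is derived from durable data only, never from ZK.
        return dict(self.meta)

    def recover_after_zk_loss(self) -> Dict[str, str]:
        # Wipe ZK, then re-queue every region META shows as unassigned.
        # Nothing is lost because ZK held nothing META cannot reproduce.
        self.zk.clear()
        return {r: "NEEDS_ASSIGN" for r, s in self.meta.items() if s is None}
```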
> > 4) Why are there multiple error conventions -- abort, FAILED_OPEN,
> > exceptions (and cases where we "return" silently without notification)?
> I would have to look at the particular instance, but at a high level I'd say
> it's a case of:
> 1. On the one hand, your classic myopic patch-centric view.
> 2. On the other, you can't throw an exception out to the master
> if the rpc open has been successfully handed off and the rpc has
> completed... there needs to be another means of flagging the error.
From a code craft point of view, failure behavior is buried deeply and could
be pulled out into the process methods of the handlers. In many cases, it
isn't easy to figure out why one behavior is chosen over the others.
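The following rough Python illustration shows failure handling pulled up into a handler's process method. Every outcome (success, FAILED_OPEN, abort) is decided in one place and always reported through a separate channel. That channel addresses Stack's point that once the rpc has returned, an error cannot travel back as an exception. This is a sketch under assumptions: `OpenHandlerSketch`, `FatalError`, `Outcome`, and the `report` callback are hypothetical names, not the actual HBase handler classes.

```python
from enum import Enum
from typing import Callable, List


class Outcome(Enum):
    OPENED = "OPENED"
    FAILED_OPEN = "FAILED_OPEN"
    ABORT = "ABORT"


class FatalError(Exception):
    """Hypothetical marker for errors the region server cannot survive."""


class OpenHandlerSketch:
    """Every failure path funnels through process(); no silent returns."""

    def __init__(self, steps: List[Callable[[], None]],
                 report: Callable[[Outcome], None]):
        self.steps = steps
        # Hypothetical side channel to the master (e.g. a ZK transition),
        # needed because the rpc has already returned.
        self.report = report

    def process(self) -> Outcome:
        try:
            for step in self.steps:
                step()
            outcome = Outcome.OPENED
        except FatalError:
            outcome = Outcome.ABORT
        except Exception:
            outcome = Outcome.FAILED_OPEN
        # Single exit point: the master always hears the result.
        self.report(outcome)
        return outcome
```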
> > 5) How do we handle timeout situations -- IMO it makes sense to have a
> > rollback or fail forward policy for different places on the timeline.
> Yes. There are a couple of flavors of this in the code base at
> present. Could do w/ a revisit for sure.
This is more a question -- I'm not familiar with the details of rpc
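One way to read the rollback/fail-forward idea is as a lookup keyed on how far along the timeline an operation got. The Python sketch below uses hypothetical names throughout. It assumes a single point of no return, placed at the moment the new location is published to META. Where that point really sits in HBase's open path is an assumption, not something this thread settles.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phases of a region open, in timeline order.
    OFFLINE = 0
    PENDING_OPEN = 1
    OPENING = 2
    META_UPDATED = 3  # assumed point of no return
    OPEN = 4


class Policy(Enum):
    ROLLBACK = "rollback"
    FAIL_FORWARD = "fail forward"


POINT_OF_NO_RETURN = Phase.META_UPDATED


def on_timeout(phase: Phase) -> Policy:
    # Before the new location is published, undoing is cheap and safe.
    # After it, clients may already be routed to the region, so finish
    # the transition rather than retract it.
    if phase.value < POINT_OF_NO_RETURN.value:
        return Policy.ROLLBACK
    return Policy.FAIL_FORWARD
```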
> > 6) Can we use cancellation instead of checking for
> > enabling/disabled/disabling/shutdown/stopping all over the place? (let's
> > these cluster ops would cancel the assign and then win by blocking
> The enabling, etc., checks are done on assign to make sure we don't go
> ahead if table state has changed since the order to assign was given.
> To me cancel seems like something else; the open or close has gone out
> already and we want to stop it from happening.
> They seem like different things to me.
I'm suggesting that when an overriding operation like
enable/disable/shutdown/stop is triggered, we internally use cancellation to
preempt assignments/unassignments. This could be in the same places where
we currently do the checks, but could also eventually be used to cancel
open/close operations. Maybe this is too far out for the time being.
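The cancellation idea is sketched minimally below. It assumes each in-flight assignment carries a cancellation token that it checks between steps, at the same places the enabling/disabling/stopping checks sit now. It also assumes an overriding operation such as disable cancels outstanding tasks before proceeding. `AssignTask`, `Master`, and `disable_table` are hypothetical Python stand-ins, not HBase classes; `threading.Event` serves as the token.

```python
import threading
from typing import Callable, Dict, List


class AssignTask:
    """Hypothetical assignment task that can be preempted."""

    def __init__(self, region: str, steps: List[Callable[[], None]]):
        self.region = region
        self.steps = steps
        self.cancelled = threading.Event()

    def cancel(self) -> None:
        self.cancelled.set()

    def run(self) -> bool:
        # Check the token before each step; return False if preempted.
        for step in self.steps:
            if self.cancelled.is_set():
                return False
            step()
        return True


class Master:
    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.in_flight: Dict[str, AssignTask] = {}

    def submit(self, task: AssignTask) -> None:
        with self.lock:
            self.in_flight[task.region] = task

    def disable_table(self, regions: List[str]) -> None:
        # The overriding op cancels outstanding assigns for its regions,
        # so it wins instead of racing each one.
        with self.lock:
            for r in regions:
                task = self.in_flight.pop(r, None)
                if task is not None:
                    task.cancel()
```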
> > 7) In-memory state has different but similarly named states in the HM
> > and in the RSs. And there are transition events that could be missed.
> Yes. This is a problem.
> My peeve is the one where we cannot trust what RegionState says, and
> even if we could, its states are not 'clean'; e.g. OFFLINE is both the
> BEGIN of a region open and a catchall parking state that we
> put regions into when not sure what else to do w/ them.
There is the state naming (I agree). There is also the fact that
RegionState is not always right (it may be more than one state transition
behind). This is actually why I was considering taking the ZK-based
control flow elements and putting them in the master. If states are
skipped, we need to make sure the transitions still happen on the master
(or that we can safely skip the transition).
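Here is a small sketch of "make sure the transitions happen on the master". The master keeps its own view of a region. When it observes a state further along than expected, it walks through every intermediate state so its per-transition logic still runs. The linear OFFLINE → PENDING_OPEN → OPENING → OPEN path and the `MasterRegionState` class are simplifying assumptions. The real state machine also has close and split paths.

```python
from typing import List

# Hypothetical, simplified open path; real regions also close and split.
OPEN_PATH = ["OFFLINE", "PENDING_OPEN", "OPENING", "OPEN"]


class MasterRegionState:
    def __init__(self, state: str = "OFFLINE"):
        self.state = state
        self.history: List[str] = [state]

    def observe(self, reported: str) -> None:
        """Apply a state seen from ZK or an RS, replaying skipped steps."""
        cur = OPEN_PATH.index(self.state)
        new = OPEN_PATH.index(reported)
        if new < cur:
            raise ValueError(f"backwards transition {self.state} -> {reported}")
        # Enter every intermediate state so per-transition logic on the
        # master runs even if the event for it was missed.
        for s in OPEN_PATH[cur + 1:new + 1]:
            self._enter(s)

    def _enter(self, s: str) -> None:
        self.state = s
        self.history.append(s)
```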
I'm also suggesting that we could avoid using ZK event callbacks like the
OPENING and OPENED ZK transitions and instead have the master manage
those. An opening RS would tickle some other znode to show
progress. At least then RegionState would be closer to accurate, and the
HM would go through all state transitions.
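The progress-znode idea might look like the sketch below. It assumes the opening RS periodically updates a per-region progress timestamp. The master treats a region with no update within a timeout as stalled, while remaining the only party that changes the region's actual state. `ProgressTracker`, the dict standing in for the znode, and the injectable clock are all hypothetical. A real version might rely on ZK node modification times instead.

```python
import time
from typing import Callable, Dict, List


class ProgressTracker:
    """Hypothetical per-region progress node an opening RS 'tickles'."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        # region -> last time the RS reported progress (stand-in for a znode)
        self.progress: Dict[str, float] = {}

    def tickle(self, region: str) -> None:
        # Called by the opening RS to show it is still making progress.
        self.progress[region] = self.clock()

    def stalled(self) -> List[str]:
        # Called by the master: regions with no progress within the timeout.
        now = self.clock()
        return sorted(r for r, t in self.progress.items()
                      if now - t > self.timeout_s)
```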
> > 8) Is having multiple processes "responsible for acting" necessary? (Why
> > not have the HM open the region and then update meta?)
I'm pretty sure it would have more latency. Controlling when the assigned
region becomes available might make this trickier. (Jimmy caught a bug
in an earlier version of this.)
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]