Re: high availability
Bertrand Dechoux 2013-10-16, 05:16
Old version (4.1) but the principle is still the same.
*No requirement for custom fencing configuration* - fencing methods such as
STONITH <http://en.wikipedia.org/wiki/STONITH> require custom hardware;
instead, we should rely only on software methods.
PS: But then the only true validation is by testing it.
On Tue, Oct 15, 2013 at 10:59 PM, Jing Zhao <[EMAIL PROTECTED]> wrote:
> I think real fencing is not required if you are using QJM-based HA.
> If you are using ZKFC, a graceful fencing is triggered first, in which
> ZKFC sends an RPC request to the original ANN (active NameNode) to
> make it standby. If the graceful fencing fails, the configured fencing
> method is used. In the worst case, where your original ANN cannot
> transition to standby state, QJM still has built-in single-writer
> semantics (see https://issues.apache.org/jira/browse/HDFS-3862,
> https://issues.apache.org/jira/browse/HDFS-4915). Thus you can set the
> fence method to shell(/bin/true) (since in the current code the fence
> configuration is still required).
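>
> A minimal sketch of that setting in hdfs-site.xml, assuming QJM-based HA
> with no other fencing method listed (not tested here):
>
> ```xml
> <property>
>   <name>dfs.ha.fencing.methods</name>
>   <value>shell(/bin/true)</value>
> </property>
> ```
>
> Since /bin/true always exits with 0, the configured fencer always reports
> success, and the failover relies on QJM's single-writer semantics rather
> than on reaching the old ANN.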
> On Tue, Oct 15, 2013 at 12:11 PM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> > Jing,
> > thanks for your answer.
> > if hbase with high availability is the desired goal, is it recommended to
> > remove sshfence? we do not plan to use hdfs for anything else.
> > i understood that the only downside of no fencing is that the old
> > namenode could still be serving read requests. could this negatively
> > impact hbase functionality, or worse, could it corrupt hbase somehow
> > (not sure how that would be...)?
> > thanks! koert
> > On Tue, Oct 15, 2013 at 12:38 AM, Jing Zhao <[EMAIL PROTECTED]> wrote:
> >> "it is unclear to me if the transition in this case is also rapid but
> >> the fencing takes long while the new namenode is already active, or if
> >> in this period i am stuck without an active namenode."
> >> The standby->active transition will get stuck in this period, i.e.,
> >> the NN can only become active after fencing the old active NN. During
> >> this period the only remaining NN is in standby state, which cannot
> >> handle normal R/W operations and just throws StandbyException, so I
> >> guess an hbase region server may kill itself in some cases.
> >> I think you can remove sshfence from the configuration if you are
> >> using QJM-based HA.
> >> On Fri, Oct 11, 2013 at 4:51 PM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> >> > i have been playing with high availability using journalnodes and 2
> >> > masters, both running namenode and hbase master.
> >> >
> >> > when i kill the namenode and hbase-master processes on the active
> >> > master, the failover is perfect. hbase never stops and a running
> >> > map-reduce keeps going. this is impressive!
> >> >
> >> > however when instead of killing the processes i kill the entire
> >> > active master machine, the transition is less smooth and can take a
> >> > long time, at least it seems this way in the logs. this is because
> >> > ssh fencing fails but keeps trying. my fencing is configured as:
> >> >
> >> > <property>
> >> > <name>dfs.ha.fencing.methods</name>
> >> > <value>
> >> > sshfence
> >> > shell(/bin/true)
> >> > </value>
> >> > <final>true</final>
> >> > </property>
> >> >
> >> > it is unclear to me if the transition in this case is also rapid but
> >> > fencing takes long while the new namenode is already active, or if in
> >> > this period i am stuck without an active namenode. it is hard to
> >> > accurately test this in my setup.
> >> > is this supposed to take this long? is HDFS writable in this period?
> >> > is hbase supposed to survive this long transition?
> >> >
> >> > thanks! koert