Thanks for the detailed design document and the in-depth walkthrough!
Your proposal seems to be sound. (But be warned, I don’t have much experience in this part of Aurora or Mesos :-))
On 31.08.17, 04:18, "Jordan Ly" <[EMAIL PROTECTED]> wrote:
Following up on the discussion here: https://lists.apache.org/thread.html/e31d7dbcb054ed570f969ae2043eadfc090383edfe0751cec59b29d3@%3Cdev.aurora.apache.org%3E
I've created a design document detailing the implementation of a "hot
standby" mechanism where scheduler followers would eagerly read and
apply entries from the replicated log. The goal of this change is
that, in the event of a failover, the newly elected leader will not
have to replay as many entries to rebuild its state and thus can start
serving traffic faster.
https://docs.google.com/document/d/1DOtKA4-vrtxat1MaUYMQ6Y1iXhA8ob6Mfztzt-R1Oss/edit?usp=sharing
I have a working prototype of the above design running on a test
cluster. Please feel free to comment on the doc!
This document references a current proposal in Mesos by Ilya Pronin