Yes, we've confirmed this internally too (Santhosh did the work here):

When an agent becomes unreachable while the master is running, it sends

The separate code path for markUnreachableAfterFailover appears to have
been added by this commit:
https://github.com/apache/mesos/commit/937c85f2f6528d1ac56ea9a7aa174ca0bd371d0c

And I think this completely breaks the promise that the PARTITION_AWARE
capability would be introduced in a backwards-compatible way.

So right now, yes, we rely on reconciliation to eventually mark the tasks
as LOST and schedule their replacements.
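
To make that fallback concrete, here is a rough sketch in Python of what
"rely on reconciliation" amounts to. It is only an illustration: the
function name, the task IDs, and the data shapes are all hypothetical and
are not taken from Aurora or Mesos code. The only thing borrowed from Mesos
is the task state names TASK_RUNNING and TASK_LOST.

```python
# Illustrative sketch only: names and data shapes are hypothetical,
# not Aurora's or Mesos's actual implementation.

def reconcile(known_tasks, status_updates):
    """Apply reconciliation status updates to the scheduler's view.

    known_tasks: set of task IDs the scheduler believes are running.
    status_updates: iterable of (task_id, state) pairs, in arrival order.
    Returns (still_running, to_reschedule).
    """
    still_running = set(known_tasks)
    to_reschedule = []
    for task_id, state in status_updates:
        # A non-partition-aware scheduler only learns the task is gone
        # once reconciliation reports it as TASK_LOST.
        if task_id in still_running and state == "TASK_LOST":
            still_running.discard(task_id)
            to_reschedule.append(task_id)
    return still_running, to_reschedule


known = {"job-a/0", "job-a/1", "job-b/0"}
updates = [
    ("job-a/0", "TASK_RUNNING"),
    ("job-a/1", "TASK_LOST"),
    ("job-b/0", "TASK_LOST"),
]
running, replacements = reconcile(known, updates)
```

The point of the sketch is that replacements are only scheduled after a
reconciliation pass reports the tasks as lost, so recovery time is bounded
by the reconciliation interval rather than by the failure itself.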

I think the only reason we haven't been more impacted by this at Twitter is
that our Mesos master is remarkably stable (compared to Aurora's daily
failovers).

We have two paths forward here: embrace the new partition-awareness
features in Aurora, and/or push back on the above change with the Mesos
community to get a better story for non-partition-aware APIs in the short
term.

On Sat, Jul 15, 2017 at 2:01 AM, Meghdoot bhattacharya <[EMAIL PROTECTED]>
wrote: