Yes, we've confirmed this internally too (Santhosh did the work here):

When an agent becomes unreachable while the master is running, it sends

The separate code path for markUnreachableAfterFailover appears to have
been added by this commit:

And I think this breaks the promise that the PARTITION_AWARE capability
would be introduced in a backwards-compatible way.

So right now, yes, we rely on reconciliation to eventually mark the tasks as
LOST and reschedule their replacements.
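
To make that fallback concrete, here is a rough sketch of the idea only. It is
not Aurora or Mesos code: the Scheduler class, its fields, and the shape of
master_view are all made up for illustration, and it assumes that a task the
master no longer reports is treated as LOST.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical stand-in for a framework scheduler's task bookkeeping."""

    # task_id -> last state the scheduler recorded for that task
    tasks: dict = field(default_factory=dict)
    # task ids queued for a replacement, in the order they were found lost
    rescheduled: list = field(default_factory=list)

    def reconcile(self, master_view):
        """Compare our RUNNING tasks against what the master reports.

        master_view is an assumed mapping of task_id -> state for the tasks
        the master still knows about. A RUNNING task missing from it is
        marked LOST and queued for a replacement.
        """
        for task_id, state in list(self.tasks.items()):
            if state != "RUNNING":
                continue  # terminal tasks need no reconciliation
            if task_id not in master_view:
                self.tasks[task_id] = "LOST"
                self.rescheduled.append(task_id)
        return self.rescheduled


scheduler = Scheduler(tasks={"a": "RUNNING", "b": "RUNNING", "c": "FINISHED"})
scheduler.reconcile({"a": "RUNNING"})
```

The point is that without an explicit status update on the failover path, the
replacement is only scheduled once a reconciliation pass like this runs.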

I think the only reason we haven't been more impacted by this at Twitter is that
our Mesos master is remarkably stable (compared to Aurora's daily

We have two paths forward here: embrace the new partition-awareness
features in Aurora, and/or push back on the above change with the Mesos
community to get a better story for non-partition-aware APIs in the
short term.

On Sat, Jul 15, 2017 at 2:01 AM, Meghdoot bhattacharya <
[EMAIL PROTECTED]lid> wrote: