Hi,

AFAIK the absence of TASK_LOST statuses is expected. The master registry
persists information only about agents; tasks are recovered from
re-registering agents. Because of that, the failed-over master can't send
TASK_LOST for tasks that were running on an agent that didn't re-register:
it simply doesn't know about them. The only thing the master can do in this
situation is send a LostSlaveMessage, which tells the scheduler that the
tasks on this agent are LOST/UNREACHABLE.
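
To make the reasoning above concrete, here is a toy model in Python. It is
not Mesos code: ToyMaster, Agent, and on_reregistration_timeout are
hypothetical names invented for illustration, and the only behaviour it
encodes is what is described above (the registry holds agents only, and
task knowledge comes from re-registering agents).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Agent:
    agent_id: str
    tasks: List[str] = field(default_factory=list)


class ToyMaster:
    """Toy model of a failed-over master (hypothetical, not Mesos code)."""

    def __init__(self, registry_agent_ids: List[str]) -> None:
        # The only state that survives failover: agent IDs from the registry.
        self.registry = set(registry_agent_ids)
        # Task knowledge is rebuilt purely from agent re-registrations.
        self.tasks: Dict[str, List[str]] = {}

    def reregister(self, agent: Agent) -> None:
        # A re-registering agent reports the tasks it is running.
        self.tasks[agent.agent_id] = list(agent.tasks)

    def on_reregistration_timeout(self) -> List[Tuple[str, str]]:
        # For agents that never came back, the master knows the agent ID but
        # no task IDs, so it can only emit an agent-level message.
        missing = self.registry.difference(self.tasks)
        return [("LostSlaveMessage", agent_id) for agent_id in sorted(missing)]


master = ToyMaster(["agent-1", "agent-2"])
master.reregister(Agent("agent-1", ["task-a"]))
lost = master.on_reregistration_timeout()
```

In this sketch agent-2 never re-registers, so the master has no task IDs
for it and cannot produce a per-task TASK_LOST; it can only report the
agent itself as lost.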

The situation where the agent comes back after the re-registration timeout
doesn't sound good. The only way for the framework to learn about tasks
that are still running on such an agent is either from status updates or
via implicit reconciliation. Perhaps the master could send updates for the
tasks it learns about when such an agent is readmitted?
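
For reference, a minimal sketch of what an implicit reconciliation request
could look like, assuming the framework uses the v1 scheduler HTTP API
(this thread doesn't say which API is in use). The helper name
implicit_reconcile_call and the framework ID are placeholders; the snippet
only builds the JSON body of the RECONCILE call and does not send it. An
empty task list is what makes the reconciliation implicit.

```python
import json


def implicit_reconcile_call(framework_id: str) -> str:
    """Build the JSON body of a RECONCILE call (hypothetical helper)."""
    call = {
        "framework_id": {"value": framework_id},
        "type": "RECONCILE",
        # Empty list: ask the master about all tasks it knows for this
        # framework, rather than naming specific tasks.
        "reconcile": {"tasks": []},
    }
    return json.dumps(call)


# "my-framework-id" is a placeholder, not a real framework ID.
body = implicit_reconcile_call("my-framework-id")
```

The master can only answer with the tasks it currently knows about, so
tasks on an agent that has not been readmitted yet would not show up in
the response.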

On Sun, Jul 16, 2017 at 5:54 AM, Meghdoot bhattacharya wrote:
--
Ilya Pronin