We are currently using Aurora 0.17.0 and have a use case where we want to continuously monitor the following SLA metrics for our clusters to detect anomalies:
* Median Time To Assigned (MTTA) <http://aurora.apache.org/documentation/latest/features/sla-metrics/#median-time-to-assigned-(mtta)>
* Median Time To Starting (MTTS) <http://aurora.apache.org/documentation/latest/features/sla-metrics/#median-time-to-starting-(mtts)>
* Median Time To Running (MTTR) <http://aurora.apache.org/documentation/latest/features/sla-metrics/#median-time-to-running-(mttr)>
Currently, sla_stat_refresh_interval is left at its default of 1 minute for us.
When we fetch the SLA metrics from the /vars API endpoint, Aurora computes each of the above metrics only over the last one-minute window, refreshed once per minute. It does not give us historical data for these metrics.
Does Aurora expose any API endpoint that provides historical data for these metrics over a configurable period of time? Is there any metric in the /graphview endpoint for this?
Also, it would be great if anyone could suggest ideas for monitoring these metrics. At present, I am planning to poll the /vars endpoint regularly for data collection and use the ELK stack for graphing and alerting.
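To make the polling idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under several assumptions: that /vars returns plain-text "name value" lines, that the SLA metric names start with "sla_" and contain "_mtta_", "_mtts_" or "_mttr_" (please check these against your own /vars output), and that timestamped JSON lines appended to a file are a convenient input for Logstash. The function names, URL and output path are hypothetical.

```python
import json
import time
import urllib.request

# Assumed name fragments for the three SLA metrics; verify against real /vars output.
SLA_TAGS = ("_mtta_", "_mtts_", "_mttr_")


def parse_vars(text):
    """Parse 'name value' lines into a dict, keeping only numeric values."""
    metrics = {}
    for line in text.splitlines():
        name, _, value = line.strip().partition(" ")
        if not name or not value:
            continue
        try:
            metrics[name] = float(value)
        except ValueError:
            # Non-numeric stats (e.g. version strings) are skipped.
            continue
    return metrics


def sla_snapshot(metrics, now=None):
    """Build one timestamped record holding only the SLA timing metrics."""
    ts = time.time() if now is None else now
    sla = {
        name: value
        for name, value in metrics.items()
        if name.startswith("sla_") and any(tag in name for tag in SLA_TAGS)
    }
    return {"timestamp": ts, "metrics": sla}


def fetch_vars(url, timeout_s=10):
    """Fetch the raw /vars text from the scheduler (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def poll(url, out_path, interval_s=60):
    """Append one JSON line per interval; runs until interrupted."""
    while True:
        record = sla_snapshot(parse_vars(fetch_vars(url)))
        with open(out_path, "a", encoding="utf-8") as out:
            out.write(json.dumps(record) + "\n")
        time.sleep(interval_s)
```

Something like poll("http://<scheduler-host>:<port>/vars", "aurora_sla.jsonl") would then build up the history on our side. Polling at the same interval as sla_stat_refresh_interval seems reasonable, since the values should not change more often than that.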
Thanks in advance for your time!