Consider a use case where an ECS task cannot be placed or started on a cluster and fails silently.
The failure could be due to:
- Insufficient resources
- ECS agent crashed or not responding
In a production system such a failure must raise an alert. We can achieve this with CloudWatch, which monitors ECS service action events and alerts us when an event matches a rule.
Let's see how to set it up.
Capture ECS event
Before setting up the alert, we need to capture a sample event for the ECS failure.
In our case it is the event ECS emits when a task is not started.
The JSON of this sample event can then be used to set up a CloudWatch alert that triggers a Lambda function or an SNS notification.
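For reference, a service action event for a failed task might look roughly like the sketch below. The field layout (`source`, `detail-type`, `detail.eventType`, `detail.eventName`) follows the general shape of ECS service action events, but the event names, account ID, ARNs, reason and timestamps are placeholder assumptions, so capture a real event from your own cluster before relying on them. The `matches` helper is hypothetical: a simplified local check that only does exact-value matching, not the rule engine CloudWatch itself uses.

```python
import json

# Sample event with placeholder values (assumed shape of an ECS service action event).
SAMPLE_EVENT = {
    "version": "0",
    "id": "00000000-0000-0000-0000-000000000000",
    "detail-type": "ECS Service Action",
    "source": "aws.ecs",
    "account": "111122223333",
    "time": "2019-11-19T19:55:38Z",
    "region": "us-west-2",
    "resources": ["arn:aws:ecs:us-west-2:111122223333:service/default/servicetest"],
    "detail": {
        "eventType": "ERROR",
        "eventName": "SERVICE_TASK_PLACEMENT_FAILURE",  # assumed event name
        "clusterArn": "arn:aws:ecs:us-west-2:111122223333:cluster/default",
        "reason": "RESOURCE:MEMORY",
        "createdAt": "2019-11-19T19:55:38.725Z",
    },
}

# Event pattern derived from the sample: each field lists the values we accept.
EVENT_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Service Action"],
    "detail": {
        "eventName": [
            "SERVICE_TASK_PLACEMENT_FAILURE",
            "SERVICE_TASK_START_IMPAIRED",
        ],
    },
}


def matches(pattern, event):
    """Simplified check: every pattern field must exist in the event and
    the event's value must be one of the values listed in the pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    # Print the pattern as JSON, ready to paste into a rule definition.
    print(json.dumps(EVENT_PATTERN, indent=2))
    print(matches(EVENT_PATTERN, SAMPLE_EVENT))
```

Checking the pattern against a captured event locally is a cheap way to catch a misspelled field name before the rule is deployed.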
Setting up alert
With the JSON in place to match the events, we can jump directly to CloudWatch rules and create a new rule to alert us.
Setting it up is simple if you already have a Lambda function or an SNS topic configured to send an email or take an action.
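The same rule can also be created from a script instead of the console. The following is a minimal sketch that uses the boto3 CloudWatch Events client calls `put_rule` and `put_targets`. The rule name, target ID and SNS topic ARN are hypothetical placeholders, and the event names in the pattern are assumptions that should be replaced with the ones from your captured event. It also assumes the SNS topic's access policy already allows `events.amazonaws.com` to publish to it.

```python
import json


def build_event_pattern():
    """Pattern for ECS service action events (event names are assumptions)."""
    return {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Service Action"],
        "detail": {
            "eventName": [
                "SERVICE_TASK_PLACEMENT_FAILURE",
                "SERVICE_TASK_START_IMPAIRED",
            ],
        },
    }


def create_alert_rule(events_client, rule_name, sns_topic_arn):
    """Create (or update) the rule and attach the SNS topic as its target.

    `events_client` is expected to be a boto3 "events" client.
    Returns the ARN of the rule.
    """
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
        Description="Alert on ECS task placement or start failures",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "ecs-failure-sns" is a hypothetical target ID.
        Targets=[{"Id": "ecs-failure-sns", "Arn": sns_topic_arn}],
    )
    return response["RuleArn"]


# Usage (requires boto3 and configured AWS credentials; names are placeholders):
#   import boto3
#   create_alert_rule(
#       boto3.client("events"),
#       "ecs-task-failure-alert",
#       "arn:aws:sns:us-west-2:111122223333:ecs-alerts",
#   )
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real account. To trigger a Lambda function instead, the target ARN would be the function's ARN, and the function would additionally need a permission that lets CloudWatch Events invoke it.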
Other ECS events
The example shown here is just one of the many cases you will come across while working with ECS. For additional events and customization, please go through the official documentation.