A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
Our worst incident of last year started with a simple question: “Why is there an EC2 instance we can't find in Terraform?”
```bash terraform plan -detailed-exitcode || echo "Drift detected" ```
Drift still happens, but on-call no longer learns about it at the worst possible moment.
Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.
Incident Response for Platform Teams. Practical guidance for reliable, scalable platform operations.
Explore more articles in this category
A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
Write Ansible playbooks that are idempotent, readable, and maintainable for config management.
A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.