A real cost audit uncovered idle load balancers, oversized RDS instances, and forgotten snapshots. Here's what we found and how we fixed each one.
After our AWS bill crossed $18,000/month for a 15-person startup, we did a proper audit. We found $6,200 in monthly waste. Here's every item.
Three ALBs were still running from decommissioned staging environments. Each costs ~$16/month base plus LCU charges.
Fix: We added a Terraform lifecycle check that tags ALBs with the owning team and a TTL. A weekly Lambda deletes anything past its TTL with zero healthy targets.
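The sweeper's code isn't shown. Here is a rough sketch of the same "delete only if expired and empty" decision in shell. It assumes the TTL tag stores an expiry date like 2024-06-30 (a hypothetical format) and that healthy targets have already been summed across all of the ALB's target groups. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical decision logic for a weekly ALB sweeper. Assumes the TTL tag
# holds an expiry date (YYYY-MM-DD) and that the healthy-target count has
# already been summed across every target group behind the ALB.

# Turn "2024-01-31" (or a longer ISO-8601 timestamp) into 20240131 so dates
# can be compared as integers.
to_datenum() {
  printf '%s\n' "$1" | cut -c1-10 | sed 's/-//g'
}

# should_delete_alb EXPIRY_DATE TODAY HEALTHY_TARGET_COUNT
# Prints "delete" only when the TTL has passed AND nothing healthy is behind it.
should_delete_alb() {
  local expiry today healthy
  expiry=$(to_datenum "$1")
  today=$(to_datenum "$2")
  healthy=$3
  if [ "$today" -gt "$expiry" ] && [ "$healthy" -eq 0 ]; then
    echo delete
  else
    echo keep
  fi
}

# Healthy-target count for ONE target group. An ALB can have several target
# groups, so a caller would sum this across all of them.
count_healthy_targets() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query "length(TargetHealthDescriptions[?TargetHealth.State=='healthy'])" \
    --output text
}
```

Requiring both conditions is the safety net. A wrong TTL on an ALB that still has healthy targets leaves it alone, and an empty ALB is only removed once its owner's TTL has actually expired.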
Our production database was on db.r6g.2xlarge. CloudWatch showed average CPU at 12% and memory at 35%.
Fix: Downgraded to db.r6g.large during a maintenance window. Set up a CloudWatch alarm for CPU > 70% so we'll know when to scale back up.
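Some rough arithmetic behind the downsize: a db.r6g.2xlarge has 8 vCPUs and a db.r6g.large has 2. At 12% average on the old instance, that works out to roughly 48% on the new one, assuming the load scales linearly (it only approximately does). That leaves some room below the 70% alarm. The alarm itself could be created along these lines. The alarm name, 5-minute period, three evaluation periods, and SNS topic are assumptions, not the post's actual settings.

```bash
#!/usr/bin/env bash
# Sketch: alarm when the RDS instance's average CPU stays above 70%.
# Alarm name, 5-minute period, 3 evaluation periods, and SNS topic are assumptions.

# create_rds_cpu_alarm DB_INSTANCE_ID SNS_TOPIC_ARN
create_rds_cpu_alarm() {
  local db_id topic_arn
  db_id=$1
  topic_arn=$2
  # Fires after three consecutive 5-minute averages above 70%.
  aws cloudwatch put-metric-alarm \
    --alarm-name "${db_id}-cpu-above-70" \
    --namespace AWS/RDS \
    --metric-name CPUUtilization \
    --dimensions "Name=DBInstanceIdentifier,Value=${db_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 70 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn"
}
```

Averaging over 15 minutes keeps a short spike from paging anyone. The trade-off is that the alarm reacts more slowly to a genuine sustained increase.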
Fourteen EBS volumes were sitting in the "available" state, leftovers from terminated EC2 instances.
Fix: Scripted a check:
```bash
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime}' \
  --output table
```
For any volume older than 30 days: snapshot it, then delete it.
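Here is a sketch of the snapshot-then-delete step. It assumes bash with GNU `date` (for the 30-day cutoff) and credentials allowed to create snapshots and delete volumes. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: snapshot, then delete, every "available" EBS volume created more
# than 30 days ago. Function names are hypothetical; uses GNU date.

# "2023-01-15T10:20:30.000Z" -> 20230115, for integer date comparison.
to_datenum() {
  printf '%s\n' "$1" | cut -c1-10 | sed 's/-//g'
}

# is_before_cutoff CREATE_TIME CUTOFF_DATE (exit status 0 = older than cutoff)
is_before_cutoff() {
  [ "$(to_datenum "$1")" -lt "$(to_datenum "$2")" ]
}

# Snapshot one volume, wait for the snapshot to complete, then delete the volume.
snapshot_and_delete_volume() {
  local vol snap
  vol=$1
  snap=$(aws ec2 create-snapshot \
    --volume-id "$vol" \
    --description "Backup of $vol before deletion" \
    --query SnapshotId --output text) || return 1
  # Only delete once the snapshot has finished; stop if the waiter fails.
  aws ec2 wait snapshot-completed --snapshot-ids "$snap" || return 1
  aws ec2 delete-volume --volume-id "$vol"
}

cleanup_old_available_volumes() {
  local cutoff vol created
  cutoff=$(date -u -d '30 days ago' +%Y-%m-%d)   # GNU date syntax
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[*].[VolumeId,CreateTime]' \
    --output text |
  while read -r vol created; do
    if is_before_cutoff "$created" "$cutoff"; then
      snapshot_and_delete_volume "$vol"
    fi
  done
}
```

Calling `cleanup_old_available_volumes` processes every matching volume in the current region. Running the read-only `describe-volumes` table first works as a dry run.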
We had 2,400 EBS snapshots going back 3 years. Most were from AMIs we no longer use.
Fix: Implemented AWS Data Lifecycle Manager with a 90-day retention policy.
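One caveat: Data Lifecycle Manager only applies its retention rule to snapshots its own policies create. A new policy therefore won't prune a pre-existing backlog like those 2,400 snapshots. Snapshots still referenced by a registered AMI also can't be deleted until that AMI is deregistered. For the policy itself, a minimal CLI sketch might look like this. It assumes volumes opt in via a hypothetical `Backup=true` tag, a daily 03:00 UTC schedule, and an existing DLM execution role.

```bash
#!/usr/bin/env bash
# Sketch: a DLM policy that snapshots tagged volumes daily and keeps each
# snapshot for 90 days. The Backup=true tag, the 03:00 UTC time, and the
# role ARN passed in are assumptions.

# create_snapshot_policy EXECUTION_ROLE_ARN
create_snapshot_policy() {
  local role_arn details
  role_arn=$1
  details='{
    "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "Backup", "Value": "true"}],
    "Schedules": [{
      "Name": "daily-90-day-retention",
      "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
      "RetainRule": {"Interval": 90, "IntervalUnit": "DAYS"},
      "CopyTags": true
    }]
  }'
  # DLM restricts description characters, so this one avoids punctuation.
  aws dlm create-lifecycle-policy \
    --execution-role-arn "$role_arn" \
    --description "Daily EBS snapshots kept 90 days" \
    --state ENABLED \
    --policy-details "$details"
}
```

An age-based `RetainRule` (Interval plus IntervalUnit) maps directly onto "90-day retention." A count-based rule would instead keep the last N snapshots, regardless of age.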
Our NAT Gateway was processing 800GB/month. Much of it was S3 traffic from private subnets.
Fix: Added a VPC Gateway Endpoint for S3. Free, and it cut NAT traffic by 60%.
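```hcl
# Gateway endpoint: S3 traffic from the private route table bypasses the NAT Gateway.
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.main.id
  service_name    = "com.amazonaws.us-east-1.s3" # must match the VPC's region
  route_table_ids = [aws_route_table.private.id]
}
```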
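A quick check like the one below could confirm that the endpoint is actually attached to the private route table. It assumes the same us-east-1 service name as the Terraform above, and the function names are hypothetical. When reading the 60% figure, keep in mind that a gateway endpoint only covers S3 buckets in the VPC's own region. Cross-region S3 requests and all non-S3 traffic still go through the NAT Gateway.

```bash
#!/usr/bin/env bash
# Sketch: list the route tables attached to the VPC's S3 gateway endpoint
# and check that a given one (e.g. the private route table) is among them.
# Assumes us-east-1, matching the Terraform service_name.

# s3_endpoint_route_tables VPC_ID
s3_endpoint_route_tables() {
  aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=$1" "Name=service-name,Values=com.amazonaws.us-east-1.s3" \
    --query 'VpcEndpoints[].RouteTableIds[]' \
    --output text
}

# endpoint_covers_route_table VPC_ID ROUTE_TABLE_ID (exit status 0 = attached)
endpoint_covers_route_table() {
  # Text output is tab-separated; put one ID per line before matching.
  s3_endpoint_route_tables "$1" | tr '\t' '\n' | grep -qxF -- "$2"
}
```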
Every Lambda function was set to 1024MB by default. AWS Lambda Power Tuning showed that most of them needed only 256MB.
Fix: Ran Power Tuning on our top 10 functions and right-sized them.
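Lambda allocates CPU in proportion to configured memory. That is why measuring with Power Tuning beats cutting every function to 256MB: a CPU-bound function can run long enough at lower memory to cost more, not less. Once the recommended sizes are known, applying them could look like this sketch. The function names in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: apply per-function memory sizes (e.g. Power Tuning recommendations).

# apply_memory_sizes: reads "function-name memory-mb" pairs on stdin.
apply_memory_sizes() {
  local fn mb
  while read -r fn mb; do
    [ -z "$fn" ] && continue   # skip blank lines
    aws lambda update-function-configuration \
      --function-name "$fn" \
      --memory-size "$mb" \
      --query '[FunctionName,MemorySize]' --output text
  done
}

# list_functions_with_memory MB: names of functions currently set to MB.
list_functions_with_memory() {
  aws lambda list-functions \
    --query "Functions[?MemorySize==\`$1\`].FunctionName" \
    --output text
}
```

Usage, with made-up function names: `printf '%s\n' 'orders-api 256' 'image-resize 512' | apply_memory_sizes`.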
We were paying on-demand rates for four EC2 instances that had been running for two years.
Fix: Purchased 1-year no-upfront reserved instances for predictable workloads.
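From the CLI, finding candidates and matching offerings might look like the sketch below. The instance type is whatever you pass in. Linux/UNIX, the standard offering class, and excluding Marketplace listings are assumptions. The purchase itself is deliberately left commented out, since it is a financial commitment.

```bash
#!/usr/bin/env bash
# Sketch: find long-running instances and 1-year No Upfront RI offerings.
# Linux/UNIX, standard class, and no Marketplace listings are assumptions.

# Running instances with their launch time. LaunchTime resets after a
# stop/start, so it only approximates "running for 2 years".
list_running_instances() {
  aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].[InstanceId,InstanceType,LaunchTime]' \
    --output text
}

# find_no_upfront_ri INSTANCE_TYPE: 1-year (31536000 s) No Upfront offerings.
find_no_upfront_ri() {
  aws ec2 describe-reserved-instances-offerings \
    --instance-type "$1" \
    --product-description "Linux/UNIX" \
    --offering-class standard \
    --offering-type "No Upfront" \
    --min-duration 31536000 \
    --max-duration 31536000 \
    --no-include-marketplace \
    --query 'ReservedInstancesOfferings[].[ReservedInstancesOfferingId,InstanceType,AvailabilityZone,Scope]' \
    --output text
}

# Purchasing is a commitment; review the offering before running:
# aws ec2 purchase-reserved-instances-offering \
#   --reserved-instances-offering-id <offering-id> --instance-count 1
```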
The $6,200/month we saved required about 8 hours of work. That's an annualized return of $74,400 for one day of effort.
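The summary arithmetic can be wrapped in one small function that also shows the savings as a share of the original $18,000 bill. The function name is illustrative.

```bash
#!/usr/bin/env bash
# audit_roi MONTHLY_SAVINGS MONTHLY_BILL HOURS_SPENT
# Prints annualized savings, savings as a % of the bill, and annual savings per hour of work.
audit_roi() {
  awk -v s="$1" -v b="$2" -v h="$3" 'BEGIN {
    annual = s * 12
    printf "annual_savings=%d\n", annual
    printf "share_of_bill_pct=%.1f\n", s / b * 100
    printf "savings_per_hour_of_work=%d\n", annual / h
  }'
}
```

`audit_roi 6200 18000 8` prints `annual_savings=74400`, `share_of_bill_pct=34.4`, and `savings_per_hour_of_work=9300`. In other words, roughly a third of the monthly bill was waste.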