Practical patterns for Terraform modules at scale: versioning, composition, testing, and avoiding the monolith trap.
After managing infrastructure for 50+ microservices with Terraform, we've learned which module patterns scale and which become nightmares. Here's what works.
Our first approach was one massive Terraform repo with everything in it. A single plan took 12 minutes. A typo in a dev variable once triggered a production change. So we split it up.
We organize modules in three layers:
modules/
  base/       # VPC, subnets, DNS zones
  platform/   # EKS cluster, RDS, ElastiCache
  service/    # Per-service: ALB, task def, IAM role
Each layer depends only on the layer below via remote state data sources:
data "terraform_remote_state" "platform" {
  backend = "s3"
  config = {
    bucket = "terraform-state-prod"
    key    = "platform/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_lb_target_group" "service" {
  vpc_id = data.terraform_remote_state.platform.outputs.vpc_id
  # ...
}
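For that lookup to resolve, the platform layer has to publish vpc_id as a root-level output, since terraform_remote_state only exposes root module outputs. A minimal sketch of what that might look like, assuming the platform layer itself reads the VPC from the base layer's state. The "base" data source, its state key, and the file name are illustrative assumptions, not taken from the setup described here:

```hcl
# platform/outputs.tf (hypothetical file layout)

# Read the base layer's state. The key is an assumed naming convention
# that mirrors the platform key shown above.
data "terraform_remote_state" "base" {
  backend = "s3"
  config = {
    bucket = "terraform-state-prod"
    key    = "base/terraform.tfstate" # assumption
    region = "us-east-1"
  }
}

# Re-export the VPC ID so the service layer only ever reads platform state
# and never reaches past it into the base layer.
output "vpc_id" {
  description = "ID of the VPC the platform resources live in"
  value       = data.terraform_remote_state.base.outputs.vpc_id
}
```

Re-exporting keeps the "one layer down" rule intact: each layer's outputs become its public interface to the layer above.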
We publish reusable modules to a private registry with semantic versioning:
module "service" {
  source  = "app.terraform.io/ourorg/service/aws"
  version = "~> 2.0"

  name        = "payment-api"
  environment = "production"
  cpu         = 512
  memory      = 1024
}
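As a quick reference for how the constraint operator behaves, here is a sketch that places three constraint styles side by side. The module labels, service names, and the specific version numbers 2.3.0 and 2.3.1 are hypothetical and only there to make the comparison concrete:

```hcl
# "~> 2.0" lets the rightmost listed component grow: any 2.x, never 3.0.
module "service_minor_range" {
  source  = "app.terraform.io/ourorg/service/aws"
  version = "~> 2.0" # equivalent to >= 2.0.0, < 3.0.0

  name        = "payment-api"
  environment = "production"
  cpu         = 512
  memory      = 1024
}

# "~> 2.3.0" only allows patch releases: 2.3.x, never 2.4.0.
module "service_patch_range" {
  source  = "app.terraform.io/ourorg/service/aws"
  version = "~> 2.3.0" # equivalent to >= 2.3.0, < 2.4.0

  name        = "orders-api" # hypothetical second service
  environment = "production"
  cpu         = 512
  memory      = 1024
}

# An exact pin never moves until someone edits this line.
module "service_exact" {
  source  = "app.terraform.io/ourorg/service/aws"
  version = "2.3.1"

  name        = "billing-api" # hypothetical third service
  environment = "production"
  cpu         = 512
  memory      = 1024
}
```

With a range constraint, the version actually selected changes only when terraform init -upgrade is run.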
Rules we follow:
- Consumers pin to a version range (~> 2.0), not an exact version.
Instead of one module with 40 variables and 15 conditional blocks, we compose small modules:
module "alb" {
  source = "./modules/alb"
  # ...
}

module "ecs_service" {
  source           = "./modules/ecs-service"
  target_group_arn = module.alb.target_group_arn
  # ...
}

module "monitoring" {
  source       = "./modules/cloudwatch-alarms"
  service_name = module.ecs_service.name
  # ...
}
Each module does one thing. Connecting them is explicit, not hidden behind flags.
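The wiring above relies on each small module declaring a narrow interface: typed inputs and a few outputs. A minimal sketch of both sides of the target_group_arn hand-off, assuming the resource label "this", the variables name and vpc_id, and the port and protocol values, none of which come from the original modules:

```hcl
# --- modules/alb/main.tf (sketch) ---
variable "name" {
  description = "Name for the target group"
  type        = string
}

variable "vpc_id" {
  description = "VPC the target group belongs to"
  type        = string
}

resource "aws_lb_target_group" "this" {
  name     = var.name
  port     = 8080   # assumed application port
  protocol = "HTTP" # assumed protocol
  vpc_id   = var.vpc_id
}

# The only thing the ECS module needs from the ALB module.
output "target_group_arn" {
  description = "ARN of the target group the ECS service registers with"
  value       = aws_lb_target_group.this.arn
}

# --- modules/ecs-service/variables.tf (sketch) ---
variable "target_group_arn" {
  description = "Target group to attach the service's tasks to"
  type        = string
}
```

Because the dependency is an ordinary input, the ECS module can be pointed at any target group, including one created outside Terraform.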
We test modules with terraform validate, tflint, and integration tests:
# In CI pipeline
cd modules/service
terraform init -backend=false
terraform validate
tflint --init
tflint
# Integration test (creates real resources, then destroys)
cd tests/
go test -v -timeout 30m ./...
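The Go tests themselves are not shown here. As a lighter-weight complement (a swapped-in technique, not part of the pipeline described above), Terraform 1.6 and later include a native test runner driven by .tftest.hcl files. A sketch, assuming the service module accepts the name, environment, cpu, and memory inputs used earlier and exposes a hypothetical service_name output:

```hcl
# modules/service/tests/defaults.tftest.hcl (hypothetical path)

# Inputs passed to the module under test. Values are illustrative.
variables {
  name        = "test-api"
  environment = "staging"
  cpu         = 256
  memory      = 512
}

# Plan only: nothing is created, so the run is fast and free.
run "plan_with_small_task" {
  command = plan

  assert {
    # service_name is an assumed output, not confirmed by the module source.
    condition     = output.service_name == "test-api"
    error_message = "service_name output should echo the name input"
  }
}
```

Running terraform test from modules/service picks up files under tests/ by default. A plan-only run still needs a working provider configuration, so it narrows what the apply-and-destroy integration tests have to cover but does not replace them.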
Terraform at scale is a software engineering problem, not just an infrastructure problem. Treat your modules like libraries.