Archive
Browse all 536 articles organized by date
2026
88 articles
January
- 31 How We Stopped Terraform Drift from Surprising On-Call
- 30 Systemd Tricks We Use to Keep Services Boring
- 29 Disaster Recovery Planning: Building Resilient Infrastructure
- 28 Operational Checklist: Blue-Green Deployment Guardrails
- 27 A Pragmatic Multi-Region Strategy for Small Teams
- 26 What We Learned Running Weekly Game Days on Our CI/CD Pipeline
- 25 Infrastructure Monitoring: Observability for IaC
- 24 Operational Checklist: Infrastructure Drift Detection Workflow
- 23 FinOps and Cloud Cost Management for Engineering Teams
- 22 Ansible Playbook Optimization: Writing Efficient Playbooks
- 21 Real-World RAG Incidents: Lessons from a Production Rollout
- 20 Operational Checklist: Multi-Cluster Traffic Routing Strategies
- 19 How We Stopped Terraform Drift from Surprising On-Call
- 18 Pulumi vs Terraform Deep Dive: Choosing the Right IaC Tool
- 17 Systemd Tricks We Use to Keep Services Boring
- 16 A Pragmatic Multi-Region Strategy for Small Teams
- 15 Operational Checklist: Kubernetes Secrets and External Vault Integration
- 14 Infrastructure Testing Strategies: Validating Your IaC
- 13 What We Learned Running Weekly Game Days on Our CI/CD Pipeline
- 12 Operational Checklist: Python Worker Queue Scaling Patterns
- 11 Terraform Modules Best Practices: Building Reusable Infrastructure
- 10 Real-World RAG Incidents: Lessons from a Production Rollout
- 9 How We Stopped Terraform Drift from Surprising On-Call
- 8 Operational Checklist: Model Serving Observability Stack
- 7 Linux Container Internals: Understanding How Containers Work
- 6 Systemd Tricks We Use to Keep Services Boring
- 5 A Pragmatic Multi-Region Strategy for Small Teams
- 4 Shell Scripting Best Practices: Writing Maintainable Scripts
- 4 Operational Checklist: RAG Retrieval Quality Evaluation
- 3 Prompt Engineering for DevOps: Consistency and Safety
- 2 What We Learned Running Weekly Game Days on Our CI/CD Pipeline
- 1 Real-World RAG Incidents: Lessons from a Production Rollout
February
- 28 End-of-Week Engineering: Why Smart Tech Teams Don’t Ship Major Changes on Friday
- 27 Kubernetes Cost Optimization for Teams: FinOps Tactics That Actually Work
- 26 SRE Error Budgets in Practice: Shipping Fast Without Burning Reliability
- 25 Platform Engineering with Backstage: Build a Useful Developer Portal
- 24 GitHub Actions for Monorepos: Fast CI Without Pipeline Chaos
- 23 Azure DevOps Best Practices in 2026: Build Pipelines You Can Trust
- 22 AI Best Practices in 2026: Shipping Reliable Systems, Not Demo Magic
- 21
March
- 27 Model Fallback Policies for Customer-Facing AI: The Routing Rules That Kept SLA Intact
- 26 Artifact Promotion Instead of Rebuilds: The Release Control Pattern That Stopped Drift
- 25 RDS Restore Drills for Busy Teams: The Recovery Workflow That Surfaced Real Gaps
- 24 Systemd Drop-In Overrides for Vendor Services: The Supportable Linux Ops Pattern
- 23 Terraform Module Version Pinning: How One Platform Team Stopped Surprise Breakage
- 22 Embedding Model Upgrades Without Search Chaos: A Safer RAG Rollout Pattern
- 21
2025
339 articles
January
- 29 Troubleshooting: Kubernetes Cluster Upgrade Strategy
- 26 Field Notes: AI Inference Cost Optimization
- 22 Field Notes: SLO-Based Monitoring for APIs
- 18 Field Notes: Secure Container Supply Chain Controls
- 14 Field Notes: Infrastructure Documentation as Code
- 9 Field Notes: Cloud Networking Segmentation Patterns
- 6 Field Notes: Incident Response for Platform Teams
2024
105 articles
January
- 28 Practical Guide: Cloud Disaster Recovery Runbook Design
- 24 Practical Guide: AWS Cost Control with Tagging and Budgets
- 21 Practical Guide: Ansible Role Design for Large Teams
- 17 Practical Guide: Terraform State Isolation by Environment
- 15 Orchestrating AI Agents on Kubernetes
- 13 Practical Guide: GitHub Actions Pipeline Reliability
- 10 eBPF: The Future of Kernel Observability