Archive

Browse all 553 articles organized by date

2026

105 articles

January

31How We Stopped Terraform Drift from Surprising On-Call
30Systemd Tricks We Use to Keep Services Boring
29Disaster Recovery Planning: Building Resilient Infrastructure
28Operational Checklist: Blue-Green Deployment Guardrails
27A Pragmatic Multi-Region Strategy for Small Teams
26What We Learned Running Weekly Game Days on Our CI/CD Pipeline
25Infrastructure Monitoring: Observability for IaC
24Operational Checklist: Infrastructure Drift Detection Workflow
23FinOps and Cloud Cost Management for Engineering Teams
22Ansible Playbook Optimization: Writing Efficient Playbooks
21Real-World RAG Incidents: Lessons from a Production Rollout
20Operational Checklist: Multi-Cluster Traffic Routing Strategies
19How We Stopped Terraform Drift from Surprising On-Call
18Pulumi vs Terraform Deep Dive: Choosing the Right IaC Tool
17Systemd Tricks We Use to Keep Services Boring
16A Pragmatic Multi-Region Strategy for Small Teams
15Operational Checklist: Kubernetes Secrets and External Vault Integration
14Infrastructure Testing Strategies: Validating Your IaC
13What We Learned Running Weekly Game Days on Our CI/CD Pipeline
12Operational Checklist: Python Worker Queue Scaling Patterns
11Terraform Modules Best Practices: Building Reusable Infrastructure
10Real-World RAG Incidents: Lessons from a Production Rollout
9How We Stopped Terraform Drift from Surprising On-Call
8Operational Checklist: Model Serving Observability Stack
7Linux Container Internals: Understanding How Containers Work
6Systemd Tricks We Use to Keep Services Boring
5A Pragmatic Multi-Region Strategy for Small Teams
4Shell Scripting Best Practices: Writing Maintainable Scripts
4Operational Checklist: RAG Retrieval Quality Evaluation
3Prompt Engineering for DevOps: Consistency and Safety
2What We Learned Running Weekly Game Days on Our CI/CD Pipeline
1Real-World RAG Incidents: Lessons from a Production Rollout

February

28End-of-Week Engineering: Why Smart Tech Teams Don’t Ship Major Changes on Friday
27Kubernetes Cost Optimization for Teams: FinOps Tactics That Actually Work
26SRE Error Budgets in Practice: Shipping Fast Without Burning Reliability
25Platform Engineering with Backstage: Build a Useful Developer Portal
24GitHub Actions for Monorepos: Fast CI Without Pipeline Chaos
23Azure DevOps Best Practices in 2026: Build Pipelines You Can Trust
22AI Best Practices in 2026: Shipping Reliable Systems, Not Demo Magic
21AI Best Practices for Engineering Teams: From Prompt Experiments to Platform Discipline
20Operational Checklist: AI Inference Cost Optimization
19What We Learned Running Weekly Game Days on Our CI/CD Pipeline
18Real-World RAG Incidents: Lessons from a Production Rollout
17How We Stopped Terraform Drift from Surprising On-Call
16Operational Checklist: SLO-Based Monitoring for APIs
15Systemd Tricks We Use to Keep Services Boring
14A Pragmatic Multi-Region Strategy for Small Teams
13Kubernetes Networking: Services, Ingress, and Network Policies
12Operational Checklist: Secure Container Supply Chain Controls
11What We Learned Running Weekly Game Days on Our CI/CD Pipeline
10Real-World RAG Incidents: Lessons from a Production Rollout
9How We Stopped Terraform Drift from Surprising On-Call
8Operational Checklist: Infrastructure Documentation as Code
7Systemd Tricks We Use to Keep Services Boring
6A Pragmatic Multi-Region Strategy for Small Teams
5Infrastructure Cost Optimization: Reducing Cloud Spending
4Operational Checklist: Cloud Networking Segmentation Patterns
3What We Learned Running Weekly Game Days on Our CI/CD Pipeline
2Real-World RAG Incidents: Lessons from a Production Rollout
1Multi-Cloud Infrastructure: Managing Resources Across Providers
1Operational Checklist: Incident Response for Platform Teams

March

31Linux Performance Troubleshooting: A Real Incident Walkthrough
30Prompt Engineering Patterns That Actually Work in Production
29AWS Cost Audit: 7 Things We Found Wasting Money Every Month
28How We Cut Our Docker Image Size by 80% and Why It Matters
27Model Fallback Policies for Customer-Facing AI: The Routing Rules That Kept SLA Intact
26Artifact Promotion Instead of Rebuilds: The Release Control Pattern That Stopped Drift
25RDS Restore Drills for Busy Teams: The Recovery Workflow That Surfaced Real Gaps
24Systemd Drop-In Overrides for Vendor Services: The Supportable Linux Ops Pattern
23Terraform Module Version Pinning: How One Platform Team Stopped Surprise Breakage
22Embedding Model Upgrades Without Search Chaos: A Safer RAG Rollout Pattern
21Multi-Cluster Traffic Routing Strategies: A Pragmatic Rollout Pattern for Growing SaaS Teams
20Terraform State Isolation by Environment: How We Stopped One Change from Hitting Prod
19Prompt Versioning and Regression Testing: How Teams Avoid Silent AI Regressions
18Systemd Service Reliability Patterns: What We Changed After Repeated Restart Loops
17Blue-Green Deployment Guardrails in Kubernetes: Lessons from a Failed Friday Rollout
16Cloud Disaster Recovery Runbook Design: How Small Teams Rehearse Multi-Region Failover
15RAG Retrieval Quality Evaluation: The Checks We Added After Bad Answers Reached Production
14Infrastructure Documentation as Code: How One Platform Team Reduced Audit Fire Drills
13Linux Patch Management for Production Fleets: A Real-World Maintenance Workflow
12AWS Cost Allocation Tags for Shared Platforms: What Finally Worked
11GitHub Actions Monorepo CI: How We Cut Build Times Without Breaking Main
10Real-World RAG Incidents: Lessons from a Production Rollout
9How We Stopped Terraform Drift from Surprising On-Call
8Systemd Tricks We Use to Keep Services Boring
7A Pragmatic Multi-Region Strategy for Small Teams
6What We Learned Running Weekly Game Days on Our CI/CD Pipeline
5Ansible and Infrastructure as Code: Idempotency and Best Practices
4Real-World RAG Incidents: Lessons from a Production Rollout
3How We Stopped Terraform Drift from Surprising On-Call
2Systemd Tricks We Use to Keep Services Boring
1A Pragmatic Multi-Region Strategy for Small Teams

April

13Pre-Commit Hooks That Saved Our Repo: 7 Real Examples
12EKS Auto Mode: What Worked, What Broke in Our Migration
11Self-Hosted LLMs vs OpenAI API: A Cost-vs-Latency Analysis After 6 Months
10OpenTelemetry Collector Pipelines: Real Configs That Survived Production
9Blue/Green Deploys for Stateful Services: A Postgres Cutover Story
8systemd Timers vs Cron: When We Switched and What We Learned
7Zero Trust on AWS: Lessons From Implementing IAM Identity Center
6Embedding Quality in RAG: How We Cut Hallucinations by 60%
5Database Migrations Without Downtime: Patterns From Three Real Cutovers
4Monitoring That Actually Helps On-Call: Alerts, Dashboards, and Runbooks
3Secrets Management in Practice: From .env Files to Vault
2Incident Postmortems That Actually Prevent Repeat Failures
1Terraform Modules Done Right: Lessons from Managing 50+ Services

2024

105 articles

January

28Practical Guide: Cloud Disaster Recovery Runbook Design
24Practical Guide: AWS Cost Control with Tagging and Budgets
21Practical Guide: Ansible Role Design for Large Teams
17Practical Guide: Terraform State Isolation by Environment
15Orchestrating AI Agents on Kubernetes
13Practical Guide: GitHub Actions Pipeline Reliability
10eBPF: The Future of Kernel Observability

2023

4 articles

December

28AWS Cost Optimization Strategies
25Advanced Bash Scripting Techniques
20Docker Multi-Stage Builds for Production
15Infrastructure as Code with Ansible

2025

339 articles

January

29Troubleshooting: Kubernetes Cluster Upgrade Strategy
26Field Notes: AI Inference Cost Optimization
22Field Notes: SLO-Based Monitoring for APIs
18Field Notes: Secure Container Supply Chain Controls
14Field Notes: Infrastructure Documentation as Code
9Field Notes: Cloud Networking Segmentation Patterns
6Field Notes: Incident Response for Platform Teams
2Field Notes: Blue-Green Deployment Guardrails

February

28AI Agents in DevOps: From Copilots to Autonomous Automation in 2025
26Troubleshooting: Linux Performance Baseline Methodology
22Troubleshooting: Cloud Disaster Recovery Runbook Design
18Troubleshooting: AWS Cost Control with Tagging and Budgets
15Troubleshooting: Ansible Role Design for Large Teams
10Troubleshooting: Terraform State Isolation by Environment
6Troubleshooting: GitHub Actions Pipeline Reliability
2Troubleshooting: Docker Image Hardening for Production

March

31A Pragmatic Multi-Region Strategy for Small Teams
30What We Learned Running Weekly Game Days on Our CI/CD Pipeline
29Troubleshooting: Kubernetes Secrets and External Vault Integration
28Real-World RAG Incidents: Lessons from a Production Rollout
27How We Stopped Terraform Drift from Surprising On-Call
26Troubleshooting: Python Worker Queue Scaling Patterns
25Systemd Tricks We Use to Keep Services Boring
24A Pragmatic Multi-Region Strategy for Small Teams
23What We Learned Running Weekly Game Days on Our CI/CD Pipeline
22Real-World RAG Incidents: Lessons from a Production Rollout
21Troubleshooting: Model Serving Observability Stack
21Platform Engineering and Internal Developer Platforms in 2025
20How We Stopped Terraform Drift from Surprising On-Call
19Systemd Tricks We Use to Keep Services Boring
18A Pragmatic Multi-Region Strategy for Small Teams
17Troubleshooting: RAG Retrieval Quality Evaluation
16What We Learned Running Weekly Game Days on Our CI/CD Pipeline
15Real-World RAG Incidents: Lessons from a Production Rollout
14How We Stopped Terraform Drift from Surprising On-Call
13Troubleshooting: Prompt Versioning and Regression Testing
12Systemd Tricks We Use to Keep Services Boring
11A Pragmatic Multi-Region Strategy for Small Teams
10What We Learned Running Weekly Game Days on Our CI/CD Pipeline
9Troubleshooting: LLM Gateway Design for Multi-Provider Inference
8Real-World RAG Incidents: Lessons from a Production Rollout
7How We Stopped Terraform Drift from Surprising On-Call
6Troubleshooting: Kernel and Package Patch Management
5Systemd Tricks We Use to Keep Services Boring
4A Pragmatic Multi-Region Strategy for Small Teams
3What We Learned Running Weekly Game Days on Our CI/CD Pipeline
2Troubleshooting: Systemd Service Reliability Patterns
1Real-World RAG Incidents: Lessons from a Production Rollout

April

30How We Stopped Terraform Drift from Surprising On-Call
29Troubleshooting: SLO-Based Monitoring for APIs
28Systemd Tricks We Use to Keep Services Boring
27A Pragmatic Multi-Region Strategy for Small Teams
26What We Learned Running Weekly Game Days on Our CI/CD Pipeline
25Troubleshooting: Secure Container Supply Chain Controls
24Real-World RAG Incidents: Lessons from a Production Rollout
23How We Stopped Terraform Drift from Surprising On-Call
22Systemd Tricks We Use to Keep Services Boring
21Troubleshooting: Infrastructure Documentation as Code
20A Pragmatic Multi-Region Strategy for Small Teams
19What We Learned Running Weekly Game Days on Our CI/CD Pipeline
18Real-World RAG Incidents: Lessons from a Production Rollout
17Troubleshooting: Cloud Networking Segmentation Patterns
16How We Stopped Terraform Drift from Surprising On-Call
15Systemd Tricks We Use to Keep Services Boring
14Troubleshooting: Incident Response for Platform Teams
13A Pragmatic Multi-Region Strategy for Small Teams
12What We Learned Running Weekly Game Days on Our CI/CD Pipeline
11Real-World RAG Incidents: Lessons from a Production Rollout
10Troubleshooting: Blue-Green Deployment Guardrails
10Kubernetes Cost Optimization: Rightsizing, Spot, and FinOps
9How We Stopped Terraform Drift from Surprising On-Call
8Systemd Tricks We Use to Keep Services Boring
7A Pragmatic Multi-Region Strategy for Small Teams
6Troubleshooting: Infrastructure Drift Detection Workflow
5What We Learned Running Weekly Game Days on Our CI/CD Pipeline
4Real-World RAG Incidents: Lessons from a Production Rollout
3How We Stopped Terraform Drift from Surprising On-Call
2Troubleshooting: Multi-Cluster Traffic Routing Strategies
1Systemd Tricks We Use to Keep Services Boring

May

31Real-World RAG Incidents: Lessons from a Production Rollout
30Best Practices: Cloud Disaster Recovery Runbook Design
29How We Stopped Terraform Drift from Surprising On-Call
28Systemd Tricks We Use to Keep Services Boring
27A Pragmatic Multi-Region Strategy for Small Teams
26Best Practices: AWS Cost Control with Tagging and Budgets
25What We Learned Running Weekly Game Days on Our CI/CD Pipeline
24Real-World RAG Incidents: Lessons from a Production Rollout
23Best Practices: Ansible Role Design for Large Teams
22How We Stopped Terraform Drift from Surprising On-Call
21Observability with OpenTelemetry: Traces, Metrics, and Logs
20Systemd Tricks We Use to Keep Services Boring
19Best Practices: Terraform State Isolation by Environment
18A Pragmatic Multi-Region Strategy for Small Teams
17What We Learned Running Weekly Game Days on Our CI/CD Pipeline
16Real-World RAG Incidents: Lessons from a Production Rollout
15Best Practices: GitHub Actions Pipeline Reliability
14How We Stopped Terraform Drift from Surprising On-Call
13Systemd Tricks We Use to Keep Services Boring
12A Pragmatic Multi-Region Strategy for Small Teams
11Best Practices: Docker Image Hardening for Production
10What We Learned Running Weekly Game Days on Our CI/CD Pipeline
9Real-World RAG Incidents: Lessons from a Production Rollout
8How We Stopped Terraform Drift from Surprising On-Call
7Best Practices: Kubernetes Cluster Upgrade Strategy
6Systemd Tricks We Use to Keep Services Boring
5A Pragmatic Multi-Region Strategy for Small Teams
4Troubleshooting: AI Inference Cost Optimization
3What We Learned Running Weekly Game Days on Our CI/CD Pipeline
2Real-World RAG Incidents: Lessons from a Production Rollout
1GitOps with Argo CD: Best Practices for 2025

June

30A Pragmatic Multi-Region Strategy for Small Teams
29What We Learned Running Weekly Game Days on Our CI/CD Pipeline
28Real-World RAG Incidents: Lessons from a Production Rollout
27Best Practices: Model Serving Observability Stack
26How We Stopped Terraform Drift from Surprising On-Call
25Systemd Tricks We Use to Keep Services Boring
24A Pragmatic Multi-Region Strategy for Small Teams
23Best Practices: RAG Retrieval Quality Evaluation
22What We Learned Running Weekly Game Days on Our CI/CD Pipeline
21Real-World RAG Incidents: Lessons from a Production Rollout
20How We Stopped Terraform Drift from Surprising On-Call
19Best Practices: Prompt Versioning and Regression Testing
18Systemd Tricks We Use to Keep Services Boring
17A Pragmatic Multi-Region Strategy for Small Teams
16What We Learned Running Weekly Game Days on Our CI/CD Pipeline
15Best Practices: LLM Gateway Design for Multi-Provider Inference
14Real-World RAG Incidents: Lessons from a Production Rollout
13How We Stopped Terraform Drift from Surprising On-Call
12Best Practices: Kernel and Package Patch Management
11Docker Security Best Practices: Images, Runtime, and Supply Chain
10Systemd Tricks We Use to Keep Services Boring
9A Pragmatic Multi-Region Strategy for Small Teams
8Best Practices: Systemd Service Reliability Patterns
7What We Learned Running Weekly Game Days on Our CI/CD Pipeline
6Real-World RAG Incidents: Lessons from a Production Rollout
5How We Stopped Terraform Drift from Surprising On-Call
4Systemd Tricks We Use to Keep Services Boring
3Best Practices: Linux Performance Baseline Methodology
2A Pragmatic Multi-Region Strategy for Small Teams
1What We Learned Running Weekly Game Days on Our CI/CD Pipeline

July

31How We Stopped Terraform Drift from Surprising On-Call
30Systemd Tricks We Use to Keep Services Boring
29A Pragmatic Multi-Region Strategy for Small Teams
28Best Practices: Infrastructure Documentation as Code
27What We Learned Running Weekly Game Days on Our CI/CD Pipeline
26Real-World RAG Incidents: Lessons from a Production Rollout
25How We Stopped Terraform Drift from Surprising On-Call
24Best Practices: Cloud Networking Segmentation Patterns
23Systemd Tricks We Use to Keep Services Boring
22Linux Performance Tuning for Containers and Kubernetes Nodes
21Best Practices: Incident Response for Platform Teams
20A Pragmatic Multi-Region Strategy for Small Teams
19What We Learned Running Weekly Game Days on Our CI/CD Pipeline
18Real-World RAG Incidents: Lessons from a Production Rollout
17Best Practices: Blue-Green Deployment Guardrails
16How We Stopped Terraform Drift from Surprising On-Call
15Systemd Tricks We Use to Keep Services Boring
14A Pragmatic Multi-Region Strategy for Small Teams
13What We Learned Running Weekly Game Days on Our CI/CD Pipeline
12Best Practices: Infrastructure Drift Detection Workflow
11Real-World RAG Incidents: Lessons from a Production Rollout
10How We Stopped Terraform Drift from Surprising On-Call
9Systemd Tricks We Use to Keep Services Boring
8Best Practices: Multi-Cluster Traffic Routing Strategies
7A Pragmatic Multi-Region Strategy for Small Teams
6What We Learned Running Weekly Game Days on Our CI/CD Pipeline
5Real-World RAG Incidents: Lessons from a Production Rollout
4Best Practices: Kubernetes Secrets and External Vault Integration
3How We Stopped Terraform Drift from Surprising On-Call
2Systemd Tricks We Use to Keep Services Boring
1Terraform Cloud Cost Controls: Budgets, Policies, and Tagging
1Best Practices: Python Worker Queue Scaling Patterns

August

31Multi-Agent AI Systems: Building Collaborative AI Applications
30Systemd Tricks We Use to Keep Services Boring
29Architecture Review: Ansible Role Design for Large Teams
28A Pragmatic Multi-Region Strategy for Small Teams
27Prompt Engineering Best Practices: Maximizing LLM Performance
26What We Learned Running Weekly Game Days on Our CI/CD Pipeline
25Architecture Review: Terraform State Isolation by Environment
24Real-World RAG Incidents: Lessons from a Production Rollout
23AI Model Deployment Strategies: From Development to Production
22How We Stopped Terraform Drift from Surprising On-Call
21Systemd Tricks We Use to Keep Services Boring
20Model Quantization Techniques: Reducing LLM Size and Cost
20Architecture Review: GitHub Actions Pipeline Reliability
19A Pragmatic Multi-Region Strategy for Small Teams
18What We Learned Running Weekly Game Days on Our CI/CD Pipeline
17Real-World RAG Incidents: Lessons from a Production Rollout
16Architecture Review: Docker Image Hardening for Production
16Vector Databases for AI: Comparing Pinecone, Weaviate, and ChromaDB
15How We Stopped Terraform Drift from Surprising On-Call
14Systemd Tricks We Use to Keep Services Boring
13Building RAG Applications: A Complete Guide to Retrieval Augmented Generation
12Architecture Review: Kubernetes Cluster Upgrade Strategy
12RAG in Production: Reliability, Latency, and Cost for LLM Apps
11A Pragmatic Multi-Region Strategy for Small Teams
10What We Learned Running Weekly Game Days on Our CI/CD Pipeline
9Best Practices: AI Inference Cost Optimization
8Real-World RAG Incidents: Lessons from a Production Rollout
7How We Stopped Terraform Drift from Surprising On-Call
6Systemd Tricks We Use to Keep Services Boring
5Best Practices: SLO-Based Monitoring for APIs
4A Pragmatic Multi-Region Strategy for Small Teams
3What We Learned Running Weekly Game Days on Our CI/CD Pipeline
2Real-World RAG Incidents: Lessons from a Production Rollout
1Best Practices: Secure Container Supply Chain Controls

September

30What We Learned Running Weekly Game Days on Our CI/CD Pipeline
29Architecture Review: RAG Retrieval Quality Evaluation
28GitOps with ArgoCD: Automating Kubernetes Deployments
27Real-World RAG Incidents: Lessons from a Production Rollout
26How We Stopped Terraform Drift from Surprising On-Call
25Kubernetes Networking Deep Dive: Understanding Pods, Services, and Ingress
24Architecture Review: Prompt Versioning and Regression Testing
23Systemd Tricks We Use to Keep Services Boring
22AWS Lambda and Serverless Best Practices for Production
21Production AI Pipelines: Building End-to-End ML Systems
20Architecture Review: LLM Gateway Design for Multi-Provider Inference
19A Pragmatic Multi-Region Strategy for Small Teams
18AI Security and Safety: Protecting Your AI Applications
17Architecture Review: Kernel and Package Patch Management
16What We Learned Running Weekly Game Days on Our CI/CD Pipeline
15Real-World RAG Incidents: Lessons from a Production Rollout
14Embedding Models Comparison: Choosing the Right Model for Your Use Case
13Architecture Review: Systemd Service Reliability Patterns
12How We Stopped Terraform Drift from Surprising On-Call
11Systemd Tricks We Use to Keep Services Boring
10AI Cost Optimization: Reducing LLM Inference Costs by 80%
9Architecture Review: Linux Performance Baseline Methodology
8A Pragmatic Multi-Region Strategy for Small Teams
7Fine-tuning vs Few-Shot Learning: When to Use Each Approach
6What We Learned Running Weekly Game Days on Our CI/CD Pipeline
5Architecture Review: Cloud Disaster Recovery Runbook Design
4Real-World RAG Incidents: Lessons from a Production Rollout
3AI Observability and Monitoring: Tracking Model Performance in Production
2How We Stopped Terraform Drift from Surprising On-Call
1Autonomous CI/CD Pipelines: Self-Healing and AI-Assisted Deployments
1Architecture Review: AWS Cost Control with Tagging and Budgets

October

31Canary Releases: Gradual Rollout Strategy
30How We Stopped Terraform Drift from Surprising On-Call
29Architecture Review: Cloud Networking Segmentation Patterns
28Systemd Tricks We Use to Keep Services Boring
27Blue-Green Deployments: Zero-Downtime Releases
26Architecture Review: Incident Response for Platform Teams
25A Pragmatic Multi-Region Strategy for Small Teams
24Log Aggregation Strategies: Centralizing Your Logs
23What We Learned Running Weekly Game Days on Our CI/CD Pipeline
22Architecture Review: Blue-Green Deployment Guardrails
21Real-World RAG Incidents: Lessons from a Production Rollout
20Infrastructure Monitoring with Prometheus: Complete Setup Guide
19How We Stopped Terraform Drift from Surprising On-Call
18Architecture Review: Infrastructure Drift Detection Workflow
17Systemd Tricks We Use to Keep Services Boring
16Docker Multi-Stage Builds: Optimizing Image Size
15A Pragmatic Multi-Region Strategy for Small Teams
14Architecture Review: Multi-Cluster Traffic Routing Strategies
13Kubernetes Backup Strategies: Protecting Your Cluster Data
12MLOps Pipelines: From Experiment to Production Models
11What We Learned Running Weekly Game Days on Our CI/CD Pipeline
10Architecture Review: Kubernetes Secrets and External Vault Integration
9Service Mesh Implementation: Istio vs Linkerd
8Real-World RAG Incidents: Lessons from a Production Rollout
7Architecture Review: Python Worker Queue Scaling Patterns
6CI/CD Pipeline Optimization: Speeding Up Your Builds
5How We Stopped Terraform Drift from Surprising On-Call
4Systemd Tricks We Use to Keep Services Boring
3Architecture Review: Model Serving Observability Stack
2Container Security Scanning: Protecting Your Docker Images
1A Pragmatic Multi-Region Strategy for Small Teams

November

30Operational Checklist: Terraform State Isolation by Environment
29Cloud Networking Fundamentals: VPCs, Subnets, and Routing
28What We Learned Running Weekly Game Days on Our CI/CD Pipeline
27Real-World RAG Incidents: Lessons from a Production Rollout
26Operational Checklist: GitHub Actions Pipeline Reliability
25AWS ECS vs EKS: Choosing the Right Container Platform
24How We Stopped Terraform Drift from Surprising On-Call
23Systemd Tricks We Use to Keep Services Boring
22Container Image Scanning in CI and at Runtime
22Operational Checklist: Docker Image Hardening for Production
21Cloud Security Best Practices: Securing Your AWS Infrastructure
20A Pragmatic Multi-Region Strategy for Small Teams
19What We Learned Running Weekly Game Days on Our CI/CD Pipeline
18Operational Checklist: Kubernetes Cluster Upgrade Strategy
18Serverless Architecture Patterns: Building Scalable Applications
17Real-World RAG Incidents: Lessons from a Production Rollout
16How We Stopped Terraform Drift from Surprising On-Call
15Architecture Review: AI Inference Cost Optimization
14Cloud Cost Monitoring: Tracking and Optimizing AWS Spending
13Systemd Tricks We Use to Keep Services Boring
12A Pragmatic Multi-Region Strategy for Small Teams
11Multi-Region Deployment: Building Resilient Cloud Applications
11Architecture Review: SLO-Based Monitoring for APIs
10What We Learned Running Weekly Game Days on Our CI/CD Pipeline
9Real-World RAG Incidents: Lessons from a Production Rollout
8How We Stopped Terraform Drift from Surprising On-Call
7AWS Lambda Optimization: Reducing Costs and Improving Performance
7Architecture Review: Secure Container Supply Chain Controls
6Systemd Tricks We Use to Keep Services Boring
5A Pragmatic Multi-Region Strategy for Small Teams
4What We Learned Running Weekly Game Days on Our CI/CD Pipeline
3DevOps Metrics and KPIs: Measuring Success
2Architecture Review: Infrastructure Documentation as Code
2Multi-Region Resilience: Failover, Data, and DNS
1Real-World RAG Incidents: Lessons from a Production Rollout

December

31File System Optimization: Improving Disk Performance
31Operational Checklist: Prompt Versioning and Regression Testing
30How We Stopped Terraform Drift from Surprising On-Call
29Systemd Tricks We Use to Keep Services Boring
28A Pragmatic Multi-Region Strategy for Small Teams
27Process Management and Monitoring in Linux
27Operational Checklist: LLM Gateway Design for Multi-Provider Inference
26What We Learned Running Weekly Game Days on Our CI/CD Pipeline
25Real-World RAG Incidents: Lessons from a Production Rollout
24Linux Security Hardening: Protecting Your System
24Operational Checklist: Kernel and Package Patch Management
23How We Stopped Terraform Drift from Surprising On-Call
22Systemd Tricks We Use to Keep Services Boring
21A Pragmatic Multi-Region Strategy for Small Teams
20Operational Checklist: Systemd Service Reliability Patterns
20Network Configuration and Troubleshooting in Linux
19What We Learned Running Weekly Game Days on Our CI/CD Pipeline
18Real-World RAG Incidents: Lessons from a Production Rollout
17Linux Performance Tuning: Optimizing System Performance
16Operational Checklist: Linux Performance Baseline Methodology
15How We Stopped Terraform Drift from Surprising On-Call
14Systemd Tricks We Use to Keep Services Boring
13Systemd Service Management: Creating and Managing Services
13Systemd and Modern Linux Service Management
12A Pragmatic Multi-Region Strategy for Small Teams
11Operational Checklist: Cloud Disaster Recovery Runbook Design
10What We Learned Running Weekly Game Days on Our CI/CD Pipeline
9Edge Computing with AWS: CloudFront and Lambda@Edge
8Real-World RAG Incidents: Lessons from a Production Rollout
7Operational Checklist: AWS Cost Control with Tagging and Budgets
6Cloud-Native Databases: Choosing the Right Database for Your Workload
5How We Stopped Terraform Drift from Surprising On-Call
4Operational Checklist: Ansible Role Design for Large Teams
3Systemd Tricks We Use to Keep Services Boring
2Disaster Recovery in the Cloud: Backup and Recovery Strategies
1A Pragmatic Multi-Region Strategy for Small Teams

2024 (continued)

January (continued)

9Practical Guide: Docker Image Hardening for Production
8Zero Trust Architecture in Multi-Cloud
5Practical Guide: Kubernetes Cluster Upgrade Strategy
5Terraform State Management Strategies
3Building Scalable CI/CD Pipelines with GitHub Actions
1Fine-tuning Llama 3 on Consumer Hardware

February

29Practical Guide: Python Worker Queue Scaling Patterns
25Practical Guide: Model Serving Observability Stack
21Practical Guide: RAG Retrieval Quality Evaluation
17Practical Guide: Prompt Versioning and Regression Testing
13Practical Guide: LLM Gateway Design for Multi-Provider Inference
12Fine-tuning Large Language Models: A Practical Guide
10Practical Guide: Kernel and Package Patch Management
10Infrastructure as Code: Terraform vs Pulumi vs Ansible
7Linux System Monitoring with Prometheus and Grafana
5Practical Guide: Systemd Service Reliability Patterns
5AWS Cost Optimization: 10 Strategies to Reduce Your Cloud Bill
3Building Production-Ready AI Applications with LangChain and Docker
1Practical Guide: Linux Performance Baseline Methodology
1Kubernetes Autoscaling: HPA vs VPA vs Cluster Autoscaler

March

31Practical Guide: Secure Container Supply Chain Controls
27Practical Guide: Infrastructure Documentation as Code
23Practical Guide: Cloud Networking Segmentation Patterns
20Practical Guide: Incident Response for Platform Teams
16Practical Guide: Blue-Green Deployment Guardrails
11Practical Guide: Infrastructure Drift Detection Workflow
7Practical Guide: Multi-Cluster Traffic Routing Strategies
3Practical Guide: Kubernetes Secrets and External Vault Integration

April

28Deep Dive: Ansible Role Design for Large Teams
24Deep Dive: Terraform State Isolation by Environment
19Deep Dive: GitHub Actions Pipeline Reliability
15Deep Dive: Docker Image Hardening for Production
11Deep Dive: Kubernetes Cluster Upgrade Strategy
8Practical Guide: AI Inference Cost Optimization
4Practical Guide: SLO-Based Monitoring for APIs

May

28Deep Dive: RAG Retrieval Quality Evaluation
24Deep Dive: Prompt Versioning and Regression Testing
20Deep Dive: LLM Gateway Design for Multi-Provider Inference
17Deep Dive: Kernel and Package Patch Management
13Deep Dive: Systemd Service Reliability Patterns
9Deep Dive: Linux Performance Baseline Methodology
5Deep Dive: Cloud Disaster Recovery Runbook Design
1Deep Dive: AWS Cost Control with Tagging and Budgets

June

28Deep Dive: Cloud Networking Segmentation Patterns
25Deep Dive: Incident Response for Platform Teams
21Deep Dive: Blue-Green Deployment Guardrails
17Deep Dive: Infrastructure Drift Detection Workflow
13Deep Dive: Multi-Cluster Traffic Routing Strategies
9Deep Dive: Kubernetes Secrets and External Vault Integration
6Deep Dive: Python Worker Queue Scaling Patterns
2Deep Dive: Model Serving Observability Stack

July

30Production Playbook: Terraform State Isolation by Environment
26Production Playbook: GitHub Actions Pipeline Reliability
22Production Playbook: Docker Image Hardening for Production
18Production Playbook: Kubernetes Cluster Upgrade Strategy
15Deep Dive: AI Inference Cost Optimization
11Deep Dive: SLO-Based Monitoring for APIs
7Deep Dive: Secure Container Supply Chain Controls
2Deep Dive: Infrastructure Documentation as Code

August

30Production Playbook: Prompt Versioning and Regression Testing
26Production Playbook: LLM Gateway Design for Multi-Provider Inference
23Production Playbook: Kernel and Package Patch Management
19Production Playbook: Systemd Service Reliability Patterns
15Production Playbook: Linux Performance Baseline Methodology
10Production Playbook: Cloud Disaster Recovery Runbook Design
6Production Playbook: AWS Cost Control with Tagging and Budgets
3Production Playbook: Ansible Role Design for Large Teams

September

27Production Playbook: Blue-Green Deployment Guardrails
23Production Playbook: Infrastructure Drift Detection Workflow
18Production Playbook: Multi-Cluster Traffic Routing Strategies
14Production Playbook: Kubernetes Secrets and External Vault Integration
11Production Playbook: Python Worker Queue Scaling Patterns
7Production Playbook: Model Serving Observability Stack
3Production Playbook: RAG Retrieval Quality Evaluation

October

28Field Notes: Docker Image Hardening for Production
23Field Notes: Kubernetes Cluster Upgrade Strategy
20Production Playbook: AI Inference Cost Optimization
16Production Playbook: SLO-Based Monitoring for APIs
12Production Playbook: Secure Container Supply Chain Controls
8Production Playbook: Infrastructure Documentation as Code
4Production Playbook: Cloud Networking Segmentation Patterns
1Production Playbook: Incident Response for Platform Teams

November

28Field Notes: Kernel and Package Patch Management
24Field Notes: Systemd Service Reliability Patterns
20Field Notes: Linux Performance Baseline Methodology
16Field Notes: Cloud Disaster Recovery Runbook Design
12Field Notes: AWS Cost Control with Tagging and Budgets
9Field Notes: Ansible Role Design for Large Teams
5Field Notes: Terraform State Isolation by Environment
1Field Notes: GitHub Actions Pipeline Reliability

December

29Field Notes: Infrastructure Drift Detection Workflow
25Field Notes: Multi-Cluster Traffic Routing Strategies
21Field Notes: Kubernetes Secrets and External Vault Integration
18Field Notes: Python Worker Queue Scaling Patterns
14Field Notes: Model Serving Observability Stack
10Field Notes: RAG Retrieval Quality Evaluation
6Field Notes: Prompt Versioning and Regression Testing
1Field Notes: LLM Gateway Design for Multi-Provider Inference