Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

2 weeks ago

Terraform Modules Done Right: Lessons from Managing 50+ Services

Practical patterns for Terraform modules at scale: versioning, composition, testing, and avoiding the monolith trap.

Kiril Urbonas

2 weeks ago

Linux Performance Troubleshooting: A Real Incident Walkthrough

Step-by-step debugging of a production Linux server hitting 100% CPU. From top to perf to the actual fix.

Kiril Urbonas

2 weeks ago

Prompt Engineering Patterns That Actually Work in Production

Battle-tested prompt patterns from running LLM features in production: structured output, chain-of-thought, and graceful failure handling.

Kiril Urbonas

3 weeks ago

AWS Cost Audit: 7 Things We Found Wasting Money Every Month

A real cost audit uncovered idle load balancers, oversized RDS instances, and forgotten snapshots. Here's what we found and how we fixed each one.

Kiril Urbonas

3 weeks ago

How We Cut Our Docker Image Size by 80% and Why It Matters

A real walkthrough of shrinking bloated Docker images from 1.2GB to 240MB using multi-stage builds, Alpine, and dependency auditing.

Kiril Urbonas

3 weeks ago

Model Fallback Policies for Customer-Facing AI: The Routing Rules That Kept SLA Intact

A real-world model fallback guide for customer-facing AI systems, covering how one team preserved response quality and support SLAs during a partial provider degradation.

Kiril Urbonas

3 weeks ago

Artifact Promotion Instead of Rebuilds: The Release Control Pattern That Stopped Drift

A practical artifact promotion guide for CI/CD teams that were tired of hearing 'it passed in staging' after production behaved differently because the release was rebuilt.

Kiril Urbonas

3 weeks ago

RDS Restore Drills for Busy Teams: The Recovery Workflow That Surfaced Real Gaps

A hands-on RDS restore drill guide for small cloud teams that thought backups were covered until a timed restore test exposed missing steps, DNS confusion, and stale credentials.

Kiril Urbonas

3 weeks ago

Systemd Drop-In Overrides for Vendor Services: The Supportable Linux Ops Pattern

A practical systemd drop-in guide built from a real operations problem: vendor unit files kept changing, but the team still needed consistent restart, environment, and logging behavior.

Kiril Urbonas

3 weeks ago

Terraform Module Version Pinning: How One Platform Team Stopped Surprise Breakage

A real-world Terraform module version pinning guide for platform teams that want safer upgrades, clearer ownership, and fewer broken pipelines after shared module releases.

Kiril Urbonas

0 months ago

Embedding Model Upgrades Without Search Chaos: A Safer RAG Rollout Pattern

A practical embedding model upgrade guide for RAG systems, built from a real support-search migration that initially reduced answer quality instead of improving it.

Kiril Urbonas

0 months ago

Multi-Cluster Traffic Routing Strategies: A Pragmatic Rollout Pattern for Growing SaaS Teams

A real-world multi-cluster traffic routing guide for SaaS teams that have outgrown a single Kubernetes cluster and need safer rollout control without a service-mesh science project.

Kiril Urbonas

Page 2 of 24 · 279 posts