Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

Pre-Commit Hooks That Saved Our Repo: 7 Real Examples

Every hook on this list caught a bug or a security issue in the last twelve months. The configs are short. The savings have been considerable.

Kiril Urbonas

last week

EKS Auto Mode: What Worked, What Broke in Our Migration

We moved a 60-node production EKS cluster to Auto Mode. Some pain points evaporated, others got harder. The cost picture is more nuanced than the marketing suggests.

Kiril Urbonas

last week

Self-Hosted LLMs vs OpenAI API: A Cost-vs-Latency Analysis After 6 Months

We ran the same workload on both for half a year. The break-even point isn't where most blog posts say it is — and the latency story has more nuance than throughput-per-dollar charts admit.

Kiril Urbonas

last week

OpenTelemetry Collector Pipelines: Real Configs That Survived Production

We've been running the OTel Collector at the edge of every cluster for 18 months. The config patterns that lasted, the ones we ripped out, and a few processors that quietly saved us money.

Kiril Urbonas

last week

Blue/Green Deploys for Stateful Services: A Postgres Cutover Story

Blue/green is easy for stateless services. We did it for our primary Postgres cluster with 3.2 TB of data and ~8k connections. Here's exactly how, and what almost went wrong.

Kiril Urbonas

last week

systemd Timers vs Cron: When We Switched and What We Learned

We migrated 47 cron jobs to systemd timers across our fleet. The mechanical conversion was easy. The interesting parts were the bugs we found that cron had been hiding.

Kiril Urbonas

last week

Zero Trust on AWS: Lessons From Implementing IAM Identity Center

We replaced 14 long-lived IAM users with SSO + temporary credentials. The migration plan, the gotchas, and the policies we now enforce.

Kiril Urbonas

last week

Embedding Quality in RAG: How We Cut Hallucinations by 60%

Six months running RAG in production taught us that the retrieval step matters far more than the model. Concrete techniques that moved the needle, with before/after numbers.

Kiril Urbonas

2 weeks ago

Database Migrations Without Downtime: Patterns From Three Real Cutovers

How we shipped three schema migrations with zero customer impact. Expand-then-contract, dual-writes, and the rollback plan we never had to use — but tested anyway.

Kiril Urbonas
