
Featured Article

MLOps — Model Registry vs MLflow Tracking, And When You Need Both

Tracking experiments and shipping models are different problems. Most MLOps tooling assumes one solution covers both; production splits them. These are the patterns we use.

AI, MLOps, MLflow

DevOpsNess

Practical AI, DevOps, Cloud, and Linux guidance for engineering teams

Weekly deep dives, implementation patterns, and reliability-focused playbooks.



Kiril Urbonas, DevOps Engineer

Jun 7, 2026


Latest Articles

HashiCorp Vault as a Secrets Backend for Kubernetes

last week

Vault + Kubernetes auth + Vault Agent Injector. The setup, the failure modes during pod startup, and the patterns that beat raw Kubernetes Secrets.

Kiril Urbonas·7 min read·4

pg_stat_statements — Postgres Query Analysis Without Guessing

last week

The single most useful Postgres extension you might not be using. The queries it surfaces, the indexes it implies, and the operational discipline of reading it weekly.

Kiril Urbonas·7 min read·5

Linux io_uring — Async I/O Patterns We Use

last week

io_uring replaces epoll for new high-throughput services. The patterns that earn their place, the gotchas in older kernels, and where we'd still pick epoll.

Kiril Urbonas·7 min read·5

Caching Patterns — Read-Through, Write-Through, Cache-Aside in Practice

2 weeks ago

Three caching patterns, three failure modes. The one we use most, the one that bit us, and the rule that decides which pattern fits which workload.

Kiril Urbonas·7 min read·5

Kafka Partition Strategies — Scaling Consumers Without Reshuffling Everything

2 weeks ago

Picking partition counts and keys decides whether your Kafka consumers scale linearly or hit a wall. The patterns that survived rebalances, partition-count changes, and consumer-group ops.

Kiril Urbonas·7 min read·2

Agentic Ops — When (and When Not) to Use AI Agents for Incident Response

2 weeks ago

AI agents for incident triage sound great in demos. We've tried it in production. The patterns that earn their keep, the ones that backfire, and where humans still beat agents.

Kiril Urbonas·7 min read·4

Pipeline Observability — Why CI Failures Don't Trigger Alerts (And Should)

2 weeks ago

Production monitoring catches user-facing issues. CI failures stay invisible until someone notices the merge queue is stuck. The metrics and alerts that make pipelines observable.

Kiril Urbonas·8 min read·3

Terraform Module Versioning and Shared Registries

2 weeks ago

Version-pinned modules across many repos. The release process, semver discipline, and the breaking-change communication that keeps a shared registry sane.

Kiril Urbonas·7 min read·3

LLM Evals That Actually Predict Production Quality

2 weeks ago

Most LLM eval suites correlate poorly with what real users experience. The eval patterns we run that move with prod metrics — and the ones that lied to us.

Kiril Urbonas·7 min read·4

Burn-Rate Alerting — The SLO Discipline That Prevents Alert Fatigue

2 weeks ago

Static thresholds on error rate produce noisy alerts. Burn-rate alerting flips the question to "are we burning the error budget faster than we can sustain?" — and pages only on real problems.

Kiril Urbonas·7 min read·8

Container Resource Limits — What They Actually Do at the Kernel Level

3 weeks ago

cpu.shares vs cpu.cfs_quota_us vs memory.max — the cgroup mechanics behind Kubernetes resource limits, and the surprises that explain the weird symptoms you've seen.

Kiril Urbonas·7 min read·6

Kubernetes Resource Requests — Right-Sizing Without Guessing

3 weeks ago

Bad resource requests waste money or trigger OOMs. The methodology we use to right-size requests based on actual usage, and the gotchas the autoscalers don't fix.

Kiril Urbonas·8 min read·2


© 2026 DevOpsNess. By Kiril Urbonas.
