devops/ness

Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

Troubleshooting: Model Serving Observability Stack
March 21, 2025

Practical guidance on troubleshooting a model serving observability stack for reliable, scalable platform operations.

Kiril Urbonas

Platform Engineering and Internal Developer Platforms in 2025
March 21, 2025

Why IDPs are core to modern DevOps: self-service, standardized CI/CD, and a better developer experience.

Kiril Urbonas

How We Stopped Terraform Drift from Surprising On-Call
March 20, 2025

A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.

Kiril Urbonas

Systemd Tricks We Use to Keep Services Boring
March 19, 2025

Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.

Kiril Urbonas

A Pragmatic Multi-Region Strategy for Small Teams
March 18, 2025

How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.

Kiril Urbonas

Troubleshooting: RAG Retrieval Quality Evaluation
March 17, 2025

Practical guidance on troubleshooting RAG retrieval quality evaluation for reliable, scalable platform operations.

Kiril Urbonas

What We Learned Running Weekly Game Days on Our CI/CD Pipeline
March 16, 2025

Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.

Kiril Urbonas

Real-World RAG Incidents: Lessons from a Production Rollout
March 15, 2025

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas

How We Stopped Terraform Drift from Surprising On-Call
March 14, 2025

A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.

Kiril Urbonas

Troubleshooting: Prompt Versioning and Regression Testing
March 13, 2025

Practical guidance on troubleshooting prompt versioning and regression testing for reliable, scalable platform operations.

Kiril Urbonas

Systemd Tricks We Use to Keep Services Boring
March 12, 2025

Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.

Kiril Urbonas

A Pragmatic Multi-Region Strategy for Small Teams
March 11, 2025

How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.

Kiril Urbonas

Page 32 of 44 · 519 posts