devops/ness

Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

What We Learned Running Weekly Game Days on Our CI/CD Pipeline
March 10, 2025

Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.

Kiril Urbonas

Troubleshooting: LLM Gateway Design for Multi-Provider Inference
March 9, 2025

Practical guidance on LLM gateway design for multi-provider inference, aimed at reliable, scalable platform operations.

Kiril Urbonas

Real-World RAG Incidents: Lessons from a Production Rollout
March 8, 2025

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas

How We Stopped Terraform Drift from Surprising On-Call
March 7, 2025

A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.

Kiril Urbonas

Troubleshooting: Kernel and Package Patch Management
March 6, 2025

Practical guidance on kernel and package patch management for reliable, scalable platform operations.

Kiril Urbonas

Systemd Tricks We Use to Keep Services Boring
March 5, 2025

Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.

Kiril Urbonas

A Pragmatic Multi-Region Strategy for Small Teams
March 4, 2025

How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.

Kiril Urbonas

What We Learned Running Weekly Game Days on Our CI/CD Pipeline
March 3, 2025

Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.

Kiril Urbonas

Troubleshooting: Systemd Service Reliability Patterns
March 2, 2025

Practical guidance on systemd service reliability patterns for stable, scalable platform operations.

Kiril Urbonas

Real-World RAG Incidents: Lessons from a Production Rollout
March 1, 2025

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas

AI Agents in DevOps: From Copilots to Autonomous Automation in 2025
February 28, 2025

How AI agents are moving from read-only copilots to autonomous automation with guardrails. Best practices for approval gates and rollback.

Kiril Urbonas

Troubleshooting: Linux Performance Baseline Methodology
February 26, 2025

Practical guidance on Linux performance baseline methodology for reliable, scalable platform operations.

Kiril Urbonas

Page 33 of 44 · 519 posts