Cloud platforms, architecture, and optimization.
There are two hard problems in computer science." We've worked on the cache-invalidation one for a while. The patterns that hold up at scale and the ones that look clean and aren't.
We use Step Functions for batch processing, document ingestion, and a few agentic workflows. The patterns that work, the limits we hit, and where we'd reach for something else.
After two years of running Karpenter on production EKS clusters, the NodePool patterns that survived, the ones we replaced, and the tuning that matters.
A working mental model for AWS VPCs — what each piece does, how they connect, and why "VPC" is the wrong mental model if you came from physical networks.
Create your first S3 bucket, upload and download files, and set up the right access controls — without accidentally making everything public.
Write, package, and deploy a Lambda function using only the AWS CLI. Trigger it via a public URL. Understand what serverless actually means.
We deployed the same edge function on both platforms and measured for a quarter. Where each wins, where each loses, and the surprises along the way.
We deleted every static GCP service account key in our org over six weeks. Here's the migration plan, the gotchas, and the policies we now enforce.
We moved a 60-node production EKS cluster to Auto Mode. Some pain points evaporated, others got harder. The cost picture is more nuanced than the marketing suggests.
We replaced 14 long-lived IAM users with SSO + temporary credentials. The migration plan, the gotchas, and the policies we now enforce.
We had .env files in three repos, AWS keys in Slack DMs, and a postgres password etched into a Confluence page. Cleaning it up took a sprint and changed how we think about secrets.
A real cost audit uncovered idle load balancers, oversized RDS instances, and forgotten snapshots. Here's what we found and how we fixed each one.