writing

Thoughts on reliability & infrastructure

I write about SRE practices, Kubernetes, observability, incident management, and building systems that don't wake you up at 3 AM.

alloy · aws · events · gitops · grafana · incident-response · kubernetes · loki · mimir · observability · on-call · prometheus · sre · victoria-metrics
Apr 15, 2025 · sre · on-call

10 Hard-Won Lessons from 5 Years On-Call

Being on-call teaches you things no classroom or certification ever could. Here are the lessons I keep coming back to after incidents that ranged from embarrassing typos to full multi-region outages.

8 min read
# how to publish a new post
$ cd content/blog
$ touch my-new-post.mdx # add frontmatter + markdown
$ git add . && git commit -m "feat: add new post"
$ git push # Vercel auto-deploys in ~30s