Available for senior / staff SRE roles

Hey, I'm Sheshank
Staff SRE.

I build and operate reliable infrastructure at scale — multi-cloud (AWS & GCP), Kubernetes, and deep observability stacks. I write about what I learn.

8+ years in SRE
Staff IC level
AWS + GCP multi-cloud
99.99% target availability
sheshank@k8s-prod-1 ~
$ kubectl get nodes -o wide
NAME                    STATUS   ROLES    AGE   VERSION
prod-node-1.us-east-1   Ready    worker   42d   v1.29.2
prod-node-2.us-west-2   Ready    worker   42d   v1.29.2
prod-node-3.eu-west-1   Ready    worker   42d   v1.29.2
$ cat about.yaml  # the important stuff
role: Staff SRE
focus: reliability, observability, automation
currently: building at Baseten

01. Skills & Tools

☸️ Kubernetes · ☁️ AWS · 🌐 GCP · 📊 Grafana · 📈 Victoria Metrics · 📋 Loki · 🔗 Alloy · 🔥 Prometheus · 🐶 Datadog · 👁️ New Relic · 🚨 PagerDuty · incident.io · 🐧 Linux · 🐳 Docker · 🏗️ Terraform · 🔄 ArgoCD

02. Latest Writing

Apr 15, 2025 · sre · on-call

10 Hard-Won Lessons from 5 Years On-Call

Being on-call teaches you things no classroom or certification ever could. Here are the lessons I keep coming back to after incidents that ranged from embarrassing typos to full multi-region outages.

8 min read

03. Projects

📡 SLO Dashboard Generator

Generates Grafana dashboards from a simple YAML SLO definition. Supports burn-rate alerts and error budget visualization.

Go · Grafonnet · Kubernetes · Helm
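
For readers new to burn-rate alerting, here is a minimal sketch of the arithmetic that alerts like these are usually built on. It is not the generator's actual code: the function names, the 30-day SLO period, and the sample numbers are all assumptions made for illustration.

```python
# Hypothetical helpers illustrating error budget and burn rate.
# None of these names come from the SLO Dashboard Generator itself.

def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    return error_ratio / error_budget(slo_target)


def budget_consumed(error_ratio: float, slo_target: float,
                    window_hours: float, period_hours: float = 30 * 24) -> float:
    """Share of the whole period's budget used up during one window.

    period_hours defaults to an assumed 30-day SLO period.
    """
    return burn_rate(error_ratio, slo_target) * window_hours / period_hours


if __name__ == "__main__":
    # Assumed sample: 1.44% of requests failing against a 99.9% SLO.
    rate = burn_rate(0.0144, 0.999)
    used = budget_consumed(0.0144, 0.999, window_hours=1)
    print(f"burn rate: {rate:.1f}x, budget used in 1h: {used:.1%}")
```

A burn rate of 1 means the budget runs out exactly at the end of the period, so a sustained rate well above 1 is what a fast-burn alert would typically page on.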
🔍 k8s-runbook-operator

A Kubernetes operator that attaches runbooks to PodDisruptionBudgets and automatically links them in PagerDuty alerts.

Go · controller-runtime · PagerDuty API
📊 victoria-metrics-migrator

CLI tool for migrating Prometheus recording and alerting rules to Victoria Metrics, with cardinality analysis.

Python · PromQL · MetricsQL