Open source tools & projects
Tools I've built to solve real SRE problems. Most of these came from something that burned me on-call or slowed down a migration.
SLO Dashboard Generator
Generates production-ready Grafana dashboards from a simple YAML SLO definition. Supports multi-window burn-rate alerts, error budget visualization, and Grafonnet templating for consistent dashboards across hundreds of services.
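For readers new to multi-window burn-rate alerting, the core check can be sketched in a few lines. This is a minimal illustration, not the generator's actual code: the function names are hypothetical, and the 14.4 factor is the commonly cited value for a 1h/5m window pair on a 30-day SLO, assumed here rather than taken from the tool's defaults.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are accruing.

    A burn rate of 1.0 spends the whole error budget exactly at the end of
    the SLO period; higher values exhaust it sooner.
    """
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_alert(long_ratio: float, short_ratio: float,
                 slo_target: float, factor: float) -> bool:
    """Fire only if BOTH the long and the short window exceed the factor.

    The long window shows the burn is significant; the short window shows
    it is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(long_ratio, slo_target) >= factor
            and burn_rate(short_ratio, slo_target) >= factor)


if __name__ == "__main__":
    # Hypothetical measurements: 2% errors over both 1h and 5m, 99.9% SLO.
    print(should_alert(0.02, 0.02, slo_target=0.999, factor=14.4))
    # Same long window, but the last 5 minutes have recovered.
    print(should_alert(0.02, 0.0005, slo_target=0.999, factor=14.4))
```

In a real setup the two error ratios would come from recording rules evaluated over each window, and the comparison would live in the alerting rule itself rather than in Python.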
k8s-runbook-operator
A Kubernetes operator that attaches structured runbooks to PodDisruptionBudgets, Deployments, and Services. Automatically surfaces the relevant runbook URL in PagerDuty alerts and Slack incident channels via annotations.
victoria-metrics-migrator
CLI tool for migrating Prometheus recording rules, alerting rules, and dashboards to VictoriaMetrics. Includes a cardinality analyzer to identify high-cardinality metrics before migration.
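The idea behind a cardinality check can be sketched as counting distinct label sets per metric name. This is a simplified illustration under stated assumptions, not the tool's implementation: the input is assumed to be a list of label dictionaries that include `__name__`, and the function names and threshold are hypothetical.

```python
from collections import defaultdict


def series_per_metric(series):
    """Count distinct series (unique label sets) for each metric name."""
    seen = defaultdict(set)
    for labels in series:
        name = labels.get("__name__", "")
        # A frozenset of label pairs identifies one unique series.
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def high_cardinality(series, threshold):
    """Return (metric, series_count) pairs above the threshold, largest first."""
    counts = series_per_metric(series)
    flagged = [(name, n) for name, n in counts.items() if n > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample: a per-path label inflates one metric.
    sample = [
        {"__name__": "http_requests_total", "path": "/a"},
        {"__name__": "http_requests_total", "path": "/b"},
        {"__name__": "http_requests_total", "path": "/c"},
        {"__name__": "up", "job": "api"},
    ]
    print(high_cardinality(sample, threshold=2))
```

A real analyzer would read series from the TSDB or its HTTP API instead of an in-memory list, but the counting step has the same shape.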
alloy-config-validator
Validates Grafana Alloy (formerly Agent Flow) pipeline configurations in CI. Checks for syntax errors, validates target labels, and warns about high-cardinality label combinations before they hit production.
incident-cost-estimator
Estimates the real cost of incidents based on MTTD/MTTR, affected revenue streams, and team time. Integrates with incident.io webhooks to auto-calculate cost per incident and trend over time.
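One way to think about the estimate is as lost revenue plus responder time. The sketch below is an assumed formula for illustration only, not the estimator's actual model: every parameter name is hypothetical, time-to-detect is taken to run from incident start to detection, and time-to-resolve from detection to resolution.

```python
def incident_cost(detect_minutes: float, resolve_minutes: float,
                  revenue_per_hour: float, impact_fraction: float,
                  responders: int, hourly_rate: float) -> float:
    """Estimate incident cost as lost revenue plus responder time.

    Assumptions (illustrative only):
    - revenue is affected for the full detect + resolve duration
    - responders are engaged only after detection
    """
    impact_hours = (detect_minutes + resolve_minutes) / 60
    lost_revenue = revenue_per_hour * impact_fraction * impact_hours
    team_cost = responders * hourly_rate * (resolve_minutes / 60)
    return lost_revenue + team_cost


if __name__ == "__main__":
    # Hypothetical incident: 15 min to detect, 45 min to resolve,
    # half of a 10,000/hour revenue stream affected, 4 responders at 100/hour.
    print(incident_cost(15, 45, 10_000, 0.5, 4, 100))
```

With these inputs the lost revenue is 5,000 and the responder time is 300, for a total of 5,300. Even this rough split makes it visible how much of the cost comes from slow detection rather than slow repair.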
terraform-sre-modules
Opinionated Terraform modules for SRE infrastructure: multi-region ALB with health checks, RDS with automated failover, EKS node groups with proper taints, and CloudWatch alarm baselines.
All projects are open source. Contributions, issues, and ideas welcome.
View all on GitHub