</> projects

Open source tools & projects

Tools I've built to solve real SRE problems. Most of these came from something that burned me on-call or slowed down a migration.

📡 active

SLO Dashboard Generator

Generates production-ready Grafana dashboards from a simple YAML SLO definition. Supports multi-window burn-rate alerts, error budget visualization, and Grafonnet templating for consistent dashboards across hundreds of services.

Go · Grafonnet · Jsonnet · Kubernetes · Helm
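As a rough illustration of the multi-window burn-rate idea behind those alerts, here is a minimal Python sketch. It is not taken from the project: the names `BurnRateWindow`, `burn_rate`, and `should_alert` are hypothetical, and the 14.4 threshold with a 1h/5m window pair is the commonly cited fast-burn pairing for a 30-day SLO, assumed here only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateWindow:
    """One long/short window pair and its burn-rate threshold (hypothetical type)."""
    long_window: str
    short_window: str
    threshold: float


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are occurring."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_alert(long_ratio: float, short_ratio: float,
                 slo_target: float, window: BurnRateWindow) -> bool:
    """Fire only when BOTH windows exceed the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening, so the alert resets quickly after recovery.
    """
    return (burn_rate(long_ratio, slo_target) > window.threshold
            and burn_rate(short_ratio, slo_target) > window.threshold)


# Assumed example values: 99.9% SLO, fast-burn window pair.
fast_burn = BurnRateWindow(long_window="1h", short_window="5m", threshold=14.4)
still_burning = should_alert(0.02, 0.02, 0.999, fast_burn)
already_recovered = should_alert(0.02, 0.001, 0.999, fast_burn)
```

With these inputs, `still_burning` is true because both windows burn at roughly 20x the budget rate, while `already_recovered` is false because the short window has dropped back to about 1x.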
๐Ÿ”active

k8s-runbook-operator

A Kubernetes operator that attaches structured runbooks to PodDisruptionBudgets, Deployments, and Services. Automatically surfaces the relevant runbook URL in PagerDuty alerts and Slack incident channels via annotations.

Go · controller-runtime · PagerDuty API · Kubernetes CRDs
📊 stable

victoria-metrics-migrator

CLI tool for migrating Prometheus recording rules, alerting rules, and dashboards to VictoriaMetrics. Includes a cardinality analyzer that identifies high-cardinality metrics before migration.

Python · PromQL · MetricsQL · Click
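To give a sense of what a cardinality check can look like, the sketch below counts distinct series per metric name and flags the ones above a limit. It is an assumption-laden illustration, not the tool's actual analyzer: `series_per_metric`, `high_cardinality`, and the input shape (one label mapping per series) are hypothetical. The only Prometheus-specific detail relied on is that the metric name is carried in the `__name__` label.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def series_per_metric(series: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count distinct label sets (i.e. distinct series) for each metric name."""
    seen = defaultdict(set)
    for labels in series:
        name = labels.get("__name__", "")
        # A frozenset of label pairs identifies a series regardless of label order.
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def high_cardinality(series: Iterable[Mapping[str, str]],
                     limit: int) -> List[Tuple[str, int]]:
    """Return (metric, series_count) pairs above `limit`, largest first."""
    counts = series_per_metric(series)
    flagged = [(name, count) for name, count in counts.items() if count > limit]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Made-up sample input; a real run would read series from the source TSDB.
sample = [
    {"__name__": "http_requests_total", "pod": "a"},
    {"__name__": "http_requests_total", "pod": "b"},
    {"__name__": "http_requests_total", "pod": "b"},  # duplicate series
    {"__name__": "up", "job": "node"},
]
flagged = high_cardinality(sample, limit=1)
```

Counting distinct label sets rather than raw samples keeps duplicates from inflating the result, which matters when the input comes from overlapping scrapes.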
🔥 active

alloy-config-validator

Validates Grafana Alloy (formerly Agent Flow) pipeline configurations in CI. Checks for syntax errors, validates target labels, and warns about high-cardinality label combinations before they hit production.

Go · Alloy · GitHub Actions · OPA
🚨 beta

incident-cost-estimator

Estimates the real cost of incidents based on MTTD/MTTR, affected revenue streams, and team time. Integrates with incident.io webhooks to auto-calculate cost per incident and trend over time.

TypeScript · incident.io API · Grafana · PostgreSQL
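One simple way to combine those inputs is sketched below in Python. The formula is an assumption made for illustration, not the project's actual model: customer impact is taken to last for detection plus resolution time, and team time is billed only for the resolution phase. The `Incident` fields and `estimate_cost` are hypothetical names, and no incident.io API calls are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Inputs for a single incident (hypothetical shape)."""
    detect_minutes: float      # time from start of impact to detection
    resolve_minutes: float     # time from detection to resolution
    revenue_per_minute: float  # revenue of the affected stream
    impact_fraction: float     # share of that revenue lost, 0.0 to 1.0
    responders: int            # people actively working the incident
    hourly_rate: float         # loaded cost per responder hour


def estimate_cost(incident: Incident) -> float:
    """Estimate cost as lost revenue plus responder time (assumed model)."""
    impact_minutes = incident.detect_minutes + incident.resolve_minutes
    lost_revenue = (impact_minutes
                    * incident.revenue_per_minute
                    * incident.impact_fraction)
    # Assumption: responders are only engaged once the incident is detected.
    team_cost = (incident.responders
                 * incident.hourly_rate
                 * incident.resolve_minutes / 60.0)
    return lost_revenue + team_cost


# Made-up numbers for illustration only.
example = Incident(detect_minutes=10, resolve_minutes=50,
                   revenue_per_minute=100.0, impact_fraction=0.5,
                   responders=4, hourly_rate=90.0)
total = estimate_cost(example)
```

For this made-up incident the estimate works out to about 3,300: 3,000 in lost revenue over 60 minutes of impact, plus 300 for four responders over 50 minutes.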
๐Ÿ—๏ธactive

terraform-sre-modules

Opinionated Terraform modules for SRE infrastructure: multi-region ALB with health checks, RDS with automated failover, EKS node groups with proper taints, and CloudWatch alarm baselines.

Terraform · AWS · EKS · RDS

All projects are open source. Contributions, issues, and ideas welcome.

View all on GitHub