How I Built a GitOps Pipeline for Grafana Dashboard Lifecycle Management
A GitOps workflow for Grafana dashboards that keeps the UI as the authoring surface while adding version control, CI validation, peer review, and an audit trail.