DORA Metrics Dashboard Demo
This document describes the five DORA (DevOps Research and Assessment) metrics tracked by the Fawkes platform and how to interpret them in the Grafana dashboard.
The Five DORA Metrics
The 2025 DORA research identifies five software delivery performance metrics:
| # | Metric | What It Measures | Elite Target |
|---|---|---|---|
| 1 | Deployment Frequency | How often code is successfully deployed to production | Multiple times per day |
| 2 | Lead Time for Changes | Time from first commit to running in production | Less than one hour |
| 3 | Change Failure Rate | Percentage of deployments that cause a production failure | 0–15% |
| 4 | Time to Restore Service | Mean time to restore service after a production incident | Less than one hour |
| 5 | Deployment Rework Rate | Percentage of deployments that require a follow-up fix deployment within a configurable window (default: 24 h) | < 5% |
Metric Descriptions
1. Deployment Frequency
Definition: How often an organisation successfully releases to production.
Why it matters: High frequency means smaller batches, lower risk, and faster feedback loops.
Performance Levels:
| Level | Frequency |
|---|---|
| Elite | Multiple times per day |
| High | Once per day to once per week |
| Medium | Once per week to once per month |
| Low | Less than once per month |
Grafana panel: Deployment Frequency (stat panel, 7-day window)
2. Lead Time for Changes
Definition: Time from first commit of a change to that change running in production.
Why it matters: Short lead time enables rapid iteration and faster customer feedback.
Performance Levels:
| Level | Lead Time |
|---|---|
| Elite | Less than 1 hour |
| High | 1 hour to 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |
Grafana panel: Lead Time for Changes (gauge, P50/P95/P99 breakdown available)
3. Change Failure Rate
Definition: Percentage of deployments that cause a failure in production requiring a hotfix, rollback, or incident response.
Why it matters: A high failure rate indicates quality or process problems in the delivery pipeline.
Performance Levels:
| Level | CFR |
|---|---|
| Elite | 0–15% |
| High | 16–30% |
| Medium | 31–45% |
| Low | 46%+ |
Grafana panel: Change Failure Rate (gauge)
4. Time to Restore Service
Definition: How long it takes to restore service after a production incident.
Also known as Mean Time to Restore (MTTR).
Why it matters: Fast recovery limits the blast radius of failures and builds customer confidence.
Performance Levels:
| Level | MTTR |
|---|---|
| Elite | Less than 1 hour |
| High | 1 hour to 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |
Grafana panel: Mean Time to Restore (MTTR) (gauge + trend line)
5. Deployment Rework Rate
Definition: The percentage of deployments that require a follow-up deployment to fix a problem introduced by the original deployment.
A deployment is counted as rework when:
- A second deployment to the same service occurs within a configurable window (default: 24 hours), and
- That follow-up deployment is tagged or flagged as a fix (e.g. commit message
contains
fix:,hotfix:, orrevert:, or the deployment is labelleddora.dev/rework: "true").
Why it matters: Rework rate distinguishes between deployments that succeed on the first attempt and those that silently introduce problems requiring rapid follow-up. It is orthogonal to Change Failure Rate — a high rework rate with a low CFR may indicate teams are patching quietly rather than raising incidents.
Performance Levels (DORA 2025 guidance):
| Level | Rework Rate |
|---|---|
| Elite | < 5% |
| High | 5–10% |
| Medium | 10–20% |
| Low | 20%+ |
Collection methodology: See docs/playbooks/dora-metrics-implementation.md for implementation details.
Prometheus metric: dora_deployment_rework_total (counter; rate computed in dashboard)
Grafana panel: Deployment Rework Rate (gauge)
Accessing the Dashboard
The DORA metrics dashboard is available in Grafana:
- Local:
http://localhost:8080/grafana/d/dora-metrics - Credentials:
admin/fawkes-grafana
To open directly:
make dev-status # prints all service URLs and credentials
Dashboard Sections
- Five Key Metrics Overview — stat/gauge panels for all five metrics side by side
- Deployment Trends — deployment frequency over time, by service
- Lead Time Analysis — trend and P50/P95/P99 distribution
- Failure Analysis — change failure rate over time and by service
- Recovery Metrics — MTTR trend and incident count
- Deployment Rework Analysis — rework rate trend and rework deployments by service
- DORA Benchmark Comparison — your performance vs DORA benchmarks