DORA Metrics Definition Guide
Overview
This guide explains how each of the five DORA metrics is calculated within the Fawkes platform using Apache DevLake.
The five metrics are: Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Restore (MTTR), and Deployment Rework Rate (added in DORA 2025).
DORA (DevOps Research and Assessment) metrics are industry-standard measures of software delivery performance. They help teams understand their delivery velocity and stability, and identify areas for improvement.
GitOps Architecture and Data Sources
In Fawkes, we follow a GitOps pattern where:
- ArgoCD performs the actual deployments to Kubernetes
- Jenkins handles CI (build, test, scan) and updates the GitOps repository
- GitHub provides commit and PR data
- Observability provides incident data
This architecture affects where DORA metrics are sourced:
| DORA Metric | Primary Source | Why |
|---|---|---|
| Deployment Frequency | ArgoCD | ArgoCD syncs are the actual deployments |
| Lead Time for Changes | Git + ArgoCD | Commit time → ArgoCD sync completion |
| Change Failure Rate | ArgoCD + Incidents | Failed syncs + production incidents |
| MTTR | Observability + ArgoCD | Incident creation → restore deployment |
| Deployment Rework Rate | ArgoCD + Git | Fix deployments within 24 h window |
Jenkins provides complementary CI metrics:
- Build success/failure rates
- Test coverage and flakiness
- Quality gate pass rates
- Rework metrics (retries, repeated failures)
The Five DORA Metrics
1. Deployment Frequency
Definition: How often code changes are deployed to production.
Primary Data Source: ArgoCD sync events
In a GitOps architecture, Jenkins does not deploy directly. Instead:
- Jenkins builds and tests code
- Jenkins updates the GitOps repository with new image tags
- ArgoCD detects the change and syncs to Kubernetes
- The ArgoCD sync is the actual deployment event
Calculation:
Deployment Frequency = Number of successful ArgoCD syncs to production / Time period
Performance Levels:
| Level | Frequency |
|---|---|
| Elite | Multiple times per day |
| High | Once per day to once per week |
| Medium | Once per week to once per month |
| Low | Less than once per month |
Fawkes Implementation:
- DevLake ArgoCD plugin collects sync events
- Production environments identified by app name pattern (e.g., `*-prod`)
- Grafana dashboard displays daily/weekly/monthly trends
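The calculation above can be expressed as a small function over a list of sync events. This is a minimal sketch, not the DevLake implementation: the field names `app_name` and `status` and the `Succeeded` status value are assumptions for illustration.

```python
def deployment_frequency(syncs, days):
    """Successful production syncs per day over a period.

    `syncs` items use assumed keys `app_name` and `status`; production
    apps are matched by the `*-prod` naming pattern described above.
    """
    prod_deploys = [
        s for s in syncs
        if s["status"] == "Succeeded" and s["app_name"].endswith("-prod")
    ]
    return len(prod_deploys) / days

syncs = [
    {"app_name": "payments-prod", "status": "Succeeded"},
    {"app_name": "payments-prod", "status": "Failed"},
    {"app_name": "payments-staging", "status": "Succeeded"},
    {"app_name": "payments-prod", "status": "Succeeded"},
]
print(round(deployment_frequency(syncs, days=7), 3))  # 2 prod deploys / 7 days ≈ 0.286
```

Note that failed syncs and non-production apps are excluded before counting, which is why the staging sync and the failed sync above do not contribute.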
2. Lead Time for Changes
Definition: Time from code commit to running in production.
Data Sources:
- GitHub: Commit timestamps
- ArgoCD: Sync completion timestamps
Calculation:
Lead Time = ArgoCD Sync Timestamp - First Commit Timestamp
The lead time includes:
- Development Time: Time spent coding
- Review Time: Time in code review/PR process
- CI Time: Jenkins build and test execution
- GitOps Sync Time: ArgoCD reconciliation
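The commit-to-sync correlation can be sketched as follows. This is illustrative only: the dict keys (`sha`, `committed_at`, `finished_at`) are assumed names, not the DevLake schema, but the correlation key (the commit SHA) is the one this guide describes.

```python
from datetime import datetime
from statistics import median

def lead_times_hours(commits, syncs):
    """Pair commits with ArgoCD syncs via the commit SHA and
    return per-change lead times in hours."""
    sync_by_sha = {s["sha"]: s["finished_at"] for s in syncs}
    return [
        (sync_by_sha[c["sha"]] - c["committed_at"]).total_seconds() / 3600
        for c in commits
        if c["sha"] in sync_by_sha  # commits not yet deployed are skipped
    ]

commits = [
    {"sha": "a1b2", "committed_at": datetime(2025, 6, 2, 9, 0)},
    {"sha": "c3d4", "committed_at": datetime(2025, 6, 2, 10, 0)},  # not deployed yet
]
syncs = [{"sha": "a1b2", "finished_at": datetime(2025, 6, 2, 15, 30)}]
print(median(lead_times_hours(commits, syncs)))  # 6.5 hours -> "High" band
```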
Performance Levels:
| Level | Lead Time |
|---|---|
| Elite | Less than 1 hour |
| High | 1 hour to 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |
Fawkes Implementation:
- DevLake correlates GitHub commits with ArgoCD syncs
- Commit SHA links are used for correlation
- Pipeline stages are tracked for detailed breakdown
3. Change Failure Rate (CFR)
Definition: Percentage of deployments that cause a failure in production.
Data Sources:
- ArgoCD: Sync status (success/failed/degraded)
- Observability: Incident records from Alertmanager
- Webhooks: Manual incident creation
Calculation:
CFR = (Failed ArgoCD Syncs + Production Incidents) / Total ArgoCD Syncs × 100%
A deployment is considered a failure if:
- The ArgoCD sync fails or enters degraded state
- An incident is created within a configured time window after sync
- A rollback sync is triggered
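The three failure conditions above can be sketched as a predicate applied to each sync. This is a simplified illustration: the field names, the `Failed`/`Degraded` status strings, and the one-hour attribution window are assumptions (the real window is configurable, per the implementation notes below).

```python
from datetime import datetime, timedelta

INCIDENT_WINDOW = timedelta(hours=1)  # assumed attribution window; configurable

def is_failed_deployment(sync, incidents):
    """Apply the three failure conditions listed above to one sync."""
    if sync["status"] in ("Failed", "Degraded") or sync.get("is_rollback"):
        return True
    # An incident created shortly after the sync is attributed to it
    return any(
        sync["finished_at"] <= i["created_at"] <= sync["finished_at"] + INCIDENT_WINDOW
        for i in incidents
    )

def change_failure_rate(syncs, incidents):
    return sum(is_failed_deployment(s, incidents) for s in syncs) / len(syncs) * 100

t = datetime(2025, 6, 2, 12, 0)
syncs = [
    {"status": "Succeeded", "finished_at": t},                      # incident follows
    {"status": "Failed", "finished_at": t},                         # failed sync
    {"status": "Succeeded", "finished_at": t, "is_rollback": True}, # rollback
    {"status": "Succeeded", "finished_at": t - timedelta(hours=6)}, # clean deploy
]
incidents = [{"created_at": t + timedelta(minutes=20)}]
print(change_failure_rate(syncs, incidents))  # 3 of 4 -> 75.0
```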
Performance Levels:
| Level | CFR |
|---|---|
| Elite | 0-5% |
| High | 5-10% |
| Medium | 10-15% |
| Low | 15%+ |
Fawkes Implementation:
- ArgoCD sync failures are automatically tracked
- Observability platform sends incident webhooks
- Correlation with recent syncs determines CFR attribution
4. Mean Time to Restore (MTTR)
Definition: Average time to restore service after a production incident.
Data Sources:
- Observability: Incident creation timestamps (from Alertmanager)
- ArgoCD: Restore sync completion timestamps
- Webhooks: Manual incident resolution events
Calculation:
MTTR = Sum(Resolution Time - Creation Time) / Number of Incidents
Resolution is detected when:
- A subsequent successful ArgoCD sync occurs
- Alertmanager alert resolves
- Manual resolution via webhook/API
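The MTTR formula above reduces to a mean over resolved incidents; in this sketch the `created_at`/`resolved_at` keys are assumed names, and unresolved incidents are excluded from the average.

```python
from datetime import datetime

def mttr_hours(incidents):
    """Mean of (resolution time - creation time) across resolved incidents."""
    durations = [
        (i["resolved_at"] - i["created_at"]).total_seconds() / 3600
        for i in incidents
        if i.get("resolved_at")  # open incidents don't count yet
    ]
    return sum(durations) / len(durations)

incidents = [
    {"created_at": datetime(2025, 6, 2, 10, 0),
     "resolved_at": datetime(2025, 6, 2, 11, 0)},   # 1 hour
    {"created_at": datetime(2025, 6, 3, 9, 0),
     "resolved_at": datetime(2025, 6, 3, 12, 0)},   # 3 hours
    {"created_at": datetime(2025, 6, 4, 8, 0)},     # still open, excluded
]
print(mttr_hours(incidents))  # (1 + 3) / 2 = 2.0 hours
```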
Performance Levels:
| Level | MTTR |
|---|---|
| Elite | Less than 1 hour |
| High | 1 hour to 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |
Fawkes Implementation:
- Incidents are created automatically from Alertmanager
- ArgoCD restore syncs are correlated with open incidents
- Grafana dashboard shows MTTR trends by severity
5. Deployment Rework Rate
Definition: Percentage of deployments that require a follow-up deployment to fix a problem introduced by the original deployment, within a configurable time window (default: 24 hours).
Primary Data Sources: ArgoCD and Git
Calculation:
Deployment Rework Rate = Rework Deployments / Total Deployments × 100%
A deployment is classified as rework when:
- A second ArgoCD sync to the same service occurs within the rework window (default 24 h), and
- The triggering commit matches any of the following signals:
  - Conventional commit prefix: `fix:`, `hotfix:`, or `revert:`
  - Deployment labelled `dora.dev/rework: "true"` in the ArgoCD Application spec
  - PR title or body contains `[rework]` or `[hotfix]`
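The classification rule above combines a time-window check with a signal check. The sketch below shows one way to express it; the regex and function signature are illustrative assumptions, not DevLake's matcher.

```python
import re
from datetime import datetime, timedelta

REWORK_WINDOW = timedelta(hours=24)  # default window; configurable
# Illustrative patterns for the signals listed above
COMMIT_PREFIX = re.compile(r"^(fix|hotfix|revert)(\(.+\))?:")
PR_MARKERS = ("[rework]", "[hotfix]")

def is_rework(prev_sync_at, sync_at, commit_message, app_labels, pr_text):
    """Second sync to the same service within the window AND a rework signal."""
    if prev_sync_at is None or sync_at - prev_sync_at > REWORK_WINDOW:
        return False
    return (
        bool(COMMIT_PREFIX.match(commit_message))
        or app_labels.get("dora.dev/rework") == "true"
        or any(m in pr_text.lower() for m in PR_MARKERS)
    )

first = datetime(2025, 6, 2, 9, 0)
print(is_rework(first, first + timedelta(hours=3),
                "fix: null check on payment id", {}, ""))   # True
print(is_rework(first, first + timedelta(hours=30),
                "fix: null check on payment id", {}, ""))   # False: outside window
```

Both conditions must hold: a fast follow-up sync with no fix signal, or a fix commit outside the window, is not counted as rework.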
Performance Levels (DORA 2025):
| Level | Rework Rate |
|---|---|
| Elite | < 5% |
| High | 5–10% |
| Medium | 10–20% |
| Low | 20%+ |
Fawkes Implementation:
- DevLake ArgoCD plugin records sync events with commit metadata
- Git commit message patterns are matched against the rework signal list
- The rework window is configurable via the `rework_detection.window_hours` setting in the DevLake collector configuration (default: `24`)
- Grafana dashboard displays rework rate trend and rework deployments by service
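As a rough illustration of that setting, the fragment below shows the window in context. Only the `rework_detection.window_hours` key is named in this guide; the surrounding structure is an assumption and the actual collector configuration schema may differ.

```yaml
# Illustrative sketch only: key nesting is assumed.
rework_detection:
  window_hours: 24  # syncs within this many hours are evaluated as rework
```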
Jenkins CI/Rework Metrics
While DORA metrics focus on deployment and production, Jenkins provides valuable CI quality metrics:
Build Success Rate
Build Success Rate = Successful Builds / Total Builds × 100%
Tracks the reliability of the CI pipeline.
Rework Rate
Rework Rate = Retry Builds / Unique Commits × 100%
Measures how often builds need to be re-run for the same code.
Quality Gate Pass Rate
QG Pass Rate = Passed Quality Gates / Total Scans × 100%
Tracks SonarQube quality gate success over time.
Test Flakiness
Flakiness = Flaky Test Runs / Total Test Runs × 100%
Identifies non-deterministic test failures.
Build Duration Trend
Tracks average and P95 build durations over time.
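All four CI rate formulas above share the same shape, a percentage of a part over a whole, which a single guarded helper can express (the sample numbers are made up for illustration):

```python
def ratio_pct(part, whole):
    """Shared shape of the CI rate formulas above: part / whole * 100.
    Guards against division by zero for empty periods."""
    return part / whole * 100 if whole else 0.0

print(ratio_pct(47, 50))   # build success rate: 94.0
print(ratio_pct(6, 40))    # rework rate: 15.0
print(ratio_pct(38, 40))   # quality gate pass rate: 95.0
print(ratio_pct(3, 600))   # test flakiness: 0.5
```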
Example Jenkins Pipeline Usage:
```groovy
@Library('fawkes-pipeline-library') _

// Record build metrics
doraMetrics.recordBuild(
    service: 'my-service',
    status: 'success',
    stage: 'build'
)

// Record quality gate results
doraMetrics.recordQualityGate(
    service: 'my-service',
    passed: true,
    coveragePercent: 85,
    vulnerabilities: 0
)

// Record test results for flakiness tracking
doraMetrics.recordTestResults(
    service: 'my-service',
    totalTests: 150,
    passedTests: 148,
    failedTests: 2,
    flakyTests: 1
)

// At pipeline end
doraMetrics.recordPipelineComplete(service: 'my-service')
```
Data Flow Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Data Sources │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ GitHub │ │ ArgoCD │ │ Jenkins │ │
│ │ │ │ (PRIMARY) │ │ (CI/QA) │ │
│ │ • Commits │ │ │ │ │ │
│ │ • PRs │ │ • Syncs │ │ • Builds │ │
│ │ • Branches │ │ • Deploys │ │ • Tests │ │
│ └──────┬───────┘ │ • Rollbacks │ │ • Scans │ │
│ │ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ┌──────────────┐ │ │ │
│ │ Observability│ │ │ │
│ │ │ │ │ │
│ │ • Incidents │ │ │ │
│ │ • Alerts │ │ │ │
│ │ • SLOs │ │ │ │
│ └──────┬───────┘ │ │ │
│ │ │ │ │
└─────────┼─────────────────┼──────────────────┼───────────────────┘
│ API │ API │ Webhook
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ DevLake Platform │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Collectors │ │
│ │ GitHub │ ArgoCD │ Jenkins │ Webhook │ │
│ │ Plugin │ Plugin │ Plugin │ Plugin │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Data Processing │ │
│ │ • Commit → ArgoCD Sync Correlation (Lead Time) │ │
│ │ • ArgoCD Sync Frequency (Deployment Frequency) │ │
│ │ • Sync Failures + Incidents (CFR) │ │
│ │ • Incident → Restore Sync (MTTR) │ │
│ │ • Jenkins CI Build Rework Metrics (non-DORA CI signal) │ │
│ │ • Fix Commits within 24 h Window (Deployment Rework Rate) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ MySQL Database │ │
│ │ Raw events, domain models, calculated metrics │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
└──────────────────────────────┼────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Visualization │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Grafana │ │ Backstage │ │ DevLake UI │ │
│ │ Dashboards │ │ Plugin │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Accessing DORA Metrics
Grafana Dashboards
Access the DORA metrics dashboards at:
- URL: `http://devlake-grafana.127.0.0.1.nip.io`
- Credentials: Use the Grafana admin credentials
Available dashboards:
- DORA Overview - All metrics at a glance
- Deployment Frequency - Detailed deploy trends
- Lead Time for Changes - Time breakdown by stage
- Change Failure Rate - Failure analysis
- Mean Time to Restore - Incident resolution trends
- Deployment Rework Rate - Fix deployment trends and rework by service
Backstage Developer Portal
DORA metrics are integrated into Backstage service pages:
- Navigate to your service in the Backstage catalog
- Click the "DORA Metrics" tab
- View the five metrics with performance ratings
DevLake UI
Access the DevLake configuration UI at:
- URL: `http://devlake.127.0.0.1.nip.io`
- Configure data sources and projects
Jenkins Pipeline
View DORA metrics summary in pipeline output:
```groovy
doraMetrics.getMetricsSummary('my-service')
```
Best Practices
Improving Deployment Frequency
- Adopt trunk-based development
- Use feature flags for incremental releases
- Automate deployment pipelines
- Reduce batch sizes
Reducing Lead Time
- Implement fast code review processes
- Use automated testing
- Parallelize CI stages
- Enable self-service deployments
Lowering Change Failure Rate
- Increase test coverage
- Implement canary deployments
- Use feature flags for gradual rollouts
- Conduct thorough code reviews
Reducing MTTR
- Implement robust monitoring and alerting
- Create runbooks for common issues
- Use automated rollbacks
- Practice incident response drills
Troubleshooting
Missing Metrics
Problem: DORA metrics show N/A
Solutions:
- Verify data source connections in DevLake UI
- Check that ArgoCD sync events are being collected (in this GitOps setup, syncs are the deployment events)
- Ensure GitHub collector is configured correctly
- Run a manual data collection in DevLake
Incorrect Calculations
Problem: Metrics don't match expectations
Solutions:
- Verify commit SHAs are being recorded
- Check time zone configurations
- Review the deployment pattern regex
- Validate incident correlation settings
No Data in Grafana
Problem: Grafana dashboards are empty
Solutions:
- Verify DevLake data sync completed
- Check Grafana data source configuration
- Adjust time range in dashboard
- Run DevLake blueprint manually