DORA Metrics Definition Guide
Overview
This guide explains how each of the five DORA metrics is calculated within the Fawkes platform using Apache DevLake.
The five metrics are: Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Restore (MTTR), and Deployment Rework Rate (added in DORA 2025).
DORA (DevOps Research and Assessment) metrics are industry-standard measures of software delivery performance. They help teams understand their delivery velocity and stability, and identify areas for improvement.
GitOps Architecture and Data Sources
In Fawkes, we follow a GitOps pattern where:
- ArgoCD performs the actual deployments to Kubernetes
- Jenkins handles CI (build, test, scan) and updates the GitOps repository
- GitHub provides commit and PR data
- Observability provides incident data
This architecture affects where DORA metrics are sourced:
| DORA Metric | Primary Source | Why |
|---|---|---|
| Deployment Frequency | ArgoCD | ArgoCD syncs are the actual deployments |
| Lead Time for Changes | Git + ArgoCD | Commit time → ArgoCD sync completion |
| Change Failure Rate | ArgoCD + Incidents | Failed syncs + production incidents |
| MTTR | Observability + ArgoCD | Incident creation → restore deployment |
| Deployment Rework Rate | ArgoCD + Git | Fix deployments within 24 h window |
Jenkins provides complementary CI metrics:
- Build success/failure rates
- Test coverage and flakiness
- Quality gate pass rates
- Rework metrics (retries, repeated failures)
The Five DORA Metrics
1. Deployment Frequency
Definition: How often code changes are deployed to production.
Primary Data Source: ArgoCD sync events
In a GitOps architecture, Jenkins does not deploy directly. Instead:
- Jenkins builds and tests code
- Jenkins updates the GitOps repository with new image tags
- ArgoCD detects the change and syncs to Kubernetes
- The ArgoCD sync is the actual deployment event
Calculation:
Deployment Frequency = Number of successful ArgoCD syncs to production / Time period
Performance Levels:
| Level | Frequency |
|---|---|
| Elite | Multiple times per day |
| High | Once per day to once per week |
| Medium | Once per week to once per month |
| Low | Less than once per month |
Fawkes Implementation:
- DevLake ArgoCD plugin collects sync events
- Production environments identified by app name pattern (e.g., `*-prod`)
- Grafana dashboard displays daily/weekly/monthly trends
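The calculation above can be expressed as a small function over a list of sync events. This is a minimal sketch, not the DevLake implementation: the field names `app_name` and `status` and the `Succeeded` status value are assumptions for illustration.

```python
def deployment_frequency(syncs, days):
    """Successful production syncs per day over a period.

    `syncs` items use assumed keys `app_name` and `status`; production
    apps are matched by the `*-prod` naming pattern described above.
    """
    prod_deploys = [
        s for s in syncs
        if s["status"] == "Succeeded" and s["app_name"].endswith("-prod")
    ]
    return len(prod_deploys) / days

syncs = [
    {"app_name": "payments-prod", "status": "Succeeded"},
    {"app_name": "payments-prod", "status": "Failed"},
    {"app_name": "payments-staging", "status": "Succeeded"},
    {"app_name": "payments-prod", "status": "Succeeded"},
]
print(round(deployment_frequency(syncs, days=7), 3))  # 2 prod deploys / 7 days ≈ 0.286
```

Note that failed syncs and non-production apps are excluded before counting, which is why the staging sync and the failed sync above do not contribute.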
2. Lead Time for Changes
Definition: Time from code commit to running in production.
Data Sources:
- GitHub: Commit timestamps
- ArgoCD: Sync completion timestamps
Calculation:
Lead Time = ArgoCD Sync Timestamp - First Commit Timestamp
The lead time includes:
- Development Time: Time spent coding
- Review Time: Time in code review/PR process
- CI Time: Jenkins build and test execution
- GitOps Sync Time: ArgoCD reconciliation
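The commit-to-sync correlation can be sketched as follows. This is illustrative only: the dict keys (`sha`, `committed_at`, `finished_at`) are assumed names, not the DevLake schema, but the correlation key (the commit SHA) is the one this guide describes.

```python
from datetime import datetime
from statistics import median

def lead_times_hours(commits, syncs):
    """Pair commits with ArgoCD syncs via the commit SHA and
    return per-change lead times in hours."""
    sync_by_sha = {s["sha"]: s["finished_at"] for s in syncs}
    return [
        (sync_by_sha[c["sha"]] - c["committed_at"]).total_seconds() / 3600
        for c in commits
        if c["sha"] in sync_by_sha  # commits not yet deployed are skipped
    ]

commits = [
    {"sha": "a1b2", "committed_at": datetime(2025, 6, 2, 9, 0)},
    {"sha": "c3d4", "committed_at": datetime(2025, 6, 2, 10, 0)},  # not deployed yet
]
syncs = [{"sha": "a1b2", "finished_at": datetime(2025, 6, 2, 15, 30)}]
print(median(lead_times_hours(commits, syncs)))  # 6.5 hours -> "High" band
```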
Performance Levels:
| Level | Lead Time |
|---|---|
| Elite | Less than 1 hour |
| High | 1 hour to 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |
Fawkes Implementation:
- DevLake correlates GitHub commits with ArgoCD syncs
- Commit SHA links are used for correlation
- Pipeline stages are tracked for detailed breakdown
3. Change Failure Rate (CFR)
Definition: Percentage of deployments that cause a failure in production.
Data Sources:
- ArgoCD: Sync status (success/failed/degraded)
- Observability: Incident records from Alertmanager
- Webhooks: Manual incident creation
Calculation:
CFR = (Failed ArgoCD Syncs + Production Incidents) / Total ArgoCD Syncs × 100%
A deployment is considered a failure if:
- The ArgoCD sync fails or enters degraded state
- An incident is created within a configured time window after sync
- A rollback sync is triggered
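The three failure conditions above can be sketched as a predicate applied to each sync. This is a simplified illustration: the field names, the `Failed`/`Degraded` status strings, and the one-hour attribution window are assumptions (the real window is configurable, per the implementation notes below).

```python
from datetime import datetime, timedelta

INCIDENT_WINDOW = timedelta(hours=1)  # assumed attribution window; configurable

def is_failed_deployment(sync, incidents):
    """Apply the three failure conditions listed above to one sync."""
    if sync["status"] in ("Failed", "Degraded") or sync.get("is_rollback"):
        return True
    # An incident created shortly after the sync is attributed to it
    return any(
        sync["finished_at"] <= i["created_at"] <= sync["finished_at"] + INCIDENT_WINDOW
        for i in incidents
    )

def change_failure_rate(syncs, incidents):
    return sum(is_failed_deployment(s, incidents) for s in syncs) / len(syncs) * 100

t = datetime(2025, 6, 2, 12, 0)
syncs = [
    {"status": "Succeeded", "finished_at": t},                      # incident follows
    {"status": "Failed", "finished_at": t},                         # failed sync
    {"status": "Succeeded", "finished_at": t, "is_rollback": True}, # rollback
    {"status": "Succeeded", "finished_at": t - timedelta(hours=6)}, # clean deploy
]
incidents = [{"created_at": t + timedelta(minutes=20)}]
print(change_failure_rate(syncs, incidents))  # 3 of 4 -> 75.0
```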
Performance Levels:
| Level | CFR |
|---|---|
| Elite | 0-5% |
| High | 5-10% |
| Medium | 10-15% |
| Low | 15%+ |
Fawkes Implementation:
- ArgoCD sync failures are automatically tracked
- Observability platform sends incident webhooks
- Correlation with recent syncs determines CFR attribution
4. Mean Time to Restore (MTTR)
Definition: Average time to restore service after a production incident.
Data Sources:
- Observability: Incident creation timestamps (from Alertmanager)
- ArgoCD: Restore sync completion timestamps
- Webhooks: Manual incident resolution events
Calculation:
MTTR = Sum(Resolution Time - Creation Time) / Number of Incidents
Resolution is detected when:
- A subsequent successful ArgoCD sync occurs
- Alertmanager alert resolves
- Manual resolution via webhook/API
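The MTTR formula above reduces to a mean over resolved incidents; in this sketch the `created_at`/`resolved_at` keys are assumed names, and unresolved incidents are excluded from the average.

```python
from datetime import datetime

def mttr_hours(incidents):
    """Mean of (resolution time - creation time) across resolved incidents."""
    durations = [
        (i["resolved_at"] - i["created_at"]).total_seconds() / 3600
        for i in incidents
        if i.get("resolved_at")  # open incidents don't count yet
    ]
    return sum(durations) / len(durations)

incidents = [
    {"created_at": datetime(2025, 6, 2, 10, 0),
     "resolved_at": datetime(2025, 6, 2, 11, 0)},   # 1 hour
    {"created_at": datetime(2025, 6, 3, 9, 0),
     "resolved_at": datetime(2025, 6, 3, 12, 0)},   # 3 hours
    {"created_at": datetime(2025, 6, 4, 8, 0)},     # still open, excluded
]
print(mttr_hours(incidents))  # (1 + 3) / 2 = 2.0 hours
```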
Performance Levels:
| Level | MTTR |
|---|---|
| Elite | Less than 1 hour |
| High | 1 hour to 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |
Fawkes Implementation:
- Incidents are created automatically from Alertmanager
- ArgoCD restore syncs are correlated with open incidents
- Grafana dashboard shows MTTR trends by severity
5. Deployment Rework Rate
Definition: Percentage of deployments that require a follow-up deployment to fix a problem introduced by the original deployment, within a configurable time window (default: 24 hours).
Primary Data Sources: ArgoCD and Git
Calculation:
Deployment Rework Rate = Rework Deployments / Total Deployments × 100%
A deployment is classified as rework when:
- A second ArgoCD sync to the same service occurs within the rework window (default 24 h), and
- The triggering commit matches any of the following signals:
  - Conventional commit prefix: `fix:`, `hotfix:`, or `revert:`
  - Deployment labelled `dora.dev/rework: "true"` in the ArgoCD Application spec
  - PR title or body contains `[rework]` or `[hotfix]`
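The classification rule above combines a time-window check with a signal check. The sketch below shows one way to express it; the regex and function signature are illustrative assumptions, not DevLake's matcher.

```python
import re
from datetime import datetime, timedelta

REWORK_WINDOW = timedelta(hours=24)  # default window; configurable
# Illustrative patterns for the signals listed above
COMMIT_PREFIX = re.compile(r"^(fix|hotfix|revert)(\(.+\))?:")
PR_MARKERS = ("[rework]", "[hotfix]")

def is_rework(prev_sync_at, sync_at, commit_message, app_labels, pr_text):
    """Second sync to the same service within the window AND a rework signal."""
    if prev_sync_at is None or sync_at - prev_sync_at > REWORK_WINDOW:
        return False
    return (
        bool(COMMIT_PREFIX.match(commit_message))
        or app_labels.get("dora.dev/rework") == "true"
        or any(m in pr_text.lower() for m in PR_MARKERS)
    )

first = datetime(2025, 6, 2, 9, 0)
print(is_rework(first, first + timedelta(hours=3),
                "fix: null check on payment id", {}, ""))   # True
print(is_rework(first, first + timedelta(hours=30),
                "fix: null check on payment id", {}, ""))   # False: outside window
```

Both conditions must hold: a fast follow-up sync with no fix signal, or a fix commit outside the window, is not counted as rework.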
Performance Levels (DORA 2025):
| Level | Rework Rate |
|---|---|
| Elite | < 5% |
| High | 5–10% |
| Medium | 10–20% |
| Low | 20%+ |
Fawkes Implementation:
- DevLake ArgoCD plugin records sync events with commit metadata
- Git commit message patterns are matched against the rework signal list
- The rework window is configurable via the `rework_detection.window_hours` setting in the DevLake collector configuration (default: `24`)
- Grafana dashboard displays rework rate trend and rework deployments by service
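As a rough illustration of that setting, the fragment below shows the window in context. Only the `rework_detection.window_hours` key is named in this guide; the surrounding structure is an assumption and the actual collector configuration schema may differ.

```yaml
# Illustrative sketch only: key nesting is assumed.
rework_detection:
  window_hours: 24  # syncs within this many hours are evaluated as rework
```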
Jenkins CI/Rework Metrics
While DORA metrics focus on deployment and production, Jenkins provides valuable CI quality metrics:
Build Success Rate
Build Success Rate = Successful Builds / Total Builds × 100%
Tracks the reliability of the CI pipeline.
Rework Rate
Rework Rate = Retry Builds / Unique Commits × 100%
Measures how often builds need to be re-run for the same code.
Quality Gate Pass Rate
QG Pass Rate = Passed Quality Gates / Total Scans × 100%
Tracks SonarQube quality gate success over time.
Test Flakiness
Flakiness = Flaky Test Runs / Total Test Runs × 100%
Identifies non-deterministic test failures.
Build Duration Trend
Tracks average and P95 build durations over time.
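All four CI rate formulas above share the same shape, a percentage of a part over a whole, which a single guarded helper can express (the sample numbers are made up for illustration):

```python
def ratio_pct(part, whole):
    """Shared shape of the CI rate formulas above: part / whole * 100.
    Guards against division by zero for empty periods."""
    return part / whole * 100 if whole else 0.0

print(ratio_pct(47, 50))   # build success rate: 94.0
print(ratio_pct(6, 40))    # rework rate: 15.0
print(ratio_pct(38, 40))   # quality gate pass rate: 95.0
print(ratio_pct(3, 600))   # test flakiness: 0.5
```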
Example Jenkins Pipeline Usage:
```groovy
@Library('fawkes-pipeline-library') _

// Record build metrics
doraMetrics.recordBuild(
    service: 'my-service',
    status: 'success',
    stage: 'build'
)

// Record quality gate results
doraMetrics.recordQualityGate(
    service: 'my-service',
    passed: true,
    coveragePercent: 85,
    vulnerabilities: 0
)

// Record test results for flakiness tracking
doraMetrics.recordTestResults(
    service: 'my-service',
    totalTests: 150,
    passedTests: 148,
    failedTests: 2,
    flakyTests: 1
)

// At pipeline end
doraMetrics.recordPipelineComplete(service: 'my-service')
```
Data Flow Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Data Sources │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ GitHub │ │ ArgoCD │ │ Jenkins │ │
│ │ │ │ (PRIMARY) │ │ (CI/QA) │ │
│ │ • Commits │ │ │ │ │ │
│ │ • PRs │ │ • Syncs │ │ • Builds │ │
│ │ • Branches │ │ • Deploys │ │ • Tests │ │
│ └──────┬───────┘ │ • Rollbacks │ │ • Scans │ │
│ │ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ┌──────────────┐ │ │ │
│ │ Observability│ │ │ │
│ │ │ │ │ │
│ │ • Incidents │ │ │ │
│ │ • Alerts │ │ │ │
│ │ • SLOs │ │ │ │
│ └──────┬───────┘ │ │ │
│ │ │ │ │
└─────────┼─────────────────┼──────────────────┼───────────────────┘
│ API │ API │ Webhook
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ DevLake Platform │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Collectors │ │
│ │ GitHub │ ArgoCD │ Jenkins │ Webhook │ │
│ │ Plugin │ Plugin │ Plugin │ Plugin │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Data Processing │ │
│ │ • Commit → ArgoCD Sync Correlation (Lead Time) │ │
│ │ • ArgoCD Sync Frequency (Deployment Frequency) │ │
│ │ • Sync Failures + Incidents (CFR) │ │
│ │ • Incident → Restore Sync (MTTR) │ │
│ │ • Jenkins CI Build Rework Metrics (non-DORA CI signal) │ │
│ │ • Fix Commits within 24 h Window (Deployment Rework Rate) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ MySQL Database │ │
│ │ Raw events, domain models, calculated metrics │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
└──────────────────────────────┼────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Visualization │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Grafana │ │ Backstage │ │ DevLake UI │ │
│ │ Dashboards │ │ Plugin │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Accessing DORA Metrics
Grafana Dashboards
Access the DORA metrics dashboards at:
- URL: `http://devlake-grafana.127.0.0.1.nip.io`
- Credentials: Use the Grafana admin credentials
Available dashboards:
- DORA Overview - All metrics at a glance
- Deployment Frequency - Detailed deploy trends
- Lead Time for Changes - Time breakdown by stage
- Change Failure Rate - Failure analysis
- Mean Time to Restore - Incident resolution trends
- Deployment Rework Rate - Fix deployment trends and rework by service
Backstage Developer Portal
DORA metrics are integrated into Backstage service pages:
- Navigate to your service in the Backstage catalog
- Click the "DORA Metrics" tab
- View the five metrics with performance ratings
DevLake UI
Access the DevLake configuration UI at:
- URL: `http://devlake.127.0.0.1.nip.io`
- Configure data sources and projects
Jenkins Pipeline
View DORA metrics summary in pipeline output:
```groovy
doraMetrics.getMetricsSummary('my-service')
```
Best Practices
Improving Deployment Frequency
- Adopt trunk-based development
- Use feature flags for incremental releases
- Automate deployment pipelines
- Reduce batch sizes
Reducing Lead Time
- Implement fast code review processes
- Use automated testing
- Parallelize CI stages
- Enable self-service deployments
Lowering Change Failure Rate
- Increase test coverage
- Implement canary deployments
- Use feature flags for gradual rollouts
- Conduct thorough code reviews
Reducing MTTR
- Implement robust monitoring and alerting
- Create runbooks for common issues
- Use automated rollbacks
- Practice incident response drills
Troubleshooting
Missing Metrics
Problem: DORA metrics show N/A
Solutions:
- Verify data source connections in DevLake UI
- Check that ArgoCD sync events are being collected (in this GitOps setup, syncs are the deployment events)
- Ensure GitHub collector is configured correctly
- Run a manual data collection in DevLake
Incorrect Calculations
Problem: Metrics don't match expectations
Solutions:
- Verify commit SHAs are being recorded
- Check time zone configurations
- Review the deployment pattern regex
- Validate incident correlation settings
No Data in Grafana
Problem: Grafana dashboards are empty
Solutions:
- Verify DevLake data sync completed
- Check Grafana data source configuration
- Adjust time range in dashboard
- Run DevLake blueprint manually