
Playbook: DORA Metrics Implementation

Estimated Duration: 4-8 hours
Complexity: ⭐⭐ Medium
Target Audience: Platform Engineers / DevOps Engineers / Consultants


I. Business Objective

Diátaxis: Explanation / Conceptual

This section defines the "why"—the risk mitigated, compliance goal achieved, and value delivered.

What We're Solving

Organizations often struggle to objectively measure their software delivery performance. Without data, improvement efforts are based on intuition rather than evidence, making it impossible to identify true bottlenecks, demonstrate progress to stakeholders, or justify investment in engineering improvements.

DORA (DevOps Research and Assessment) research has identified five key metrics that reliably predict software delivery and organizational performance. This playbook implements automated collection and visualization of these metrics within the Fawkes platform.

Risk Mitigation

| Risk | Impact Without Action | How This Playbook Helps |
|---|---|---|
| Invisible bottlenecks | Teams waste effort on wrong improvements | Data reveals actual constraints |
| Unable to demonstrate improvement | Stakeholders lose confidence in engineering | Dashboards show measurable progress |
| Slow incident response | Prolonged outages damage customer trust | MTTR tracking drives faster recovery |
| High change failure rate | Quality issues erode user satisfaction | Early detection enables proactive fixes |
| High rework rate | Teams patching silently rather than improving | Rework tracking exposes fix-deploy cycles |

Expected Outcomes

  • ✅ Automated collection of all five DORA metrics
  • ✅ Real-time dashboards showing current performance levels
  • ✅ Historical trend analysis for improvement tracking
  • ✅ Team-level breakdowns for targeted interventions
  • ✅ Alerts for performance degradation

Business Value

| Metric | Before | After | Improvement |
|---|---|---|---|
| Visibility into delivery performance | None/Manual | Automated, real-time | ∞ improvement |
| Time to identify bottlenecks | Days/Weeks | Minutes | 90%+ reduction |
| Engineering productivity discussions | Opinion-based | Data-driven | Qualitative shift |
| Stakeholder confidence | Low | High | Measurable progress |

II. Technical Prerequisites

Diátaxis: Reference

This section lists required Fawkes components, versions, and environment specifications.

Required Fawkes Components

| Component | Minimum Version | Required | Documentation |
|---|---|---|---|
| Kubernetes | 1.28+ | ✅ | See Getting Started |
| Prometheus | 2.47+ | ✅ | See Prometheus Tool |
| Grafana | 10.2+ | ✅ | See Observability |
| Jenkins | 2.426+ | ✅ | See Jenkins Tool |
| ArgoCD | 2.9+ | ✅ | See GitOps Module |
| DevLake | 0.19+ | ⬜ Optional | See DevLake ADR |

Environment Requirements

# Minimum cluster resources for DORA metrics stack
nodes: 3
cpu_per_node: 4 cores
memory_per_node: 16 GB
storage: 50 GB (for metrics retention)

# Network requirements
ingress_controller: nginx or traefik
external_access: required for dashboards

Access Requirements

  • [ ] Cluster admin access to Kubernetes
  • [ ] Git repository webhook configuration rights
  • [ ] Jenkins admin access for plugin installation
  • [ ] ArgoCD admin access for webhook configuration

Pre-Implementation Checklist

  • [ ] CI/CD pipeline (Jenkins) is operational
  • [ ] GitOps (ArgoCD) is deployed and managing applications
  • [ ] Prometheus and Grafana are running in the cluster
  • [ ] At least one application is being deployed via GitOps
  • [ ] Stakeholder approval for metrics collection obtained

III. Implementation Steps

Diátaxis: How-to Guide (Core)

This is the core of the playbook—step-by-step procedures using Fawkes components.

Step 1: Configure Deployment Event Collection

Objective: Capture deployment events from ArgoCD to measure deployment frequency and lead time.

Estimated Time: 45 minutes

  1. Create the DORA metrics namespace:
kubectl create namespace dora-metrics
  2. Deploy the deployment event collector:
# dora-deployment-collector.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dora-collector-config
  namespace: dora-metrics
data:
  config.yaml: |
    collectors:
      - type: argocd
        webhook_path: /webhooks/argocd
        metrics:
          - deployment_frequency
          - lead_time_for_changes
      - type: jenkins
        webhook_path: /webhooks/jenkins
        metrics:
          - build_time
          - pipeline_success_rate
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dora-collector
  namespace: dora-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dora-collector
  template:
    metadata:
      labels:
        app: dora-collector
    spec:
      containers:
        - name: collector
          image: fawkes/dora-collector:v1.0.0
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: config
              mountPath: /etc/dora
      volumes:
        - name: config
          configMap:
            name: dora-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: dora-collector
  namespace: dora-metrics
spec:
  selector:
    app: dora-collector
  ports:
    - port: 8080
      targetPort: 8080
  3. Apply the configuration:
kubectl apply -f dora-deployment-collector.yaml

Verification: Check that the collector pod is running:

kubectl get pods -n dora-metrics -l app=dora-collector
Expected Output

NAME                              READY   STATUS    RESTARTS   AGE
dora-collector-7d9f8b6c4f-x2k9j   1/1     Running   0          30s
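For intuition, the deployment-frequency computation the collector performs can be sketched in a few lines of Python. This is an illustrative model only, assuming events arrive as ISO-8601 timestamps; the internals of the fawkes/dora-collector image are not documented in this playbook.

```python
from datetime import datetime, timedelta

def deployments_per_day(event_timestamps, now, window_days=7):
    """Average daily deployment count over a trailing window.

    event_timestamps: ISO-8601 strings for successful production syncs,
    e.g. the finishedAt field forwarded by the ArgoCD webhook payload.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in map(datetime.fromisoformat, event_timestamps) if t >= cutoff]
    return len(recent) / window_days

now = datetime(2024, 1, 15)
events = ["2024-01-14T09:30:00", "2024-01-13T16:05:00", "2024-01-01T10:00:00"]
print(deployments_per_day(events, now))  # 2 deployments in the window / 7 days, about 0.286/day
```

The Grafana panel in Step 5 expresses the same idea in PromQL with `increase(dora_deployments_total[7d])`.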

Step 2: Configure ArgoCD Webhooks

Objective: Connect ArgoCD deployment events to the DORA collector.

Estimated Time: 30 minutes

  1. Get the DORA collector service endpoint:
kubectl get svc dora-collector -n dora-metrics -o jsonpath='{.spec.clusterIP}'
  2. Configure ArgoCD notifications to send deployment events:
# argocd-notifications-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.webhook.dora: |
    url: http://dora-collector.dora-metrics:8080/webhooks/argocd
    headers:
    - name: Content-Type
      value: application/json
  template.deployment-event: |
    webhook:
      dora:
        method: POST
        body: |
          {
            "application": "{{.app.metadata.name}}",
            "status": "{{.app.status.sync.status}}",
            "revision": "{{.app.status.sync.revision}}",
            "timestamp": "{{.app.status.operationState.finishedAt}}"
          }
  trigger.on-sync-succeeded: |
    - when: app.status.sync.status == 'Synced'
      send: [deployment-event]
  3. Apply the notification configuration:
kubectl apply -f argocd-notifications-cm.yaml

Verification: Trigger a deployment and check for webhook delivery.
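The notification template above produces a small JSON document per sync. The sketch below shows the kind of shape check a collector endpoint might perform on that payload; the field names match the template, but the validation function itself is hypothetical, not part of the shipped collector.

```python
import json

# Illustrative payload matching the notification template's body
# (values are examples, not real ArgoCD output).
payload = {
    "application": "my-service",
    "status": "Synced",
    "revision": "3f9c2ab",
    "timestamp": "2024-01-15T10:42:00Z",
}

def is_successful_deployment(raw):
    """Parse a webhook body and check it describes a completed sync."""
    event = json.loads(raw)
    required = {"application", "status", "revision", "timestamp"}
    missing = required - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return event["status"] == "Synced"

print(is_successful_deployment(json.dumps(payload)))  # True
```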

Step 3: Configure Jenkins Pipeline Metrics

Objective: Capture build and pipeline metrics from Jenkins.

Estimated Time: 45 minutes

  1. Add a DORA metrics helper to your Jenkins shared library:
// In Jenkins shared library
// vars/doraMetrics.groovy
def recordDeployment(Map config) {
    def startTime = config.startTime ?: currentBuild.startTimeInMillis
    def endTime = System.currentTimeMillis()
    def leadTime = endTime - startTime

    sh """
        curl -X POST http://dora-collector.dora-metrics:8080/metrics/deployment \\
            -H 'Content-Type: application/json' \\
            -d '{
                "service": "${config.service}",
                "environment": "${config.environment}",
                "commit_sha": "${config.commitSha}",
                "lead_time_ms": ${leadTime},
                "status": "${currentBuild.result ?: 'SUCCESS'}"
            }'
    """
}

def recordFailure(Map config) {
    sh """
        curl -X POST http://dora-collector.dora-metrics:8080/metrics/failure \\
            -H 'Content-Type: application/json' \\
            -d '{
                "service": "${config.service}",
                "environment": "${config.environment}",
                "type": "${config.type}",
                "detected_at": "${new Date().toInstant()}"
            }'
    """
}
  2. Update pipelines to emit DORA metrics:
// Example Jenkinsfile integration
@Library('fawkes-shared-library') _

pipeline {
    agent any
    stages {
        stage('Deploy') {
            steps {
                script {
                    // Deployment logic here
                    sh 'kubectl apply -f manifests/'

                    // Record deployment for DORA metrics
                    doraMetrics.recordDeployment(
                        service: 'my-service',
                        environment: 'production',
                        commitSha: env.GIT_COMMIT
                    )
                }
            }
        }
    }
    post {
        failure {
            script {
                doraMetrics.recordFailure(
                    service: 'my-service',
                    environment: 'production',
                    type: 'deployment_failure'
                )
            }
        }
    }
}

Verification: Run a pipeline and verify metrics are being collected.

Common Pitfall

Ensure the Jenkins service account has network access to the dora-collector service. If using network policies, create appropriate rules.
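Note that the recordDeployment helper measures lead time from the pipeline's start, not from the commit itself, so it undercounts any time a change sat between commit and build. Here is a sketch of the stricter DORA definition (commit to running in production), assuming you can retrieve the commit timestamp, for example via git show -s --format=%cI on the deployed SHA:

```python
from datetime import datetime

def lead_time_seconds(commit_time_iso, deployed_time_iso):
    """DORA lead time for changes: commit timestamp to production deploy."""
    commit = datetime.fromisoformat(commit_time_iso)
    deployed = datetime.fromisoformat(deployed_time_iso)
    return (deployed - commit).total_seconds()

# A change committed at 09:00 and live at 10:30 has a 90-minute lead time.
print(lead_time_seconds("2024-01-15T09:00:00", "2024-01-15T10:30:00"))  # 5400.0
```

If you adopt this, pass the commit timestamp into recordDeployment as config.startTime instead of relying on currentBuild.startTimeInMillis.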

Step 4: Configure Incident Tracking for MTTR

Objective: Track incident detection and resolution for Mean Time to Restore measurement.

Estimated Time: 30 minutes

  1. Configure Prometheus alerting to record incidents:
# prometheus-dora-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dora-incident-rules
  namespace: monitoring
spec:
  groups:
    - name: dora-incidents
      rules:
        - alert: ServiceDown
          expr: up{job=~".*production.*"} == 0
          for: 1m
          labels:
            severity: critical
            dora_incident: "true"
          annotations:
            summary: "Service {{ $labels.job }} is down"

        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
          for: 2m
          labels:
            severity: warning
            dora_incident: "true"
          annotations:
            summary: "High error rate for {{ $labels.service }}"
  2. Configure Alertmanager to notify the DORA collector:
# alertmanager-dora-config.yaml
receivers:
  - name: dora-collector
    webhook_configs:
      - url: "http://dora-collector.dora-metrics:8080/webhooks/alertmanager"
        send_resolved: true

route:
  receiver: dora-collector
  routes:
    - match:
        dora_incident: "true"
      receiver: dora-collector

Verification: Trigger a test alert and verify it's recorded.
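Conceptually, MTTR is the mean of (resolved_at minus started_at) over incidents, which is what the collector derives from paired firing/resolved Alertmanager webhooks. A minimal sketch of that arithmetic follows; the pairing logic in the real collector is not shown in this playbook.

```python
from datetime import datetime

def mttr_seconds(incidents):
    """Mean time to restore from (started_at, resolved_at) ISO-8601 pairs."""
    durations = [
        (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
        for start, end in incidents
    ]
    return sum(durations) / len(durations) if durations else 0.0

incidents = [
    ("2024-01-15T10:00:00", "2024-01-15T10:20:00"),  # restored in 20 minutes
    ("2024-01-16T14:00:00", "2024-01-16T15:00:00"),  # restored in 60 minutes
]
print(mttr_seconds(incidents) / 60)  # 40.0 (minutes)
```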

Step 5: Deploy DORA Metrics Dashboard

Objective: Create visualization dashboards for the DORA metrics.

Estimated Time: 30 minutes

  1. Deploy the DORA Grafana dashboard:
# dora-dashboard-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dora-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "true"
data:
  dora-metrics.json: |
    {
      "title": "DORA Metrics Dashboard",
      "panels": [
        {
          "title": "Deployment Frequency",
          "type": "stat",
          "targets": [{
            "expr": "sum(increase(dora_deployments_total[7d]))"
          }]
        },
        {
          "title": "Lead Time for Changes",
          "type": "gauge",
          "targets": [{
            "expr": "avg(dora_lead_time_seconds)"
          }]
        },
        {
          "title": "Change Failure Rate",
          "type": "gauge",
          "targets": [{
            "expr": "sum(dora_deployment_failures_total) / sum(dora_deployments_total) * 100"
          }]
        },
        {
          "title": "Mean Time to Restore",
          "type": "gauge",
          "targets": [{
            "expr": "avg(dora_mttr_seconds)"
          }]
        },
        {
          "title": "Deployment Rework Rate",
          "type": "gauge",
          "targets": [{
            "expr": "sum(dora_deployment_rework_total) / sum(dora_deployments_total) * 100"
          }]
        }
      ]
    }
  2. Apply the dashboard:
kubectl apply -f dora-dashboard-configmap.yaml

Verification: Access Grafana and verify the DORA dashboard is visible.
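The change failure rate panel divides failed deployments by total deployments. The same arithmetic in plain Python, mirroring the panel's PromQL expression:

```python
def change_failure_rate(failures, deployments):
    """Percentage of deployments that caused a failure, as in the panel's
    PromQL: sum(dora_deployment_failures_total) / sum(dora_deployments_total) * 100.
    """
    if deployments == 0:
        return 0.0  # avoid division by zero before any deployments exist
    return failures / deployments * 100

print(change_failure_rate(3, 40))  # 7.5, inside the 0-15% band cited for elite performers
```

The zero-deployments guard matters in Grafana too: the PromQL ratio returns no data until at least one deployment is recorded.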

Step 6: Configure Deployment Rework Rate Collection

Objective: Detect and record deployments that require a follow-up fix deployment within a configurable time window.

Estimated Time: 30 minutes

A deployment is counted as rework if a second deployment to the same service occurs within the rework window (default: 24 hours) and is tagged or flagged as a fix using any of the following signals:

  • Conventional commit prefix: fix:, hotfix:, or revert:
  • Deployment labelled dora.dev/rework: "true" in the ArgoCD Application spec
  • Pull request title or body contains [rework] or [hotfix]

  1. Add the rework detection configuration to the DORA collector:

# dora-rework-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dora-collector-config
  namespace: dora-metrics
data:
  config.yaml: |
    collectors:
      - type: argocd
        webhook_path: /webhooks/argocd
        metrics:
          - deployment_frequency
          - lead_time_for_changes
          - deployment_rework_rate
      # Retained from Step 1: this ConfigMap replaces dora-collector-config,
      # so omitting the jenkins collector here would disable it.
      - type: jenkins
        webhook_path: /webhooks/jenkins
        metrics:
          - build_time
          - pipeline_success_rate
    rework_detection:
      window_hours: 24               # configurable rework window (default: 24 h)
      fix_commit_patterns:
        - "^fix:"
        - "^hotfix:"
        - "^revert:"
      fix_label: "dora.dev/rework"   # ArgoCD Application label to force-tag a rework
      fix_pr_patterns:
        - "\\[rework\\]"
        - "\\[hotfix\\]"
  2. Annotate fix deployments explicitly (optional; for non-conventional commits):
# In your ArgoCD Application spec, add the label to the next sync:
metadata:
  labels:
    dora.dev/rework: "true"
  3. Apply the updated configuration:
kubectl apply -f dora-rework-config.yaml
  4. Verify rework detection is active:
kubectl exec -n dora-metrics deploy/dora-collector -- \
  curl -s localhost:8080/metrics | grep dora_deployment_rework

Expected Result: dora_deployment_rework_total metric is present (value may be 0 until a rework deployment occurs).

Verification: Trigger a deployment followed by a fix: commit deployment within 24 hours and confirm the rework count increments.
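The window-plus-fix-signal rule above can be sketched as follows. This models only the commit-message signal; the ArgoCD label and PR-pattern signals would be additional checks, and the function is illustrative rather than the collector's actual implementation.

```python
import re
from datetime import datetime, timedelta

# Same conventional-commit patterns as fix_commit_patterns in the config.
FIX_PATTERNS = [r"^fix:", r"^hotfix:", r"^revert:"]

def count_rework(deployments, window_hours=24):
    """Count deployments that are follow-up fixes.

    deployments: (service, ISO timestamp, commit message) tuples in time order.
    A deployment counts as rework when the same service deployed within
    window_hours earlier and the new commit message matches a fix pattern.
    """
    window = timedelta(hours=window_hours)
    last_seen = {}  # service -> time of most recent deployment
    rework = 0
    for service, ts, message in deployments:
        t = datetime.fromisoformat(ts)
        prev = last_seen.get(service)
        if prev is not None and t - prev <= window and any(
            re.match(p, message) for p in FIX_PATTERNS
        ):
            rework += 1
        last_seen[service] = t
    return rework

deploys = [
    ("api", "2024-01-15T09:00:00", "feat: add login"),
    ("api", "2024-01-15T15:00:00", "fix: broken login redirect"),  # within 24 h: rework
    ("api", "2024-01-18T09:00:00", "fix: stale cache"),            # outside the window
]
print(count_rework(deploys))  # 1
```

Dividing this count by total deployments gives the rework rate the dashboard reports.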


IV. Validation & Success Metrics

Diátaxis: How-to Guide / Reference

Instructions for verifying the implementation and measuring success.

Functional Validation

Test 1: Deployment Frequency Collection

# Trigger a test deployment
kubectl set image deployment/test-app app=nginx:latest -n test

# Check metrics endpoint
kubectl exec -n dora-metrics deploy/dora-collector -- \
  curl -s localhost:8080/metrics | grep dora_deployments_total

Expected Result: Metric should show at least 1 deployment recorded.

Test 2: Lead Time Calculation

# Query Prometheus for lead time
kubectl exec -n monitoring deploy/prometheus -- \
  promtool query instant http://localhost:9090 'avg(dora_lead_time_seconds)'

Expected Result: Returns a valid duration in seconds.

Test 3: MTTR Recording

# Simulate an incident resolution
curl -X POST http://dora-collector.dora-metrics:8080/incidents/resolve \
  -d '{"incident_id": "test-123", "resolved_at": "'$(date -Iseconds)'"}'

# Verify MTTR metric
kubectl exec -n monitoring deploy/prometheus -- \
  promtool query instant http://localhost:9090 'dora_mttr_seconds'

Expected Result: MTTR metric is populated.

Success Metrics

| Metric | How to Measure | Target Value | Dashboard Link |
|---|---|---|---|
| Data Collection | Check dora_* metrics exist | All 5 metrics present | /grafana/dora |
| Dashboard Load | Grafana dashboard loads | < 3 seconds | /grafana/dora |
| Historical Data | Query 7-day data | Data available | /grafana/dora |

Verification Checklist

  • [ ] All five DORA metrics are being collected
  • [ ] Grafana dashboard displays correctly
  • [ ] Team-level filtering works
  • [ ] Historical trend data is accumulating
  • [ ] Alerts trigger correctly for metric degradation

DORA Metrics Impact

This playbook establishes the foundation for measuring DORA metrics. After 2-4 weeks of data collection, you'll be able to:

| DORA Metric | Initial Baseline | Typical Elite Target |
|---|---|---|
| Deployment Frequency | Measured | Multiple per day |
| Lead Time for Changes | Measured | < 1 hour |
| Change Failure Rate | Measured | 0-15% |
| Time to Restore | Measured | < 1 hour |
| Deployment Rework Rate | Measured | < 5% |
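As a sketch of how you might bucket a measured value against these targets, here is a lead-time classifier. The elite threshold matches the table; the cut-offs for the lower tiers are illustrative only, since the published bands vary between DORA report years.

```python
def classify_lead_time(hours):
    """Rough DORA performance band for lead time for changes.

    Elite < 1 hour (per the table above); the other boundaries are
    illustrative assumptions: High < 1 week, Medium < ~1 month.
    """
    if hours < 1:
        return "Elite"
    if hours < 24 * 7:
        return "High"
    if hours < 24 * 30:
        return "Medium"
    return "Low"

print(classify_lead_time(0.5))  # Elite
print(classify_lead_time(48))   # High
```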

V. Client Presentation Talking Points

Diátaxis: Explanation / Conceptual

Ready-to-use business language for communicating success to client executives.

Executive Summary

We've implemented automated measurement of software delivery performance using the industry-standard DORA metrics framework. Your organization now has real-time visibility into deployment frequency, lead time, change failure rate, recovery time, and deployment rework rate—the five metrics that DORA 2025 research proves correlate with business performance. This data-driven foundation enables targeted improvements and demonstrates engineering progress to stakeholders.

Key Messages for Stakeholders

For Technical Leaders (CTO, VP Engineering)

  • "We've implemented automated DORA metrics collection that tracks all five key performance indicators across your delivery pipeline"
  • "This positions your organization to identify bottlenecks with data rather than intuition—teams can now see exactly where time is spent in the delivery process"
  • "Elite performers in the DORA research deploy on-demand, have lead times under one hour, fail less than 15% of the time, and recover in under one hour. You now have the data to benchmark and improve toward these targets."

For Business Leaders (CEO, CFO)

  • "This investment gives you visibility into engineering productivity for the first time—you'll see exactly how fast features move from idea to customer"
  • "DORA research proves that organizations with elite software delivery performance are 2x more likely to exceed their business goals. These metrics are your leading indicator."
  • "Faster, more reliable software delivery translates directly to faster time-to-market and reduced risk of outages that impact customers and revenue"

Demonstration Script

  1. Open: "Let me show you the DORA Metrics Dashboard in Grafana. This gives us real-time visibility into five key performance indicators..."

  2. Show Deployment Frequency: "This shows how often we're deploying to production. Elite performers deploy on-demand, multiple times per day. Our current rate is [X]..."

  3. Show Lead Time: "Lead time measures the time from when a developer commits code to when it's running in production. Elite performers achieve this in under one hour. We're currently at [X]..."

  4. Show Change Failure Rate: "This shows what percentage of our deployments cause problems. Elite teams are below 15%. We're at [X%]..."

  5. Show MTTR: "When something does go wrong, how quickly do we recover? Elite teams restore service in under an hour. Our current mean time to restore is [X]..."

  6. Show Deployment Rework Rate: "This is our rework rate—the percentage of deployments that needed a follow-up fix within 24 hours. A high rework rate often means silent patching instead of process improvement. We're currently at [X%], targeting below 5%..."

  7. Connect to value: "By tracking these five metrics, we can identify exactly where to focus improvement efforts and demonstrate progress to the business."

Common Executive Questions & Answers

How does this compare to industry benchmarks?

According to the 2023 State of DevOps Report from DORA, elite performers deploy on demand (often multiple times per day), have lead times under one hour, a change failure rate of 0-15%, and recover from failures in under one hour. Your current metrics place you in the [Elite/High/Medium/Low] performance category, which aligns with approximately [X%] of organizations studied.

What's the ROI on this implementation?

The primary ROI is in enabling data-driven improvement. Organizations that improve from Medium to Elite performance see 2x improvement in organizational performance goals according to DORA research. Additionally, by identifying bottlenecks, we typically see 20-30% improvement in developer productivity within the first quarter.

What's the risk if we don't maintain this?

Without continued attention, metrics data quality may degrade as systems change. We recommend quarterly reviews of data collection configuration as part of normal platform maintenance. The cost of maintenance is minimal compared to the value of continued visibility.

What's the next step after implementing metrics?

With baseline metrics established, the next step is to identify your primary bottleneck. Typically, this is either deployment frequency (solved by automation) or lead time (solved by pipeline optimization). We can run a focused improvement sprint targeting your biggest constraint.

Follow-Up Actions

| Action | Owner | Timeline |
|---|---|---|
| Review baseline metrics after 2 weeks | Engineering Lead | +2 weeks |
| Identify primary improvement opportunity | Platform Team | +3 weeks |
| Begin targeted improvement playbook | Consultant/Team | +4 weeks |
| Schedule stakeholder review | Consultant | +6 weeks |

Appendix

Troubleshooting

| Issue | Possible Cause | Resolution |
|---|---|---|
| Metrics not appearing | Webhook not configured | Verify ArgoCD/Jenkins webhook settings |
| Dashboard empty | Prometheus not scraping | Check Prometheus targets and scrape config |
| Incorrect lead time | Clock skew | Ensure NTP sync across nodes |
| Missing deployments | Network policy blocking | Add network policy for dora-metrics namespace |

Change Log

| Date | Version | Changes |
|---|---|---|
| 2024-01-15 | 1.0 | Initial release |