View DORA Metrics in DevLake
Goal
Access and analyze DORA (DevOps Research and Assessment) metrics for your team to measure deployment frequency, lead time for changes, change failure rate, and mean time to recovery.
Prerequisites
Before you begin, ensure you have:
- [ ] DevLake deployed and configured (see DORA Metrics Implementation)
- [ ] Your project connected to DevLake data sources (Git, Jenkins, ArgoCD)
- [ ] Access to Grafana UI (
https://grafana.127.0.0.1.nip.io) - [ ] At least 7 days of deployment data for meaningful metrics
Steps
1. Access Grafana Dashboards
Open Grafana
Navigate to Grafana:
# Get Grafana URL
echo "https://grafana.127.0.0.1.nip.io"
# Get admin password (if needed)
kubectl -n monitoring get secret grafana-admin \
-o jsonpath="{.data.admin-password}" | base64 -d
Log in with your credentials.
2. Locate DORA Metrics Dashboard
Navigate to DORA Dashboard
- In Grafana, click Dashboards (four squares icon) in the left sidebar
- Search for "DORA Metrics" or "DevLake"
- Click on the "DORA Metrics Overview" dashboard
Common DORA dashboards:
- DORA Metrics Overview: All four metrics in one view
- Deployment Frequency: Detailed deployment trends
- Lead Time Analysis: Commit to deployment timeline
- Change Failure Rate: Failed vs. successful deployments
- MTTR Dashboard: Incident response times
3. Select Your Team/Project
Filter by Team
Most DORA dashboards include filters at the top:
- Team/Project Dropdown: Select your team or project
- Time Range: Choose the period to analyze (default: Last 90 days)
- Environment: Filter by dev/staging/production
Example filters:
Team: Payment Squad
Project: payment-service
Environment: production
Time Range: Last 90 days
4. Analyze the Four DORA Metrics
Metric 1: Deployment Frequency
What it measures: How often your team deploys code to production.
Dashboard panels:
- Deployments per Day/Week: Bar chart showing deployment count over time
- Deployment Trend: Line graph showing frequency trend
- Team Comparison: Compare your team's frequency to organization average
Interpretation:
| Performance Level | Deployment Frequency |
|---|---|
| Elite | Multiple deploys per day |
| High | Between once per day and once per week |
| Medium | Between once per week and once per month |
| Low | Fewer than once per month |
Example view:
┌─────────────────────────────────────────────────────┐
│ Deployment Frequency │
│ │
│ This Week: 23 deployments (3.3 per day) │
│ Last Week: 19 deployments (2.7 per day) │
│ │
│ Performance Level: Elite ⭐ │
│ │
│ [Bar Chart: Deployments by Day] │
│ Mon Tue Wed Thu Fri Sat Sun │
│ ▓▓ ▓▓▓ ▓▓▓ ▓▓ ▓▓ ▓ ▓ │
│ 4 5 5 4 4 1 0 │
└─────────────────────────────────────────────────────┘
Metric 2: Lead Time for Changes
What it measures: Time from code commit to production deployment.
Dashboard panels:
- Average Lead Time: Single stat showing mean lead time
- Lead Time Distribution: Histogram showing lead time spread
- Lead Time Trend: Line graph showing trend over time
- Lead Time by Stage: Breakdown by commit→build→test→deploy
Interpretation:
| Performance Level | Lead Time for Changes |
|---|---|
| Elite | Less than one hour |
| High | Between one day and one week |
| Medium | Between one week and one month |
| Low | More than one month |
Example view:
┌─────────────────────────────────────────────────────┐
│ Lead Time for Changes │
│ │
│ Average: 45 minutes │
│ P50 (Median): 32 minutes │
│ P95: 2 hours 15 minutes │
│ │
│ Performance Level: Elite ⭐ │
│ │
│ Breakdown by Stage: │
│ Commit → Build: 8 min ▓▓ │
│ Build → Test: 12 min ▓▓▓ │
│ Test → Deploy: 15 min ▓▓▓▓ │
│ Deploy → Verify: 10 min ▓▓▓ │
└─────────────────────────────────────────────────────┘
Metric 3: Change Failure Rate
What it measures: Percentage of deployments that result in failure requiring remediation.
Dashboard panels:
- Change Failure Rate %: Single stat showing failure percentage
- Failed vs. Successful Deployments: Pie chart
- Failure Trend: Line graph showing failure rate over time
- Top Failure Causes: Table of most common failure reasons
Interpretation:
| Performance Level | Change Failure Rate |
|---|---|
| Elite | 0-15% |
| High | 16-30% |
| Medium | 31-45% |
| Low | More than 45% |
Example view:
┌─────────────────────────────────────────────────────┐
│ Change Failure Rate │
│ │
│ Last 30 days: 8.5% │
│ │
│ Performance Level: Elite ⭐ │
│ │
│ Failed: 7 deployments ▓ │
│ Successful: 75 deployments ▓▓▓▓▓▓▓▓▓▓▓ │
│ │
│ Top Failure Causes: │
│ 1. Database migration error (3) │
│ 2. Configuration missing (2) │
│ 3. Image pull error (2) │
└─────────────────────────────────────────────────────┘
Metric 4: Mean Time to Recovery (MTTR)
What it measures: Time to restore service after a production incident.
Dashboard panels:
- Average MTTR: Single stat showing mean recovery time
- MTTR Distribution: Histogram showing recovery time spread
- MTTR Trend: Line graph showing trend over time
- Recent Incidents: Table of recent incidents with recovery times
Interpretation:
| Performance Level | Mean Time to Recovery |
|---|---|
| Elite | Less than one hour |
| High | Less than one day |
| Medium | Between one day and one week |
| Low | More than one week |
Example view:
┌─────────────────────────────────────────────────────┐
│ Mean Time to Recovery (MTTR) │
│ │
│ Average: 28 minutes │
│ P50 (Median): 18 minutes │
│ P95: 1 hour 45 minutes │
│ │
│ Performance Level: Elite ⭐ │
│ │
│ Recent Incidents: │
│ Dec 5: API timeout MTTR: 15 min │
│ Dec 3: Database connection MTTR: 42 min │
│ Dec 1: Memory leak MTTR: 1h 30min │
└─────────────────────────────────────────────────────┘
5. Drill Down into Specific Metrics
View Deployment Details
Click on any data point in the charts to drill down:
- Click on a deployment bar in the Deployment Frequency chart
- View deployment details:
- Commit SHA and message
- Author and timestamp
- Build duration
- Deploy duration
- Success/failure status
Trace Lead Time Breakdown
Click on a lead time data point to see:
- Commit timestamp
- Build start/end time
- Test execution time
- Deployment start/end time
- Link to ArgoCD application
- Link to Jenkins pipeline
Investigate Failed Deployments
Click on a failed deployment to view:
- Error message and logs
- Failed stage (build/test/deploy)
- Rollback timestamp
- Time to detection
- Time to resolution
- Link to incident ticket
6. Compare Team Performance
View Team Comparison Dashboard
Navigate to the Team Comparison dashboard:
- Click Dashboards → DORA Team Comparison
- View metrics across all teams:
- Deployment frequency by team
- Lead time comparison
- Failure rate comparison
- MTTR comparison
Identify Best Practices
Look for high-performing teams and investigate:
- What tools do they use?
- What practices enable fast deployment?
- How do they maintain low failure rates?
- What's their secret to fast recovery?
7. Export and Share Metrics
Share Dashboard
- Click the Share icon (arrow) at the top of the dashboard
- Choose sharing option:
- Link: Copy URL to share with team
- Snapshot: Create a public snapshot
- Export: Download as JSON
- Email: Schedule email report
Generate Report
Create a PDF report:
- Set the desired time range and filters
- Click Share → Render PDF
- Download and share with stakeholders
Schedule Weekly Report
Configure automated reports:
- Click Dashboards → Dashboard Settings
- Select Reporting tab
- Click Schedule Report
- Configure:
- Frequency: Weekly (Monday 9 AM)
- Recipients: team@example.com
- Format: PDF
- Time range: Last 7 days
Verification
1. Verify Data Accuracy
Check that metrics reflect recent deployments:
# List recent deployments from ArgoCD
kubectl get applications -n argocd \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.revision}{"\t"}{.status.operationState.finishedAt}{"\n"}{end}'
# Compare with dashboard deployment count
# Numbers should match
2. Verify Lead Time Calculation
Manually verify a recent deployment:
# Get commit timestamp
git log -1 --format="%ai" <commit-sha>
# Get deployment timestamp from ArgoCD
argocd app get my-service-prod | grep "Sync Status"
# Calculate lead time manually
# Should match DevLake dashboard
3. Check Data Freshness
Ensure metrics are up to date:
- Check Last Update timestamp in dashboard header
- Should be within the last 5-15 minutes
- If stale, check DevLake data collection:
# Check DevLake pods
kubectl get pods -n devlake
# Check data collection logs
kubectl logs -n devlake -l app=devlake-collector --tail=50
Understanding Performance Levels
DORA Performance Tiers
Based on the State of DevOps Report, teams are categorized as:
| Tier | Deployment Frequency | Lead Time | Change Failure Rate | MTTR |
|---|---|---|---|---|
| Elite | Multiple/day | < 1 hour | 0-15% | < 1 hour |
| High | 1/day - 1/week | 1 day - 1 week | 16-30% | < 1 day |
| Medium | 1/week - 1/month | 1 week - 1 month | 31-45% | 1 day - 1 week |
| Low | < 1/month | > 1 month | > 45% | > 1 week |
Improvement Recommendations
If your team is not at Elite level, focus on:
To improve Deployment Frequency:
- Reduce batch size (smaller, more frequent deploys)
- Automate deployment pipeline
- Enable feature flags for gradual rollout
- Remove manual approval gates
To reduce Lead Time:
- Parallelize build and test stages
- Optimize test suite (remove slow/flaky tests)
- Use trunk-based development
- Automate code review checks
To reduce Change Failure Rate:
- Increase test coverage (unit, integration, E2E)
- Implement canary deployments
- Add smoke tests after deployment
- Improve monitoring and alerting
To reduce MTTR:
- Improve observability (logs, metrics, traces)
- Automate rollback procedures
- Practice incident response drills
- Maintain runbooks for common issues
Troubleshooting
Dashboard Shows No Data
Cause: DevLake not connected to data sources or no deployments in time range.
Solution:
# Check DevLake data source connections
kubectl get configmap -n devlake devlake-config -o yaml
# Verify Jenkins connection
curl -u admin:password http://jenkins.127.0.0.1.nip.io/api/json
# Verify ArgoCD connection
argocd app list
# Check time range in dashboard (expand to 30 or 90 days)
Metrics Seem Inaccurate
Cause: Incorrect deployment detection or missing metadata.
Solution:
- Verify ArgoCD Applications have proper labels:
kubectl get application -n argocd -o yaml | grep -A 5 labels - Ensure Jenkins pipelines emit DORA events
- Check DevLake collection logs for errors:
kubectl logs -n devlake -l app=devlake-collector | grep ERROR
Lead Time Shows Extremely High Values
Cause: Old commits being deployed or incorrect timestamp parsing.
Solution:
- Filter out outliers in dashboard (use percentiles instead of mean)
- Verify commit timestamps match deployment timestamps
- Check for long-lived branches (use trunk-based development)
Next Steps
After viewing DORA metrics:
- Trace Requests with Tempo - Debug slow deployments
- Onboard Service to ArgoCD - Improve deployment frequency
- Configure Ingress TLS - Secure production deployments
- DORA Metrics Playbook - Deep dive into implementation
Related Documentation
- DORA Metrics Implementation Playbook - Setup guide
- Continuous Delivery Pattern - Best practices
- Architecture Overview - Platform architecture
- 2023 State of DevOps Report - Research foundation