ADR-018: Developer Experience Measurement Framework (SPACE)
Status
Accepted
Context
The 2025 DORA Report identifies user-centric focus as the single most critical capability for successful AI adoption and high-performing teams. The research found with high certainty that:
“When teams adopt a user-centric focus, the positive influence of AI on their performance is amplified. Conversely, in the absence of a user-centric focus, AI adoption can have a negative impact on team performance.”
For Fawkes to succeed as an Internal Delivery Platform, we must treat developers as our users and continuously measure, understand, and improve their experience. Without systematic measurement of developer experience (DevEx), we risk building features that don’t solve real problems, implementing AI tools that create friction rather than value, and making platform decisions based on assumptions rather than data.
Current State:
- ❌ No systematic measurement of developer satisfaction
- ❌ No tracking of cognitive load or friction points
- ❌ No understanding of time spent on valuable vs. toil work
- ❌ No feedback loops from developers to the platform team
- ❌ Platform decisions made on assumptions, not validated needs
- ❌ No baseline metrics to measure improvement over time
The Problem: Organizations often measure outputs (deployments, lines of code, tickets closed) but fail to measure the experience of the people doing the work. This leads to:
- Productivity theater (looking busy without delivering value)
- Burnout from measuring activity instead of outcomes
- Tools and processes that optimize metrics but harm humans
- Disconnect between platform team goals and developer needs
Industry Context: The SPACE framework, developed by researchers at GitHub, Microsoft, and the University of Victoria, provides a holistic approach to measuring developer productivity and experience across five dimensions:
- Satisfaction: How fulfilled developers feel
- Performance: System and process outcomes
- Activity: Developer actions and outputs
- Communication & Collaboration: Team interaction quality
- Efficiency & Flow: Ability to complete work with minimal interruption
The framework is widely used in industry and complements DORA’s research on high-performing teams: the four DORA key metrics fit within its Performance dimension.
Decision
We will adopt the SPACE framework as our comprehensive Developer Experience measurement strategy for the Fawkes platform. This framework will guide what we measure, how we collect data, and how we act on insights to continuously improve the platform.
Architecture
```
┌────────────────────────────────────────────────────────────┐
│ Developer Experience Measurement System                    │
│                                                            │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ SATISFACTION (Self-Reported)                           │ │
│ │ - NPS surveys (quarterly)                              │ │
│ │ - Platform satisfaction ratings (5-point scale)        │ │
│ │ - Recommendation likelihood                            │ │
│ │ - Well-being assessments                               │ │
│ │ - Job satisfaction                                     │ │
│ └────────────────────────────────────────────────────────┘ │
│                             ↓                              │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ PERFORMANCE (System Metrics)                           │ │
│ │ - DORA 4 keys (deployment freq, lead time, CFR, MTTR)  │ │
│ │ - Build success rate                                   │ │
│ │ - Test coverage                                        │ │
│ │ - Code review turnaround time                          │ │
│ │ - Incident response time                               │ │
│ └────────────────────────────────────────────────────────┘ │
│                             ↓                              │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ ACTIVITY (Behavioral Metrics)                          │ │
│ │ - Commits per developer                                │ │
│ │ - Pull requests opened/merged                          │ │
│ │ - Code review participation                            │ │
│ │ - Documentation contributions                          │ │
│ │ - Platform feature usage                               │ │
│ │ - AI tool adoption rates                               │ │
│ └────────────────────────────────────────────────────────┘ │
│                             ↓                              │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ COMMUNICATION & COLLABORATION (Interaction Quality)    │ │
│ │ - Mattermost engagement metrics                        │ │
│ │ - Code review quality (comment depth, resolution time) │ │
│ │ - Documentation clarity ratings                        │ │
│ │ - Cross-team collaboration frequency                   │ │
│ │ - Knowledge sharing (wiki edits, blog posts)           │ │
│ └────────────────────────────────────────────────────────┘ │
│                             ↓                              │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ EFFICIENCY & FLOW (Experience Quality)                 │ │
│ │ - Time spent in flow state (self-reported)             │ │
│ │ - Cognitive load assessments                           │ │
│ │ - Context switching frequency                          │ │
│ │ - Percentage of time on valuable work                  │ │
│ │ - Friction incident logging                            │ │
│ │ - Interruption tracking                                │ │
│ └────────────────────────────────────────────────────────┘ │
│                             ↓                              │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ DevEx Dashboard (Grafana)                              │ │
│ │ - Real-time metrics visualization                      │ │
│ │ - Historical trend analysis                            │ │
│ │ - Team-level drill-downs                               │ │
│ │ - Alert on degrading metrics                           │ │
│ └────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
```
Implementation: Five SPACE Dimensions
1. SATISFACTION (How Happy Are Developers?)
What We’ll Measure:
- Net Promoter Score (NPS): “How likely are you to recommend Fawkes to a colleague?” (0-10 scale)
- Platform Satisfaction: “How satisfied are you with the Fawkes platform?” (1-5 scale)
- Feature Satisfaction: Rate specific features (Backstage, CI/CD, GitOps, etc.)
- Well-being: “I feel burned out from work” (strongly disagree to strongly agree)
- Job Satisfaction: “I find my work meaningful and fulfilling”
Collection Methods:
- Quarterly NPS surveys (5 minutes)
- Post-interaction micro-surveys (1 question after major platform actions)
- Annual comprehensive DevEx survey (15 minutes)
- In-platform feedback widget (always available)
Target Metrics:
- NPS >50 (good), >70 (excellent)
- Platform satisfaction >4.0/5.0
- <20% reporting burnout symptoms
Example Survey Question:
```
On a scale of 0-10, how likely are you to recommend
the Fawkes platform to a colleague?

0 = Not at all likely
10 = Extremely likely

[0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

Follow-up: What's the primary reason for your score?
[Text box]
```
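NPS is not an average of the 0-10 answers. It is the percentage of promoters (9-10) minus the percentage of detractors (0-6), which gives a value between -100 and 100. Passives (7-8) count toward the total but toward neither group. The following is a minimal Python sketch: the function names are illustrative, and the bands simply restate the targets listed above.

```python
from typing import Iterable


def net_promoter_score(scores: Iterable[int]) -> float:
    """Return NPS (-100..100) from 0-10 answers to the recommend question.

    Promoters answer 9-10 and detractors 0-6. Passives (7-8) are counted
    in the denominator only.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no responses")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be in 0..10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def nps_band(nps: float) -> str:
    """Map an NPS value onto the target bands stated in this ADR."""
    if nps > 70:
        return "excellent"
    if nps > 50:
        return "good"
    return "below target"
```

As an example, ten responses of 10, 9, 9, 8, 7, 6, 10, 3, 9, 10 contain six promoters and two detractors. The NPS is therefore 100 × (6 - 2) / 10 = 40, which is below the >50 target.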
2. PERFORMANCE (How Well Do Systems Work?)
What We’ll Measure:
- DORA Metrics: Deployment frequency, lead time, change failure rate, MTTR
- Build Metrics: Build success rate, build duration (P50, P95)
- Quality Metrics: Test coverage, security scan pass rate, code quality scores
- Reliability Metrics: Service uptime, incident count, MTTR
- Time-to-Value: Hours from onboarding to first deployment
Collection Methods:
- Automated telemetry from Jenkins, ArgoCD, GitHub
- Prometheus metrics collection
- Grafana dashboards with historical trends
- Automated alerting on threshold breaches
Target Metrics:
- Deployment frequency: >1/day
- Lead time for changes: <24 hours
- Change failure rate: <15%
- MTTR: <1 hour
- Build success rate: >95%
- Time to first deployment: <4 hours
Example Dashboard Panel:
```
┌─────────────────────────────────────────┐
│ DORA Metrics - Last 30 Days             │
│                                         │
│ Deployment Frequency: 2.3/day    ↑ +15% │
│ Lead Time:            18 hours   ↓ -22% │
│ Change Failure Rate:  12%        ↓ -3%  │
│ MTTR:                 47 minutes ↓ -18% │
│                                         │
│ [View Details] [Team Breakdown]         │
└─────────────────────────────────────────┘
```
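The four DORA targets above can be computed from deployment records that Jenkins and ArgoCD already emit. The following is a minimal sketch. It assumes a hypothetical `Deployment` record that carries the deploy time, the commit times of the changes it shipped, and, for failed changes, the time taken to restore service.

The sketch reports median lead time, which is less sensitive to a single stale branch than a mean would be. It reports MTTR as a mean. Both of these choices are assumptions rather than anything this ADR specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical production deployment record (not an existing Fawkes schema)."""
    deployed_at: datetime
    commit_times: List[datetime]                # commit time of each change shipped
    failed: bool = False                        # change caused a production failure
    restored_after: Optional[timedelta] = None  # detection-to-restore time, if failed


def dora_summary(deploys: List[Deployment], window_days: int) -> dict:
    """Summarize the DORA four keys over a window of `window_days` days."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_hours = [
        (d.deployed_at - c).total_seconds() / 3600
        for d in deploys
        for c in d.commit_times
    ]
    failures = [d for d in deploys if d.failed]
    restore_minutes = [
        d.restored_after.total_seconds() / 60
        for d in failures
        if d.restored_after is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_hours) if lead_hours else None,
        "change_failure_rate": len(failures) / len(deploys),
        "mttr_minutes": (sum(restore_minutes) / len(restore_minutes)
                         if restore_minutes else None),
    }


def meets_targets(summary: dict) -> dict:
    """Compare a summary with this ADR's targets; None means 'no data'."""
    lead = summary["median_lead_time_hours"]
    mttr = summary["mttr_minutes"]
    return {
        "deployment_frequency": summary["deploys_per_day"] > 1,
        "lead_time": None if lead is None else lead < 24,
        "change_failure_rate": summary["change_failure_rate"] < 0.15,
        "mttr": None if mttr is None else mttr < 60,
    }
```

Consider three deployments over two days, one of which failed. The change failure rate is about 33%, so that target is missed even when the other three are met.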
3. ACTIVITY (What Are Developers Doing?)
What We’ll Measure:
- Code Contributions: Commits, PRs, lines added/deleted
- Review Activity: PRs reviewed, comments per review, approval rate
- Platform Usage: Backstage views, dojo module completions, tool adoption
- AI Tool Usage: Copilot acceptance rate, AI-generated code percentage
- Documentation: Wiki edits, TechDocs updates, README improvements
- Learning: Dojo progress, certification completions, training attendance
Collection Methods:
- GitHub API for code metrics
- Backstage analytics
- Application logs and usage tracking
- AI tool telemetry (Copilot, RAG queries)
Target Metrics:
- 80%+ developers active on platform weekly
- Average 10+ commits per developer per week (team-level average, never an individual quota)
- 90%+ PRs reviewed within 24 hours
- 70%+ AI tool adoption within 6 months
- 50%+ developers complete White Belt within 3 months
Warning: Activity metrics alone are dangerous. Never use these for:
- Individual performance reviews
- Ranking developers
- Setting quotas or minimums
- Punitive actions
These metrics show patterns, not individual worth. High activity ≠ high value.
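The warning above rules out individual use. For that reason, an activity target such as "80%+ developers active on platform weekly" is best computed as a single ratio over the whole roster.

The following is a minimal sketch. It assumes usage events arrive as hypothetical `(developer_id, timestamp)` pairs from Backstage analytics or application logs. The function returns only the aggregate, so no per-person counts leave it.

```python
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple


def weekly_active_rate(
    events: Iterable[Tuple[str, datetime]],
    roster: Set[str],
    as_of: datetime,
) -> float:
    """Share of rostered developers with at least one platform event in the
    7 days ending at `as_of`. Only the ratio is returned, never the ids."""
    if not roster:
        raise ValueError("roster is empty")
    window_start = as_of - timedelta(days=7)
    active = {
        dev for dev, ts in events
        if dev in roster and window_start < ts <= as_of
    }
    return len(active) / len(roster)
```

The sketch handles two edge cases explicitly. Events from ids that are not on the roster are ignored. Repeat events from the same developer count once.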
4. COMMUNICATION & COLLABORATION (How Well Do Teams Work Together?)
What We’ll Measure:
- Code Review Quality: Average comments per PR, time to first review, resolution time
- Collaboration Patterns: Cross-team PRs, pair programming sessions, mob programming
- Knowledge Sharing: Mattermost channel activity, documentation contributions
- Feedback Quality: Constructive comment ratio, conflict resolution time
- Onboarding Support: Mentorship assignments, new developer success rate
Collection Methods:
- GitHub PR metadata analysis
- Mattermost analytics
- Backstage TechDocs engagement
- Manual tagging of collaboration events
- Onboarding survey feedback
Target Metrics:
- <12 hour average time to first code review
- 80% PRs with at least 1 constructive comment
- 60% developers actively helping in Mattermost
- <5% “toxic” or unconstructive review comments
- 90%+ new developers feel supported during onboarding
Example Metric:
```
Code Review Health Score: 87/100
✅ Fast response: 8 hours avg (target: <12)
✅ Thorough: 3.2 comments avg (target: >2)
⚠️ Approval rate: 92% (investigate rubber-stamping?)
✅ Constructive tone: 96% positive
```
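Most of the individual checks in the example above can be derived from GitHub PR metadata. The following is a minimal sketch. It assumes a hypothetical `PullRequest` record with four fields: the open time, the first-review time, the review comment count, and an approval flag.

The sketch reproduces these checks:
- Average hours to first review, against the <12 hour target.
- Comments per reviewed PR, against the >2 target.
- An approval-rate flag, using an assumed 0.9 cut-off.

It does not cover the constructive-tone check. It also does not compute the composite 87/100 score, whose weighting this ADR does not define.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Optional


@dataclass
class PullRequest:
    """Hypothetical PR record assembled from GitHub PR metadata."""
    opened_at: datetime
    first_review_at: Optional[datetime]   # None if nobody has reviewed yet
    review_comments: int
    approved: bool


def review_health(prs: List[PullRequest], approval_flag: float = 0.9) -> dict:
    """Team-level review checks; unreviewed PRs are excluded from every ratio."""
    reviewed = [p for p in prs if p.first_review_at is not None]
    if not reviewed:
        raise ValueError("no reviewed pull requests")
    hours = [(p.first_review_at - p.opened_at).total_seconds() / 3600 for p in reviewed]
    avg_hours = mean(hours)
    avg_comments = mean(p.review_comments for p in reviewed)
    approval_rate = sum(p.approved for p in reviewed) / len(reviewed)
    return {
        "avg_hours_to_first_review": avg_hours,
        "fast_response": avg_hours < 12,
        "avg_comments": avg_comments,
        "thorough": avg_comments > 2,
        "approval_rate": approval_rate,
        # A prompt to look for rubber-stamping, not evidence of it.
        "investigate_approvals": approval_rate > approval_flag,
    }
```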
5. EFFICIENCY & FLOW (Can Developers Focus and Deliver Value?)
What We’ll Measure:
- Flow State: “How often did you achieve deep focus?” (self-reported weekly)
- Valuable Work Time: “% of time spent on work you consider valuable” (weekly survey)
- Friction Incidents: Logging when tools/processes create blockers
- Context Switching: “How many different tools/tasks did you use today?”
- Cognitive Load: “Rate your mental effort today” (1-5 scale, daily pulse)
- Wait Time: Time spent blocked on external dependencies
Collection Methods:
- Weekly pulse surveys (2 minutes, 5 questions)
- Friction logging widget in Backstage (“Report a friction point”)
- Time tracking (optional, privacy-preserving)
- Meeting calendar analysis (% time in meetings)
Target Metrics:
- 60%+ of time spent on valuable work
- 3+ days per week achieving flow state
- <30 friction incidents per 100 developers per month
- Cognitive load average <3.5/5.0 (closer to “Moderate” than to “High”)
- <25% time in meetings
Example Weekly Pulse Survey:
```
Quick Check-In (2 minutes)

1. This week, approximately what % of your time
   was spent on work you found valuable?
   [Slider: 0% - 100%]

2. How many times did you achieve "flow state"
   (deep, focused work)?
   [ ] Never [ ] 1-2 times [ ] 3-4 times [ ] 5+ times

3. Rate your cognitive load this week:
   [ ] Very Low [ ] Low [ ] Moderate [ ] High [ ] Overwhelming

4. Did you experience any significant friction
   using the platform?
   [ ] No [ ] Yes → [Report details]

5. One thing to celebrate or improve?
   [Optional text box]
```
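Pulse answers only map onto the targets listed above once they are aggregated per team and per week. The following is a minimal sketch, and it rests on two assumptions:
- Responses are stored as hypothetical `PulseResponse` records.
- The five cognitive-load labels map onto 1-5 in order. This ADR does not state that mapping.

The friction helper normalizes monthly reports to the "per 100 developers" unit that the target uses.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

# Assumed mapping of the pulse survey's labels onto the 1-5 cognitive-load scale.
LOAD_SCALE = {"Very Low": 1, "Low": 2, "Moderate": 3, "High": 4, "Overwhelming": 5}


@dataclass
class PulseResponse:
    valuable_pct: float       # question 1 slider, 0-100
    cognitive_load: str       # question 3, one of LOAD_SCALE's keys
    reported_friction: bool   # question 4


def summarize_pulse(responses: List[PulseResponse]) -> dict:
    """Aggregate one week's pulse responses for a team."""
    if not responses:
        raise ValueError("no responses this week")
    return {
        "avg_valuable_pct": mean(r.valuable_pct for r in responses),
        "avg_cognitive_load": mean(LOAD_SCALE[r.cognitive_load] for r in responses),
        "share_reporting_friction": sum(r.reported_friction for r in responses) / len(responses),
    }


def friction_per_100_devs(incidents_in_month: int, developers: int) -> float:
    """Normalize monthly friction reports so groups of different sizes compare fairly."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    return 100 * incidents_in_month / developers
```

For example, 12 friction reports from a 40-developer group works out to 30 per 100 developers. That figure just misses the <30 target.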
Data Collection Infrastructure
Technology Stack:
- Surveys: Qualtrics or Typeform (quarterly NPS, annual DevEx)
- Pulse Surveys: Custom Backstage plugin (weekly 2-min check-in)
- Metrics Collection: Prometheus (system metrics)
- Dashboards: Grafana (DevEx Dashboard with SPACE dimensions)
- Feedback Widget: Backstage plugin (always-on feedback)
- Analytics: PostHog or Amplitude (product analytics)
- Data Warehouse: PostgreSQL (survey responses, aggregated metrics)
Data Pipeline:
```
Survey Tools  → API        → Data Warehouse (PostgreSQL) ─┐
Platform Logs → Prometheus → Grafana DevEx Dashboard     ─┼─→ Analysis & Reports
GitHub API    → ETL        → Data Warehouse (PostgreSQL) ─┘
```
Privacy & Ethics
Privacy-First Principles:
- Individual data is never shared: Managers never see individual survey responses
- Aggregation threshold: Metrics only shown for groups of 5+ people
- Opt-in for detailed tracking: Time tracking, keystroke analytics always optional
- Anonymous feedback: Developers can always provide feedback anonymously
- Data retention limits: Survey responses deleted after 2 years
- Right to be forgotten: Developers can request data deletion at any time
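The aggregation threshold and retention limit above can be enforced in code before any number reaches a dashboard. The following is a minimal sketch. It assumes each response carries a hypothetical team name and a pseudonymous respondent id, so that distinct people, not submissions, are counted. It also approximates "2 years" as 730 days.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

MIN_GROUP_SIZE = 5                   # "metrics only shown for groups of 5+ people"
RETENTION = timedelta(days=730)      # "deleted after 2 years", approximated as 730 days


def team_averages(
    responses: Iterable[Tuple[str, str, float]],
) -> Dict[str, Optional[float]]:
    """responses: (team, pseudonymous_respondent_id, score) tuples.

    Teams with fewer than MIN_GROUP_SIZE distinct respondents get None,
    so a dashboard never shows a number traceable to one or two people.
    """
    scores = defaultdict(list)
    people = defaultdict(set)
    for team, respondent, score in responses:
        scores[team].append(score)
        people[team].add(respondent)
    return {
        team: mean(vals) if len(people[team]) >= MIN_GROUP_SIZE else None
        for team, vals in scores.items()
    }


def past_retention(submitted_at: datetime, now: datetime) -> bool:
    """True when a survey response is older than the retention period and should be deleted."""
    return now - submitted_at > RETENTION
```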
Never Use DevEx Data For:
- ❌ Individual performance reviews
- ❌ Ranking developers
- ❌ Firing decisions
- ❌ Bonus calculations
- ❌ Comparing individuals
Always Use DevEx Data For:
- ✅ Identifying platform improvement opportunities
- ✅ Understanding team-level trends
- ✅ Measuring impact of platform changes
- ✅ Celebrating successes
- ✅ Advocating for developer needs to leadership
The DevEx Dashboard
Grafana Dashboard Structure (3 pages):
Page 1: Executive Summary
- Overall NPS score with trend
- DORA 4 keys summary
- Satisfaction score across dimensions
- Key alerts (metrics degrading)
Page 2: SPACE Deep Dive
- 5 panels (one per SPACE dimension)
- Historical trends (30/60/90 day views)
- Team-level breakdowns
- Correlation analysis (e.g., satisfaction vs. lead time)
Page 3: Action Dashboard
- Top 5 friction points (from feedback)
- Suggested improvements (from analysis)
- Experiment tracking (what we’re trying)
- Impact measurement (did changes help?)
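The "Key alerts (metrics degrading)" panel needs a rule for what counts as degrading. That rule must respect direction: a higher NPS is better, while a longer lead time is worse.

The following is a minimal sketch that compares the current window with the previous one. The 10% relative tolerance and the `MetricWindow` record are assumptions. In practice, the same rule would more likely be expressed as a Grafana alert rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricWindow:
    """Hypothetical pair of readings for one dashboard metric."""
    name: str
    previous: float          # value over the previous window (e.g. prior 30 days)
    current: float           # value over the current window
    higher_is_better: bool   # True for NPS or deployment frequency, False for lead time


def degrading(metrics: List[MetricWindow], tolerance: float = 0.10) -> List[str]:
    """Names of metrics that worsened by more than `tolerance`, relative to the previous window."""
    flagged = []
    for m in metrics:
        if m.previous == 0:
            continue  # relative change is undefined; handle such metrics separately
        change = (m.current - m.previous) / abs(m.previous)
        worsened_by = -change if m.higher_is_better else change
        if worsened_by > tolerance:
            flagged.append(m.name)
    return flagged
```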
Measurement Cadence
| Metric Type | Frequency | Duration | Purpose |
|---|---|---|---|
| NPS Survey | Quarterly | 5 min | Track overall satisfaction trend |
| DevEx Survey | Annual | 15 min | Comprehensive assessment |
| Weekly Pulse | Weekly | 2 min | Quick check-in, catch issues early |
| Friction Reports | Continuous | 1 min | Log blockers in real-time |
| DORA Metrics | Continuous | N/A | Automated system metrics |
| Platform Analytics | Continuous | N/A | Usage patterns, adoption |
Acting on Insights: The Feedback Loop
Monthly DevEx Review Meeting (Platform Team):
- Review dashboard (30 minutes)
- Identify top 3 issues (from friction reports, survey feedback)
- Prioritize improvements (impact vs. effort)
- Commit to 1-2 experiments for next month
- Measure impact in following month
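The following sketch covers two steps of the monthly review: identifying the top 3 issues and ordering improvements by impact versus effort. It makes two assumptions:
- Friction reports have already been tagged with a theme during triage.
- Each candidate improvement carries hypothetical 1-5 impact and effort estimates.

Ranking by the impact/effort ratio is one simple choice, not a method this ADR prescribes.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_friction_themes(themes: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Most frequently reported friction themes; ties keep first-seen order."""
    counts = Counter(t.strip().lower() for t in themes if t.strip())
    return counts.most_common(n)


def prioritize(candidates: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Order (name, impact, effort) candidates by impact per unit of effort (effort >= 1)."""
    return [name for name, impact, effort in
            sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)]
```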
Quarterly DevEx Report (to Leadership):
- NPS trend and key drivers
- DORA metrics performance
- Platform adoption metrics
- Top improvements delivered
- Planned focus areas for next quarter
Communicating Back to Developers:
- Monthly “You Said, We Did” post in Mattermost
- Quarterly DevEx town hall (results + roadmap)
- Always close the loop on feedback: “We heard X, here’s what we’re doing about it”
Consequences
Positive
- Data-Driven Decisions: Platform roadmap based on validated user needs, not assumptions
- Early Warning System: Degrading metrics alert us to problems before they become crises
- Demonstrate Value: Quantify platform impact for leadership (ROI, productivity gains)
- Continuous Improvement: Systematic process for getting better over time
- Developer Trust: Developers feel heard when feedback leads to action
- AI Readiness: User-centric foundation amplifies AI benefits (per DORA research)
- Attract Talent: High DevEx scores help recruit top engineers
- Reduce Turnover: Satisfied developers stay longer, reducing hiring costs
- Cultural Shift: Treating developers as users changes how we build platforms
- Benchmark Progress: Baselines enable “before/after” analysis of changes
Negative
- Survey Fatigue: Too many surveys can reduce response rates (mitigate with short, purposeful surveys)
- Privacy Concerns: Developers may worry about surveillance (mitigate with clear privacy policy)
- Overhead: Collecting, analyzing, and acting on data takes time (~20% of platform team capacity)
- Expectation Management: Measuring creates expectation that we’ll act on findings
- Analysis Paralysis: Too much data can delay decisions (mitigate with monthly review cadence)
- Gaming Metrics: Teams may try to optimize metrics rather than outcomes (mitigate with education)
- Initial Low Scores: First measurements may reveal uncomfortable truths about current state
Neutral
- Requires Cultural Buy-In: Leadership must support user-centric approach
- Learning Curve: Platform team must develop research and analysis skills
- Ongoing Effort: DevEx measurement is a permanent practice, not a one-time project
Alternatives Considered
Alternative 1: No Formal Measurement (Status Quo)
Pros:
- Zero overhead
- No survey fatigue
- No privacy concerns
Cons:
- Decisions based on loudest voices or HiPPO (Highest Paid Person’s Opinion)
- No way to prove platform value
- Can’t measure improvement over time
- Miss early warning signs of problems
Reason for Rejection: DORA research found with a high degree of certainty that a user-centric focus amplifies AI benefits and team performance. Without measurement, we can’t be user-centric.
Alternative 2: DORA Metrics Only
Pros:
- Well-established framework
- Automated data collection
- Proven correlation with outcomes
Cons:
- Only measures system performance, not human experience
- Misses satisfaction, cognitive load, friction
- Can create perverse incentives if used alone
- Doesn’t capture communication/collaboration quality
Reason for Rejection: DORA metrics are necessary but insufficient. SPACE framework encompasses DORA metrics (Performance dimension) plus human factors.
Alternative 3: Custom Metrics Framework
Pros:
- Tailored precisely to Fawkes needs
- No need to learn existing framework
Cons:
- Reinventing the wheel
- No industry benchmarks for comparison
- Lacks research validation
- Hard to explain to stakeholders
Reason for Rejection: SPACE is well-researched, industry-validated, and comprehensive. Better to adopt proven framework than create our own.
Alternative 4: Simple NPS Only
Pros:
- Very simple to implement
- Low survey burden
- Easy to track over time
Cons:
- NPS tells you “what” (satisfaction level) but not “why”
- No diagnostic capability
- Can’t identify specific improvement areas
- Misses entire dimensions (activity, flow, collaboration)
Reason for Rejection: NPS is a great summary metric but insufficient for driving improvements. We need diagnostic metrics to understand what to fix.
Alternative 5: DevEx Framework (and the related DX Core 4)
Pros:
- Focused specifically on developer experience
- Simpler than SPACE (the DevEx paper uses 3 dimensions: feedback loops, cognitive load, and flow state; DX Core 4 uses 4)
- Good research backing
Cons:
- Less comprehensive than SPACE
- Newer framework (less industry adoption)
- Doesn’t explicitly include collaboration dimension
Reason for Rejection: SPACE is more comprehensive and has broader industry adoption. DevEx framework is good but SPACE is better established.
Implementation Plan
Phase 1: Foundation (Weeks 1-2)
Week 1: Infrastructure Setup
- Deploy survey tools (Qualtrics/Typeform account)
- Create data warehouse schema in PostgreSQL
- Design Grafana DevEx dashboard (mockup)
- Draft privacy policy and data handling procedures
- Write initial NPS survey (5 questions)
Week 2: Baseline Measurement
- Launch first NPS survey to all developers
- Collect DORA metrics baseline (automated)
- Deploy friction reporting widget in Backstage
- Analyze NPS results and identify themes
- Set initial targets for each SPACE dimension
Phase 2: Full Rollout (Weeks 3-4)
Week 3: Automated Metrics
- Implement Prometheus collectors for activity metrics
- Build Grafana dashboards (all 5 SPACE dimensions)
- Set up alerting for degrading metrics
- Document data collection infrastructure
- Train platform team on dashboard usage
Week 4: Feedback Loops
- Deploy weekly pulse survey (automated in Backstage)
- Create feedback response process (monthly review meetings)
- Launch first “You Said, We Did” communication
- Schedule quarterly DevEx review with leadership
- Establish monthly user interview schedule (5 devs/month)
Phase 3: Iteration (Month 2+)
- Conduct first monthly DevEx review meeting
- Implement 1-2 improvements based on feedback
- Measure impact of changes
- Refine metrics and surveys based on learnings
- Build momentum: celebrate wins, share results
Metrics for Success
Adoption Metrics (First 3 Months):
- 70%+ response rate on NPS surveys
- 50%+ response rate on weekly pulse surveys
- 20+ friction reports submitted per month
- 100% platform team trained on dashboard usage
- Monthly DevEx review meetings established
Outcome Metrics (First 6 Months):
- NPS >50 (baseline + improvement)
- 60%+ of time spent on valuable work (self-reported)
- <30 friction incidents per 100 developers per month
- DORA metrics trending in the right direction (deployment frequency up; lead time, change failure rate, and MTTR down)
- 3+ platform improvements delivered from feedback
Long-Term Metrics (12+ Months):
- NPS >60 (progressing toward the >70 “excellent” band)
- Deployment frequency >2/day
- Lead time <12 hours
- <10% developers reporting burnout
- Platform team can articulate clear ROI with data
Related Decisions
- ADR-002: Backstage for Developer Portal (primary vehicle for surveys, feedback widget)
- ADR-006: Prometheus for Metrics (Performance and Activity dimension data collection)
- ADR-015: User Research & Feedback System (complements quantitative metrics with qualitative insights)
- ADR-016: Platform-as-Product Operating Model (DevEx metrics inform product roadmap)
References
- SPACE Framework Paper: https://queue.acm.org/detail.cfm?id=3454124
- 2025 DORA Report: https://dora.dev/dora-report-2025
- DevEx Framework: https://queue.acm.org/detail.cfm?id=3595878
- GitHub Octoverse: Developer productivity research
- Accelerate (Book): Forsgren, Humble, Kim - DORA metrics foundation
- Team Topologies (Book): Skelton & Pais - Cognitive load research
Notes
Key Insight from DORA 2025:
“We found with a high degree of certainty that when teams adopt a user-centric focus, the positive influence of AI on their performance is amplified. Conversely, in the absence of a user-centric focus, AI adoption can have a negative impact on team performance.”
Translation for Fawkes:
- DevEx measurement is not optional—it’s the foundation for AI success
- Without measuring developer experience, AI adoption may harm performance
- SPACE framework provides the comprehensive measurement system we need
- Measurement without action is worthless—commit to monthly improvements
Cultural Note: Implementing DevEx measurement is a cultural transformation as much as a technical one. The platform team must genuinely care about developer experience and be willing to change based on feedback. If leadership views this as “just more metrics,” it will fail.
Last Updated
December 7, 2024 - Initial version documenting SPACE framework adoption for Fawkes DevEx measurement