Skip to content

Module 17: Platform as a Product

Belt Level: ⚫ Black Belt Duration: 60 minutes Prerequisites: Modules 1-16, especially Module 2 (DORA Metrics) Certification Track: Fawkes Platform Architect


🎯 Learning Objectives

By the end of this module, you will be able to:

  1. Apply product management principles to internal developer platforms
  2. Conduct user research to understand developer needs and pain points
  3. Build a platform roadmap driven by user feedback and business value
  4. Measure platform adoption, satisfaction, and impact using key metrics
  5. Establish feedback loops and customer success practices for internal platforms

📚 Theory: Your Platform is a Product

The Platform as a Product Mindset

Traditional IT thinking:

  • "We build infrastructure, developers must use it"
  • Success = Infrastructure availability (99.9% uptime)
  • Mandate adoption through policy
  • One-size-fits-all solutions

Platform as a Product thinking:

  • "We serve developers, they are our customers"
  • Success = Developer satisfaction + business outcomes
  • Earn adoption through superior experience
  • Tailored solutions for different user personas

Why This Matters

The "Build It and They Will Come" Fallacy:

Many platform teams build technically excellent platforms that nobody uses:

  • ❌ Kubernetes cluster set up perfectly, but developers still deploy to VMs
  • ❌ CI/CD pipelines available, but teams continue using manual processes
  • ❌ Observability stack deployed, but no one looks at the dashboards

Root cause: Building for technical excellence without understanding user needs.

The Platform Product Triad

┌─────────────────────────────────────────────────────────────┐
│              PLATFORM AS A PRODUCT TRIAD                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                    ┌──────────────┐                         │
│                    │   DESIRABLE  │                         │
│                    │  Do users     │                         │
│                    │  want it?     │                         │
│                    └───────┬──────┘                         │
│                            │                                 │
│              ┌─────────────┼─────────────┐                  │
│              │                           │                  │
│              │                           │                  │
│     ┌────────▼────────┐         ┌───────▼────────┐         │
│     │    FEASIBLE     │         │     VIABLE     │         │
│     │  Can we build   │         │  Does it drive │         │
│     │  it reliably?   │         │  business      │         │
│     │                 │         │  value?        │         │
│     └─────────────────┘         └────────────────┘         │
│                                                             │
│  SWEET SPOT: All three overlap                             │
│  - Developers want it (adoption)                           │
│  - We can build/maintain it (technical feasibility)        │
│  - It improves business metrics (DORA, cost, velocity)     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Your Platform's "Customers"

Unlike external products, your customers are internal:

┌──────────────────────────────────────────────────────────────┐
│                    USER PERSONAS                             │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  PERSONA 1: Frontend Developer (Alex)                       │
│  ├─ Needs: Fast iteration, preview environments             │
│  ├─ Pain: Complex deployment process, no staging            │
│  ├─ Skills: React/Vue, basic Docker, no Kubernetes          │
│  └─ Success: Can deploy feature in <5 minutes               │
│                                                              │
│  PERSONA 2: Backend Engineer (Jordan)                       │
│  ├─ Needs: Database migrations, service mesh                │
│  ├─ Pain: Manual DB changes, no service discovery           │
│  ├─ Skills: Java/Python, SQL, intermediate Kubernetes       │
│  └─ Success: Zero-downtime deployments with DB changes      │
│                                                              │
│  PERSONA 3: Data Scientist (Sam)                            │
│  ├─ Needs: GPU resources, Jupyter notebooks, data access    │
│  ├─ Pain: No ML infrastructure, manual model deployment     │
│  ├─ Skills: Python/R, ML frameworks, zero DevOps            │
│  └─ Success: Train and deploy models without ops team       │
│                                                              │
│  PERSONA 4: SRE/DevOps (Morgan)                             │
│  ├─ Needs: Observability, incident response tools           │
│  ├─ Pain: Alert fatigue, no runbooks                        │
│  ├─ Skills: Expert Kubernetes, Terraform, monitoring        │
│  └─ Success: MTTR < 5 minutes, no 3am pages                 │
│                                                              │
│  PERSONA 5: Engineering Manager (Taylor)                    │
│  ├─ Needs: Team velocity metrics, cost visibility           │
│  ├─ Pain: No visibility into bottlenecks, surprise bills    │
│  ├─ Skills: Technical background, business focus            │
│  └─ Success: Data-driven decisions, predictable costs       │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Key insight: Different personas have different needs. A one-size-fits-all platform will satisfy no one.


🔍 User Research for Platforms

Discovery: Understanding the Problem

Methods for platform user research:

1. User Interviews (Most valuable)

INTERVIEW SCRIPT TEMPLATE:

Opening (5 min):
- Thank you for your time
- We're improving the platform based on developer feedback
- No wrong answers, honest feedback helps us most
- Will take 30 minutes

Current Workflow (10 min):
- Walk me through how you deployed your last feature
- What tools did you use?
- Where did you get stuck?
- How long did the whole process take?

Pain Points (10 min):
- What's the most frustrating part of your deployment process?
- If you could change one thing, what would it be?
- What takes longer than it should?
- What do you work around or hack together?

Desired Future (5 min):
- If I could wave a magic wand, what would your ideal workflow be?
- What would success look like?
- How would you measure improvement?

Closing:
- Can I follow up if we need clarification?
- Would you be willing to test early versions?

Pro tips:

  • Ask "How?" and "Why?" not "Would you use...?"
  • Observe actual behavior, not stated preferences
  • Look for workarounds (reveals unmet needs)
  • Interview both happy and unhappy users

2. Shadowing / Observation

Sit with developers and watch them work:

  • Where do they wait?
  • What do they Google?
  • What tools do they switch between?
  • Where do they ask for help?

Example insights:

  • "They spent 20 minutes figuring out environment variable syntax"
  • "They copy-pasted from another team's repo instead of using docs"
  • "They waited 15 minutes for CI pipeline, then force-pushed to debug"

3. Surveys (Quantitative validation)

Use after interviews to validate at scale:

PLATFORM SATISFACTION SURVEY (NPS-style):

1. How likely are you to recommend our platform to a colleague? (0-10)

2. What is the PRIMARY reason for your score?
   [Open text field]

3. How often do you deploy to production?
   ○ Multiple times per day
   ○ Daily
   ○ Weekly
   ○ Monthly or less

4. How satisfied are you with the following? (1-5 scale)
   - Deployment speed
   - Documentation quality
   - Getting help when stuck
   - Observability/debugging
   - Local development experience

5. What would make the biggest positive impact on your productivity?
   [Open text field]

4. Analytics / Telemetry

Instrument your platform to observe usage:

  • Which features are used most/least?
  • Where do users drop off?
  • How long do tasks take?
  • What errors do they hit?
# Example: Track platform usage
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  annotations:
    analytics/deploy-frequency: "5.2/day"
    analytics/avg-deploy-time: "8m 32s"
    analytics/rollback-rate: "2.1%"
    analytics/support-tickets: "3/month"

Synthesizing Research

Turn insights into themes:

RAW FEEDBACK (from 15 interviews):

"Deployments are slow" (8 mentions)
"I don't know if my deploy worked" (12 mentions)
"Kubernetes YAML is confusing" (6 mentions)
"I waste time waiting for CI" (7 mentions)
"Can't debug production issues" (10 mentions)

↓ Synthesize into themes ↓

THEME 1: Lack of visibility (12 mentions)
- No real-time deploy status
- Can't see what's running in production
- No easy way to check logs/metrics

THEME 2: Slow feedback loops (8 mentions)
- Deployments take >10 minutes
- CI pipelines are slow
- No local development that matches prod

THEME 3: Steep learning curve (6 mentions)
- Kubernetes concepts are hard
- Too much YAML configuration
- Documentation assumes too much knowledge

Prioritize using Impact vs Effort:

┌────────────────────────────────────────────────────────┐
│           IMPACT vs EFFORT MATRIX                      │
├────────────────────────────────────────────────────────┤
│                                                        │
│  HIGH IMPACT                                           │
│    │                                                   │
│    │   [Deployment Status]    [Observability]         │
│    │   📊 DO FIRST            📊 DO NEXT              │
│    │                                                   │
│    │                                                   │
│    │   [Better Docs]          [Local Dev]             │
│    │   📝 QUICK WINS          ⚙️  PLAN FOR            │
│    │                                                   │
│  LOW IMPACT                                            │
│    └─────────────────────────────────────────▶        │
│         LOW EFFORT              HIGH EFFORT            │
│                                                        │
└────────────────────────────────────────────────────────┘

NEXT SPRINT:
1. Deployment status dashboard (high impact, medium effort)
2. Improve documentation (medium impact, low effort)

📋 Building a Platform Roadmap

Product Roadmap Structure

┌──────────────────────────────────────────────────────────────┐
│              PLATFORM ROADMAP (Q1-Q4 2025)                   │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  NORTH STAR: Reduce deployment time from 30min to <5min     │
│                                                              │
│  Q1: VISIBILITY & FEEDBACK                                  │
│  ├─ Deployment status dashboard (Backstage plugin)          │
│  ├─ Real-time logs in UI                                    │
│  ├─ Slack notifications for deploy events                   │
│  └─ Metrics: 80% developers check dashboard weekly          │
│                                                              │
│  Q2: SPEED & RELIABILITY                                     │
│  ├─ Progressive delivery (canary deployments)               │
│  ├─ Parallel CI pipelines (5min → 2min)                     │
│  ├─ Auto-rollback on errors                                 │
│  └─ Metrics: Deploy time P95 < 8 minutes                    │
│                                                              │
│  Q3: DEVELOPER EXPERIENCE                                    │
│  ├─ Self-service preview environments                       │
│  ├─ Local development with Tilt                             │
│  ├─ Golden path templates for common patterns               │
│  └─ Metrics: 60% of teams using preview envs                │
│                                                              │
│  Q4: SCALE & OPTIMIZATION                                    │
│  ├─ Cost optimization dashboard                             │
│  ├─ Auto-scaling for production workloads                   │
│  ├─ Multi-region deployments                                │
│  └─ Metrics: 25% cost reduction, 99.9% availability         │
│                                                              │
│  CONTINUOUS:                                                 │
│  ├─ Weekly office hours                                     │
│  ├─ Monthly user interviews (5 developers)                  │
│  ├─ Quarterly satisfaction surveys (NPS)                    │
│  └─ Backstage documentation updates                         │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Roadmap Principles

  1. Themes, not features: Organize by user goals, not technical tasks
  2. Timeboxed: Quarterly or bi-weekly sprints, not "when it's done"
  3. Outcome-driven: Each item has success metrics
  4. Communicated: Public roadmap visible to all developers
  5. Flexible: Re-prioritize based on feedback

Saying No (Productively)

Not every request makes the roadmap:

REQUEST: "Can we add support for Terraform 1.6?"

BAD RESPONSE:
"No, we're too busy."

GOOD RESPONSE:
"Thanks for the request! We track all feedback. Currently we're
focused on reducing deploy times (our #1 pain point from user
research). Terraform 1.6 affects ~5 teams, while deploy speed
affects all 40 teams. We've added your request to the backlog
and will revisit in Q3. Does that work for you?"

EVEN BETTER:
"Let's understand the need. What's the use case for 1.6?
[Discussion reveals they just need a specific provider version]
Oh! We can enable that without upgrading Terraform core.
Can you test this next week?"

📊 Platform Metrics & KPIs

Metrics Pyramid

┌────────────────────────────────────────────────────────┐
│                  BUSINESS OUTCOMES                     │
│  ┌──────────────────────────────────────────────┐     │
│  │  - Revenue impact                             │     │
│  │  - Time to market                             │     │
│  │  - Engineering cost per deploy                │     │
│  └──────────────────────────────────────────────┘     │
│                        ▲                               │
│                        │                               │
│              ┌─────────┴─────────┐                     │
│              │   DORA METRICS    │                     │
│              │ - Deploy frequency│                     │
│              │ - Lead time       │                     │
│              │ - MTTR            │                     │
│              │ - Change fail rate│                     │
│              └─────────┬─────────┘                     │
│                        ▲                               │
│                        │                               │
│        ┌───────────────┴───────────────┐               │
│        │   PLATFORM ADOPTION           │               │
│        │ - Active users                │               │
│        │ - Usage frequency             │               │
│        │ - Feature adoption            │               │
│        └───────────────┬───────────────┘               │
│                        ▲                               │
│                        │                               │
│  ┌─────────────────────┴─────────────────────────┐    │
│  │         USER SATISFACTION                      │    │
│  │  - NPS (Net Promoter Score)                   │    │
│  │  - Support ticket volume                      │    │
│  │  - Documentation clarity rating               │    │
│  └───────────────────────────────────────────────┘    │
│                                                        │
└────────────────────────────────────────────────────────┘

Start measuring from bottom up:
1. Are users satisfied? (surveys, interviews)
2. Are they adopting? (usage analytics)
3. Is it improving DORA? (deployment metrics)
4. Is it driving business value? (cost, velocity)

Key Platform Metrics

1. Adoption Metrics

METRIC: Platform Adoption Rate
Formula: (Teams using platform / Total teams) × 100%

Targets:
- Month 1: 10% (early adopters)
- Month 3: 30% (early majority)
- Month 6: 60% (late majority)
- Month 12: 80%+ (full adoption)

Track by feature:
- CI/CD pipeline: 75% adoption
- GitOps deployment: 60% adoption
- Observability: 45% adoption
- Preview environments: 25% adoption

2. Satisfaction Metrics (NPS)

Net Promoter Score: "How likely would you recommend our
platform to a colleague?" (0-10)

Calculation:
- Promoters (9-10): Enthusiastic users
- Passives (7-8): Satisfied but not advocates
- Detractors (0-6): Unhappy, at risk

NPS = % Promoters - % Detractors

Benchmark:
- NPS > 50: Excellent
- NPS 30-50: Good
- NPS 0-30: Needs improvement
- NPS < 0: Crisis mode

3. Efficiency Metrics

METRIC: Time to First Deployment
Track: How long from "I want to deploy" to "It's in production"

Baseline (no platform): 4 hours
- Request infrastructure: 2 hours
- Manual setup: 1 hour
- Deploy + verify: 1 hour

Target (with platform): 15 minutes
- Self-service: 2 minutes
- Auto-deploy via Git push: 8 minutes
- Auto-verify health: 5 minutes

Impact: 93% reduction in time to deploy

4. Support & Reliability Metrics

METRIC: Mean Time to Resolution (Support)
Track: How fast can developers unblock themselves?

Support ticket categories:
- Documentation issue: MTTR < 10 minutes (self-service)
- Configuration help: MTTR < 2 hours (async)
- Platform bug: MTTR < 4 hours (urgent)
- Feature request: Tracked in backlog

Target: 80% of issues resolved in <1 hour

Dashboard Example

┌──────────────────────────────────────────────────────────────┐
│           FAWKES PLATFORM HEALTH DASHBOARD                   │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ADOPTION                                                    │
│  ├─ Active Teams: 32/40 (80%) ▲ +3 this month              │
│  ├─ Daily Deployments: 127 ▲ +15% MoM                       │
│  └─ Feature Usage:                                           │
│      • CI/CD: 85% ▲                                          │
│      • GitOps: 70% ▲                                         │
│      • Observability: 55% ▲                                  │
│                                                              │
│  SATISFACTION (NPS: 42)                                      │
│  ├─ Promoters: 55% (22 users)                               │
│  ├─ Passives: 32% (13 users)                                │
│  └─ Detractors: 13% (5 users) ⚠️                            │
│                                                              │
│  DORA METRICS                                                │
│  ├─ Deploy Frequency: 5.2/day ▲ (Target: >5)               │
│  ├─ Lead Time: 45 min ▼ (Target: <1 hour)                  │
│  ├─ MTTR: 12 min ▲ (Target: <15 min)                       │
│  └─ Change Fail Rate: 3.2% ▲ (Target: <5%)                 │
│                                                              │
│  SUPPORT                                                     │
│  ├─ Open Tickets: 8 (3 urgent)                              │
│  ├─ MTTR: 2.3 hours ▼ (Target: <4 hours)                   │
│  └─ Top Issues:                                              │
│      1. Deployment timeouts (3 tickets)                     │
│      2. Secret management confusion (2 tickets)             │
│                                                              │
│  ACTIONS                                                     │
│  🔴 Investigate detractors (schedule 5 interviews)           │
│  🟡 Improve deployment timeout documentation                │
│  🟢 Celebrate: Hit 80% adoption milestone! 🎉               │
│                                                              │
└──────────────────────────────────────────────────────────────┘

🛠️ Hands-On Lab: Building a Platform Product

Lab Overview

You will practice product management for a platform by:

  1. Analyzing user feedback and identifying themes
  2. Prioritizing features using impact vs. effort
  3. Creating a quarterly roadmap
  4. Setting up NPS surveys and adoption tracking
  5. Building a platform health dashboard

Duration: 25 minutes Tools: Backstage, Fawkes CLI, survey tools


Lab Setup

# Start the platform product lab
fawkes lab start --module 17

# This provides:
# - Backstage instance with sample feedback
# - Analytics data from 40 development teams
# - Survey responses (30 developers)

Exercise 1: Analyze User Feedback (6 minutes)

Objective: Review user interviews and identify top 3 themes.

# View feedback data
cd ~/fawkes-lab-17/user-research
cat interviews.json | jq '.[] | {name, role, pain_points}'

Sample feedback (you'll see 15 interviews):

{
  "interviews": [
    {
      "name": "Alex (Frontend Dev)",
      "pain_points": ["Deployments take 25 minutes", "No way to see if my deploy worked", "Kubernetes is confusing"]
    },
    {
      "name": "Jordan (Backend Dev)",
      "pain_points": ["Can't debug production issues", "No staging environment", "Database migrations are manual"]
    }
    // ... 13 more interviews
  ]
}

Your task: Use the provided script to analyze themes:

# Run theme analysis
python analyze_feedback.py interviews.json

# Output:
# THEME 1: Slow deployments (12 mentions)
# THEME 2: Lack of visibility (15 mentions)
# THEME 3: No staging/preview envs (8 mentions)
# THEME 4: Difficult debugging (10 mentions)
# THEME 5: Steep learning curve (7 mentions)

Question: Which theme should you prioritize? Consider:

  • Frequency (how many users mentioned it)
  • Severity (how much pain does it cause)
  • Feasibility (can you solve it in <1 quarter)

Exercise 2: Prioritize Using Impact vs Effort (5 minutes)

Objective: Plot features on an impact/effort matrix.

# View proposed features
cat features.yaml

features.yaml:

features:
  - name: "Real-time deployment dashboard"
    impact: high
    effort: medium
    theme: "Lack of visibility"

  - name: "Self-service preview environments"
    impact: high
    effort: high
    theme: "No staging"

  - name: "Improve documentation"
    impact: medium
    effort: low
    theme: "Learning curve"

  - name: "Integrated log viewer"
    impact: medium
    effort: medium
    theme: "Debugging"

  - name: "Parallel CI pipelines"
    impact: high
    effort: medium
    theme: "Slow deployments"

Your task: Use the Fawkes CLI to generate a prioritization matrix:

fawkes product prioritize --input features.yaml --output priority-matrix.png

# Opens an image showing features plotted

Expected result:

DO FIRST (High Impact, Low-Med Effort):
1. Real-time deployment dashboard
2. Parallel CI pipelines
3. Improve documentation

DO NEXT (High Impact, High Effort):
4. Self-service preview environments

BACKLOG:
5. Integrated log viewer

Exercise 3: Create a Quarterly Roadmap (6 minutes)

Objective: Build a Q1 roadmap based on prioritized features.

# Use roadmap template
cp templates/roadmap-template.md Q1-2025-roadmap.md
vim Q1-2025-roadmap.md

Fill in the template:

# Q1 2025 Platform Roadmap

## North Star Goal

Reduce deployment time from 25 minutes to <8 minutes (68% improvement)

## Sprint 1-2 (Weeks 1-4): Visibility

**Theme**: Developers can't see what's happening

- [ ] Deployment status dashboard in Backstage

  - Real-time progress (queued → building → deploying → healthy)
  - Estimated time remaining
  - Success metric: 70% of developers check dashboard weekly

- [ ] Slack notifications
  - Deploy started/completed/failed
  - @ mention author on failures
  - Success metric: <2 min to notice failed deploy

## Sprint 3-4 (Weeks 5-8): Speed

**Theme**: Deployments are too slow

- [ ] Parallel CI pipelines

  - Run tests in parallel (8min → 3min)
  - Cache dependencies
  - Success metric: P95 build time <5 minutes

- [ ] Optimize Docker builds
  - Multi-stage builds
  - Layer caching
  - Success metric: Image build <2 minutes

## Sprint 5-6 (Weeks 9-12): Polish

**Theme**: Improve overall experience

- [ ] Documentation overhaul

  - Step-by-step tutorials for common tasks
  - Video walkthroughs
  - Success metric: NPS +10 points

- [ ] Weekly office hours
  - 1 hour/week for Q&A
  - Success metric: 15+ attendees average

## Success Criteria (End of Q1)

- [ ] Deploy time P95: <8 minutes (from 25min)
- [ ] Platform adoption: 85% of teams (from 70%)
- [ ] NPS: 50+ (from 38)
- [ ] Support tickets: <10/week (from 18/week)

Validate your roadmap:

fawkes product validate-roadmap Q1-2025-roadmap.md

# Checks:
# ✅ All items have success metrics
# ✅ North Star goal is measurable
# ✅ Timeboxed to one quarter
# ⚠️  Warning: Sprint 1-2 may be overloaded (2 major features)

Exercise 4: Set Up NPS Surveys (4 minutes)

Objective: Configure automated NPS surveys in Backstage.

# Install Backstage feedback plugin
cd ~/fawkes-lab-17/backstage
yarn add @backstage/plugin-user-feedback

Configure survey:

# app-config.yaml
userFeedback:
  surveys:
    - id: platform-nps
      title: "Platform Satisfaction Survey"
      frequency: quarterly
      questions:
        - id: nps
          type: nps
          text: "How likely are you to recommend Fawkes Platform to a colleague?"

        - id: reason
          type: text
          text: "What's the PRIMARY reason for your score?"

        - id: biggest-pain
          type: text
          text: "What's your biggest pain point with the platform?"

        - id: feature-satisfaction
          type: matrix
          text: "How satisfied are you with the following?"
          rows:
            - "Deployment speed"
            - "Documentation"
            - "Getting help when stuck"
            - "Observability/debugging"
          scale: 1-5

Test the survey:

# Simulate 10 survey responses
fawkes lab simulate-surveys --count 10

# View results dashboard
open http://localhost:3000/user-feedback/platform-nps

Expected dashboard:

Platform NPS: 42
├─ Promoters (9-10): 12 responses (55%)
├─ Passives (7-8): 7 responses (32%)
└─ Detractors (0-6): 3 responses (13%)

Top Pain Points:
1. "Deployments are still too slow" (8 mentions)
2. "Documentation is hard to find" (5 mentions)
3. "Don't know how to debug prod issues" (4 mentions)

Feature Satisfaction (1-5 scale):
├─ Deployment speed: 3.2/5 ⚠️
├─ Documentation: 3.8/5
├─ Getting help: 4.1/5 ✅
└─ Observability: 3.5/5

Exercise 5: Build Platform Health Dashboard (4 minutes)

Objective: Create a dashboard showing adoption, satisfaction, and DORA metrics.

# Use Grafana with pre-configured datasources
cd ~/fawkes-lab-17/grafana
docker-compose up -d

# Import dashboard
curl -X POST http://localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d @platform-health-dashboard.json

Dashboard panels (pre-configured):

  1. Adoption Panel:
# Teams using platform
count(count by (team) (deployment_total{platform="fawkes"}))

# Daily deployments
rate(deployment_total{platform="fawkes"}[1d])
  1. Satisfaction Panel:
-- NPS score
SELECT
  (COUNT(*) FILTER (WHERE score >= 9) * 100.0 / COUNT(*) -
   COUNT(*) FILTER (WHERE score <= 6) * 100.0 / COUNT(*)) as nps
FROM survey_responses
WHERE survey_id = 'platform-nps'
  AND created_at > NOW() - INTERVAL '90 days'
  1. DORA Metrics Panel:
# Deploy frequency (per day per team)
avg(rate(deployment_total[1d])) by (team)

# Lead time for changes (commit to deploy)
histogram_quantile(0.95,
  rate(lead_time_seconds_bucket[1d])
)

# MTTR
histogram_quantile(0.95,
  rate(incident_resolution_seconds_bucket[1d])
)

# Change failure rate
(rate(deployment_failed_total[1d]) /
 rate(deployment_total[1d])) * 100

View the dashboard:

open http://localhost:3000/d/platform-health

# You should see a comprehensive dashboard showing all key metrics

Lab Validation

# Run validation
fawkes lab validate --module 17

# Expected output:
# ✅ User feedback analyzed and themes identified
# ✅ Features prioritized using impact/effort matrix
# ✅ Quarterly roadmap created with success metrics
# ✅ NPS survey configured and tested
# ✅ Platform health dashboard deployed

Cleanup:

fawkes lab stop --module 17

✅ Knowledge Check

Question 1: Product Mindset

What's the key difference between "platform as infrastructure" vs "platform as a product"?

A) Products are externally sold, infrastructure is internal B) Products focus on user satisfaction, infrastructure focuses on uptime C) Products cost more to build D) Infrastructure is more reliable

Show Answer **Answer: B** Platform as a product treats internal developers as customers and measures success by their satisfaction and outcomes, not just technical availability. You can have 99.99% uptime but zero adoption if developers don't find it valuable.

Question 2: User Research

Which user research method provides the deepest insights into developer pain points?

A) Anonymous surveys B) Usage analytics C) In-person interviews with observation D) Support ticket analysis

Show Answer **Answer: C** One-on-one interviews combined with observing actual workflows reveal not just what people say, but what they actually do. You discover workarounds, inefficiencies, and unspoken needs that surveys miss.

Question 3: User Personas

Why create user personas for your platform?

A) To segment users for marketing B) Different roles have different needs requiring tailored solutions C) It's a requirement for product management D) To decide which users to prioritize

Show Answer **Answer: B** Frontend developers, backend engineers, data scientists, and SREs have vastly different needs. A one-size-fits-all platform satisfies no one. Personas help you design appropriate experiences for each group.

Question 4: NPS (Net Promoter Score)

Your platform has an NPS of -15. What does this mean?

A) 15% of users are happy B) More detractors than promoters - urgent action needed C) Average satisfaction is 15% D) Normal score for internal platforms

Show Answer **Answer: B** NPS = % Promoters - % Detractors. A negative NPS means you have more unhappy users than happy ones. This indicates serious problems requiring immediate investigation and action.

Question 5: Roadmap Prioritization

You have two features: "Real-time logs" (high impact, high effort) and "Improved docs" (medium impact, low effort). Which should you build first?

A) Real-time logs (higher impact) B) Improved docs (faster to ship) C) Build both simultaneously D) Survey users to decide

Show Answer **Answer: B** Start with "quick wins" (medium impact, low effort) to build momentum and trust. Better docs can ship in weeks and immediately help users. Real-time logs takes months and users may not trust you to deliver if you haven't shipped smaller improvements first.

Question 6: Adoption Metrics

Your platform has 40% adoption after 6 months. What should you do?

A) Mandate usage via policy B) Interview non-adopters to understand barriers C) Add more features to attract users D) Wait longer for organic adoption

Show Answer **Answer: B** Low adoption indicates your platform doesn't meet user needs. Talk to the 60% who aren't using it - they'll tell you exactly what's blocking them. Mandating usage creates resentment, and adding features may worsen the problem if they're not addressing real needs.

Question 7: Success Metrics

Which metric best indicates your platform is succeeding?

A) Number of features shipped B) Infrastructure uptime percentage C) Improvement in DORA metrics for users D) Size of your platform team

Show Answer **Answer: C** The ultimate measure of platform success is whether it improves outcomes for your users. If teams using your platform deploy more frequently with fewer failures (better DORA metrics), you're succeeding regardless of feature count or uptime.

Question 8: Saying No

A senior engineer requests a niche feature that would take 2 months but only helps their team. How do you respond?

A) "No, we're too busy" B) "File a ticket and we'll get to it eventually" C) Build it (they're senior so must be important) D) Explain current priorities and understand the underlying need

Show Answer **Answer: D** Productively saying no means: (1) Acknowledge the request, (2) Explain current priorities and why, (3) Understand the underlying need (there may be a simpler solution), (4) Offer alternatives or a timeline for reconsideration. Never just say no.

🌍 Real-World Examples

Example 1: Spotify's Backstage - Dogfooding as Product Strategy

Challenge: 280+ engineers, hundreds of microservices, fragmented tooling creating chaos.

Product Approach:

  • Started with research: Interviewed 50 engineers about pain points
  • Built for themselves first: Backstage solved Spotify's own problems
  • Measured everything: Tracked time-to-deploy, incident response time, onboarding speed
  • Iterated based on feedback: Weekly demos, monthly retrospectives

Key Product Decisions:

  • Golden paths, not enforcement: Made easy path obvious, didn't block alternatives
  • Self-service: Developers create services without ops tickets
  • Plugin ecosystem: Teams can extend for their needs

Results:

  • Onboarding time: 10 days → 1 day (90% improvement)
  • Time to first deploy: 4 hours → 5 minutes (98% improvement)
  • Adoption: 100% of teams (voluntary, not mandated)

Lesson: "If we couldn't convince ourselves to use it, we knew developers wouldn't either."

Learn more: Backstage Engineering Blog


Example 2: Netflix's Paved Road - Product Thinking at Scale

Philosophy: "We don't require you to use the paved road, but we make it so good that you'd be crazy not to."

Product Strategy:

UNPAVED ROAD (Hard way):
├─ Provision infrastructure yourself: 2 days
├─ Configure monitoring: 4 hours
├─ Set up CI/CD: 1 day
├─ Security scanning: 3 hours
└─ Total: 3+ days + ongoing maintenance

PAVED ROAD (Netflix platform):
├─ Run: netflix-scaffold new-service
├─ Infrastructure auto-provisioned: 10 minutes
├─ Monitoring pre-configured: 0 minutes
├─ CI/CD ready: 0 minutes
├─ Security included: 0 minutes
└─ Total: 10 minutes + zero maintenance

Key Insight: Don't mandate the platform, make it irresistibly better.

Product Metrics:

  • Adoption: 95%+ voluntary (not mandated)
  • Developer satisfaction: NPS 72 (world-class)
  • Time saved: 40+ engineering hours per new service

How they measured product-market fit:

  • Tracked adoption rate by team
  • Monthly surveys (NPS + open feedback)
  • Usage analytics (which features, how often)
  • "Paved road health score" dashboard

Lesson: Product thinking means your platform wins by being better, not by being required.


Example 3: Etsy's Product Management for Infrastructure

Challenge: Platform team seen as "cost center" with unclear value.

Product Transformation:

Before (Infrastructure mindset):

  • Success = Uptime percentage
  • Shipped features, hoped developers used them
  • No user research
  • Reactive support (waiting for tickets)

After (Product mindset):

  • Hired product manager for platform team
  • Quarterly OKRs tied to developer productivity
  • Regular user research: 10 developer interviews/month
  • Platform health dashboard: Adoption, satisfaction, DORA metrics
  • Proactive support: Office hours, documentation, onboarding

Product Management Practices:

  1. Quarterly Planning:
Q1 OKRs:
Objective: Make deployments delightful
├─ KR1: Deploy time P95 < 10 minutes (from 30min)
├─ KR2: Deployment success rate > 95% (from 88%)
└─ KR3: Developer NPS > 40 (from 18)
  1. Bi-weekly User Testing:

  2. Watch developers deploy a feature

  3. Identify friction points
  4. Ship improvements within 1 sprint

  5. Feature Flags for Platform Features:

# Gradually roll out new features
features:
  parallel_ci:
    enabled_teams: ["payments", "search", "checkout"]
    rollout_percentage: 25%
    feedback_required: true

Results:

  • NPS: 18 → 58 in 6 months
  • Adoption: 45% → 85%
  • Platform budget: Increased 40% (demonstrated clear value)
  • Team morale: Platform team seen as strategic, not cost center

Lesson: Treating infrastructure as a product transforms how the organization views and funds platform teams.


Example 4: Airbnb's Platform Product Management

Structure: Each platform capability has a dedicated product manager.

Example: CI/CD Product Manager

Responsibilities:

  • User research: Interview 5 developers weekly
  • Roadmap: Prioritize features based on impact
  • Metrics: Own deployment frequency and lead time
  • Communication: Publish monthly updates to eng org

Sample Project: "Project Lightning" (Faster CI/CD)

Discovery Phase (2 weeks):

Research findings:
- 78% of developers frustrated with CI speed
- Average build time: 18 minutes
- 40% of builds fail due to flaky tests
- Developers context-switch while waiting

User quotes:
"I start a build then go get coffee. By the time I'm back,
 I've forgotten what I was working on."

"Half the time the build fails because of a flaky test,
 not my code. It's demoralizing."

Roadmap (3 months):

Month 1: Quick wins
├─ Parallel test execution: 18min → 12min
├─ Better test splitting
└─ Success metric: 33% faster builds

Month 2: Reliability
├─ Quarantine flaky tests
├─ Auto-retry failed tests once
└─ Success metric: <10% false failures

Month 3: Intelligence
├─ Predictive test selection (only run affected tests)
├─ Smart caching
└─ Success metric: 12min → 5min average build time

Communication:

  • Weekly Slack updates in #engineering
  • Demo videos showing improvements
  • "Build time tracker" dashboard (public)

Results:

  • Build time: 18min → 5min (72% faster)
  • False failure rate: 40% → 8%
  • Developer NPS: +28 points
  • 2,000+ engineering hours saved/month

Lesson: Dedicated product management for platform capabilities drives meaningful improvements. Treat each platform area (CI/CD, observability, deployment) as its own product.


📊 DORA Capabilities Mapping

This module directly supports these DORA capabilities:

Capability How This Module Helps Impact on Metrics
Generative Culture Product thinking fosters collaboration between platform and dev teams Improves all DORA metrics through better alignment
Visual Management Platform health dashboards make work visible Faster identification of bottlenecks
Team Experimentation User research and feedback loops enable rapid iteration Higher deployment frequency through faster learning
Work in Small Batches Quarterly roadmaps and iterative improvement Reduced lead time and change failure rate
Learning Culture Continuous user feedback creates learning organization Sustained improvement across all metrics

🔧 Troubleshooting Common Issues

Issue 1: Low Adoption Despite Good Technology

Symptom: You've built a technically excellent platform but only 30% adoption after 6 months.

Root Causes:

  • Built for perceived needs, not actual needs
  • No marketing/evangelism of the platform
  • Lack of documentation or examples
  • Migration path too difficult from existing solutions

Solution:

STEP 1: Interview non-adopters
"Why aren't you using the platform?"
Common answers:
- "Didn't know it existed"
- "Too hard to migrate"
- "My current solution works fine"
- "Tried it once, got stuck, gave up"

STEP 2: Address barriers systematically
├─ Awareness: Weekly demos, Slack announcements, onboarding talks
├─ Migration: Build automated migration tools
├─ Documentation: Step-by-step tutorials for common use cases
└─ Support: Dedicated office hours, Slack channel with fast response

STEP 3: Create champions
├─ Find early adopters who love the platform
├─ Have them present at team meetings
├─ "Platform champions" program with incentives
└─ Share success stories publicly

Issue 2: Negative NPS (More Detractors Than Promoters)

Symptom: NPS of -10 or below, lots of complaints.

Immediate Actions:

WEEK 1: Understand the damage
├─ Read every detractor comment
├─ Schedule calls with 10 most vocal detractors
├─ Identify the top 3 pain points

WEEK 2: Quick wins
├─ Fix documentation gaps (lowest effort)
├─ Improve most common error messages
├─ Send personal follow-ups to detractors

MONTH 1: Address systemic issues
├─ Tackle #1 pain point from research
├─ Communicate progress transparently
├─ Re-survey after changes ship

ONGOING: Prevent future issues
├─ Monthly NPS surveys (catch problems early)
├─ Faster response to support tickets
├─ Proactive communication about known issues

Example Turnaround:

GitHub Internal Platform (fictional example):
- Month 0: NPS -15 (crisis mode)
  - Top issue: Deployments failing randomly
  - Action: All-hands to fix reliability

- Month 1: NPS -5 (improving)
  - Fixed deployment reliability
  - Added status page for transparency

- Month 3: NPS +15 (positive)
  - Continued improvements
  - Regular communication building trust

- Month 6: NPS +42 (healthy)
  - Platform now trusted
  - Adoption increasing

Issue 3: Feature Requests Overwhelming Your Backlog

Symptom: 200+ feature requests, can't prioritize, team paralyzed.

Solution - Ruthless Prioritization:

FRAMEWORK: Impact vs Strategic Alignment

Step 1: Categorize all requests
├─ P0 (Do Now): High impact + Strategic alignment
│   Example: Deploy speed improvements (affects all teams)
│
├─ P1 (Do Soon): High impact OR Strategic alignment
│   Example: Preview environments (affects 60% of teams)
│
├─ P2 (Do Later): Medium impact + Nice to have
│   Example: Additional language support
│
└─ P3 (Don't Do): Low impact + Off-strategy
    Example: Custom CI runners for 1 team

Step 2: Communicate decisions
├─ Publish prioritization criteria
├─ Explain "why" for each category
├─ Set expectations (P0 this quarter, P1 next quarter, P2 backlog, P3 rejected)

Step 3: Review quarterly
├─ Re-prioritize based on new data
├─ Business priorities may change
└─ Some P2s become P0s (and vice versa)

Sample Communication:

# Platform Roadmap Prioritization

## How We Prioritize

**P0 Criteria** (Do This Quarter):

- Affects >50% of teams
- Improves DORA metrics significantly
- Blocks other high-priority work

**Current P0 Features** (Q1 2025):

1. Deployment speed improvements (18min → 8min target)
2. Real-time deployment status dashboard
3. Parallel CI pipelines

**P1 Features** (Q2 2025): 4. Self-service preview environments 5. Integrated log viewer 6. Cost optimization dashboard

## Your Request: "Support for Terraform 1.7"

- Priority: P2 (Do Later)
- Reasoning: Affects 5 teams (12%), existing 1.6 sufficient for most use cases
- Timeline: Q3 2025 (will revisit if Terraform 1.7 becomes critical)

Questions? Disagree with priority? Let's talk: #platform-feedback

Issue 4: Platform Team Seen as Cost Center, Not Value Driver

Symptom: Budget cuts, no headcount, leadership doesn't understand platform value.

Solution - Quantify Business Impact:

BUILD A BUSINESS CASE

1. Quantify Time Savings:
   Before platform: 40 teams × 4 hours/deploy × $150/hour = $24,000/deploy
   After platform: 40 teams × 0.5 hours/deploy × $150/hour = $3,000/deploy
   Savings per deploy: $21,000
   Deploys per day: 50
   Annual savings: $21,000 × 50 × 250 days = $262.5M

2. Quantify Faster Time-to-Market:
   Lead time improvement: 2 weeks → 2 days
   Revenue impact: Ship features 10 days faster
   If feature generates $100k/month revenue:
   Value: $100k × (10/30) = $33k per feature
   Features per year: 100
   Annual value: $3.3M

3. Quantify Risk Reduction:
   MTTR improvement: 2 hours → 15 minutes
   Downtime cost: $50k/hour
   Incidents per month: 5
   Annual savings: $50k × 1.75 × 5 × 12 = $5.25M

TOTAL ANNUAL VALUE: $271M
Platform team cost: $5M/year (10 engineers)
ROI: 54x

Present to Leadership:

# Platform Team Business Case

## Executive Summary

Our platform team drives $271M in annual value through:

- $262.5M in developer productivity gains
- $3.3M in faster time-to-market
- $5.25M in reduced downtime costs

At $5M/year cost, we deliver 54x ROI.

## Metrics

- Deploy frequency: 5/day (up from 0.5/day)
- Lead time: 2 days (down from 14 days)
- MTTR: 15 minutes (down from 2 hours)
- Developer NPS: 52 (up from 18)

## Request

Maintain current headcount (10 FTE) and approve Q1 roadmap.
Without platform investment, we risk losing competitive advantage
in deployment velocity.

📚 Additional Resources

Books

  • "The Lean Startup" by Eric Ries - Core product principles applicable to platforms
  • "Inspired: How to Create Tech Products Customers Love" by Marty Cagan - Product management fundamentals
  • "Escaping the Build Trap" by Melissa Perri - Outcome-driven product development
  • "User Story Mapping" by Jeff Patton - Understanding user journeys
  • "The Mom Test" by Rob Fitzpatrick - How to conduct effective user interviews

Articles & Papers

  • "Team Topologies" by Matthew Skelton & Manuel Pais - Platform team structures
  • "Platform Strategy" by Evan Bottcher (ThoughtWorks) - Defining platform vision
  • "Developers Are Users Too" by Jean Yang - Applying UX to developer tools
  • DORA State of DevOps Reports - Measuring platform impact

Courses & Communities

  • Platform Engineering Community - platformengineering.org
  • Product School - Internal product management courses
  • Mind the Product - Product management community and resources

Tools

  • Backstage - Platform with built-in user feedback and analytics
  • Pendo - Product analytics for internal tools
  • Dovetail - User research repository
  • ProductBoard - Roadmap management
  • Fullstory - Session replay for internal tools

🎯 Key Takeaways

By completing this module, you've learned:

  1. Platform as a product mindset - Your users are developers; measure their satisfaction
  2. User research methods - Interviews, surveys, shadowing, analytics
  3. Prioritization frameworks - Impact vs. effort, strategic alignment
  4. Roadmap building - Outcome-driven, timeboxed, measurable
  5. Key metrics - NPS, adoption, DORA metrics, support efficiency
  6. Product management practices - Feedback loops, iteration, communication

Critical insight: The best platform is useless if developers don't adopt it. Product thinking ensures you build what users actually need, not what you think they need.

Remember:

  • 🎯 Outcomes over outputs: Measure impact, not features shipped
  • 👂 Listen more than talk: Users know their problems better than you
  • 🔁 Iterate relentlessly: Small improvements compound over time
  • 📢 Communicate constantly: Share progress, celebrate wins, be transparent about challenges

🚀 Next Steps

In Module 18: Multi-Tenancy & Resource Management

You'll learn how to:

  • Design multi-tenant platforms serving multiple teams securely
  • Implement resource quotas and isolation
  • Handle namespace management and RBAC at scale
  • Create self-service onboarding workflows
  • Monitor and optimize resource utilization across tenants

Prepare by:

  • Identifying how many teams your platform will serve
  • Understanding your organization's compliance requirements
  • Listing resources that need quota enforcement (CPU, memory, storage)

🏆 Black Belt Progress

Module 17 Complete!

Black Belt Progress:
[████████░░░░░░░░░░░░] 25% (1/4 modules)

✅ Module 17: Platform as a Product
⬜ Module 18: Multi-Tenancy & Resource Management
⬜ Module 19: Security & Zero Trust
⬜ Module 20: Multi-Cloud Strategies

Next: Module 18 to continue your Black Belt journey!

Module 17: Platform as a Product | Fawkes Dojo | Black Belt "Build what users need, not what you think they need" | Version 1.0