📄 80 — Error Budgeting

Definition

Allocating a tolerable amount of failure or error that a system may accumulate over a given period while still meeting its reliability target, so that reliability and innovation stay in balance.

When to Use

• Engineering
• SRE (Site Reliability Engineering)
• Operations
• Product development

How It Improves Reasoning

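It treats reliability as a measurable quantity to spend rather than a goal to maximize. While budget remains, teams can ship changes and experiment; as it runs out, the focus shifts to stability. This prevents over-engineering toward 100% reliability and encourages innovation while keeping the system within its target.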

Steps

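  1. Define the acceptable error rate, usually as a reliability target over a fixed window (for example, 99.9% over 30 days).
  2. Allocate the error budget: the gap between 100% and that target.
  3. Monitor how much of the budget has been consumed.
  4. Slow changes when the budget is nearly consumed, and pause risky ones once it is exhausted.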
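
The steps above can be sketched in code. This is a minimal illustration under stated assumptions, not a standard implementation: the names `ErrorBudget`, `consumed_fraction`, and `pacing_decision`, the request-based way of counting errors, and the 75% threshold for slowing changes are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one measurement window (hypothetical helper)."""

    slo_target: float     # Step 1: e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int   # requests served in the window so far
    failed_requests: int  # requests that failed in the window so far

    @property
    def allowed_failures(self) -> float:
        # Step 2: the budget is the share of requests that may fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def consumed_fraction(self) -> float:
        # Step 3: fraction of the budget used so far (1.0 means fully spent).
        if self.allowed_failures == 0:
            # No budget at all: any failure counts as fully consumed (assumption).
            return 0.0 if self.failed_requests == 0 else 1.0
        return self.failed_requests / self.allowed_failures


def pacing_decision(budget: ErrorBudget, slow_at: float = 0.75) -> str:
    """Step 4: map budget usage to a release pace. The thresholds are assumed."""
    # Round to absorb floating-point noise right at the thresholds.
    used = round(budget.consumed_fraction, 9)
    if used >= 1.0:
        return "freeze"  # budget exhausted: pause risky changes
    if used >= slow_at:
        return "slow"    # budget nearly consumed: slow the pace of change
    return "normal"      # budget available: ship and experiment as usual
```

For instance, with a 99.9% target and 1,000,000 requests, 800 failed requests consume 80% of the budget, so `pacing_decision(ErrorBudget(0.999, 1_000_000, 800))` returns "slow" under these assumed thresholds.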

Example

A service with a 99.9% uptime target has an error budget of 0.1% downtime, about 43 minutes in a 30-day window, which can be spent on releases and experimentation.
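
The arithmetic behind an example like this can be checked with a short sketch. The function name `allowed_downtime` and the 30-day window are assumptions made for illustration; the source states only the 99.9% figure.

```python
from datetime import timedelta


def allowed_downtime(uptime_target: float, window: timedelta) -> timedelta:
    """Downtime permitted by an uptime target over a window (hypothetical helper)."""
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be between 0 and 1")
    # The error budget is the complement of the target, applied to the window.
    return window * (1.0 - uptime_target)


# Assumed 30-day window; 0.1% of 43,200 minutes.
budget = allowed_downtime(0.999, timedelta(days=30))
print(round(budget.total_seconds() / 60, 1))  # → 43.2
```

A stricter target shrinks the budget quickly: the same function applied to 99.99% over 30 days leaves roughly 4.3 minutes.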

Prompts

• “Define an error budget for this system.”
• “How should changes be paced based on error budget usage?”