📄 80 — Error Budgeting
Definition
Setting an explicit, measurable amount of failure (downtime or failed requests) that a system may incur over a fixed period, so that reliability and the pace of change can be balanced deliberately.
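A common way to write the allocation is shown below. It assumes the reliability target is stated as a service level objective (SLO), a fraction such as 0.999, measured over a window of length $T$ (time-based) or $N$ requests (request-based). These symbols are illustrative and are not defined elsewhere on this card.

```latex
\text{budget fraction} = 1 - \text{SLO}, \qquad
B_{\text{time}} = (1 - \text{SLO})\,T, \qquad
B_{\text{req}} = (1 - \text{SLO})\,N
```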
When to Use
• Engineering • SRE (Site Reliability Engineering) • Operations • Product development
How It Improves Reasoning
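It turns reliability from an open-ended goal into a quantified budget. Teams stop over-engineering beyond what the target requires, and they can spend the remaining headroom on releases and experiments while still meeting the agreed reliability level.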
Steps
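- Define the acceptable error rate (the reliability target, e.g. 99.9% of requests succeed).
- Allocate the error budget: the remaining share of failures (e.g. 0.1%) over a fixed window, such as 30 days.
- Monitor how much of the budget has been consumed.
- Slow or pause risky changes when the budget is nearly consumed; resume the normal pace once budget is available again.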
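The steps above can be sketched in Python as follows. This is a minimal illustration, assuming a request-based target measured over a single window; the class `ErrorBudget`, the function `release_policy`, and the 75% "slow down" threshold are hypothetical choices, not a standard policy.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one window (hypothetical model)."""
    slo: float            # Step 1: target success ratio, e.g. 0.999
    total_requests: int   # requests served so far in the window
    failed_requests: int  # requests that missed the target so far

    def allowed_failures(self) -> float:
        # Step 2: budget = (1 - SLO) * requests in the window
        return (1.0 - self.slo) * self.total_requests

    def consumed_fraction(self) -> float:
        # Step 3: share of the budget already spent (above 1.0 means overspent)
        allowed = self.allowed_failures()
        if allowed == 0:
            return 0.0 if self.failed_requests == 0 else float("inf")
        return self.failed_requests / allowed


def release_policy(budget: ErrorBudget, slow_at: float = 0.75) -> str:
    # Step 4: hypothetical thresholds; real policies are agreed per team
    used = budget.consumed_fraction()
    if used >= 1.0:
        return "freeze"  # budget exhausted: ship reliability fixes only
    if used >= slow_at:
        return "slow"    # nearly consumed: fewer, smaller, well-tested changes
    return "normal"      # headroom left: ship and experiment


b = ErrorBudget(slo=0.999, total_requests=1_000_000, failed_requests=800)
print(round(b.allowed_failures()))  # 1000
print(release_policy(b))            # slow
```

With a 99.9% target and 1,000,000 requests, the budget is about 1,000 failed requests. Because 800 failures means 80% of it is spent, the sketch recommends slowing down. A real policy would also decide what happens when the window rolls over.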
Example
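A service with a 99.9% availability target has a 0.1% error budget. Over a 30-day month, that is about 43 minutes of downtime, which the team can spend on releases and experiments. Incidents draw on the same budget.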
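The arithmetic behind the example is shown in this small Python helper. It is a sketch that assumes a time-based availability target and a 30-day default window; `allowed_downtime_minutes` is a hypothetical name.

```python
def allowed_downtime_minutes(slo: float, window_days: float = 30) -> float:
    """Minutes of downtime a time-based availability target permits in a window."""
    window_minutes = window_days * 24 * 60  # 30 days -> 43,200 minutes
    return (1.0 - slo) * window_minutes     # the unreliability the target tolerates


print(f"{allowed_downtime_minutes(0.999):.1f}")             # 43.2 (minutes per 30 days)
print(f"{allowed_downtime_minutes(0.999, 365) / 60:.2f}")   # 8.76 (hours per year)
```

Each extra "nine" shrinks the budget tenfold. At 99.99%, the same 30-day window allows only about 4.3 minutes of downtime.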
Prompts
• “Define an error budget for this system.” • “How should changes be paced based on error budget usage?”