ADR-017: Kyverno Policy Engine for Policy-as-Code
Status
Accepted
Context
The Fawkes platform requires a policy enforcement mechanism to ensure:
- Security Standards: All workloads comply with Pod Security Standards
- Platform Standardization: Consistent labels, annotations, and configurations
- Governance: Resource quotas, network policies, and compliance requirements
- Runtime Enforcement: Policies enforced at admission time, not just CI
Current State
Security and standardization are enforced primarily by CI pipeline checks (pre-deployment). Post-deployment compliance relies on manual cluster administrator oversight, leading to:
- Inconsistent Enforcement: Different teams may bypass CI checks
- No Runtime Protection: Malicious or accidental misconfigurations can be applied directly via kubectl
- Manual Overhead: Administrators must manually review and correct non-compliant resources
- No Automatic Standardization: Platform labels and configurations require manual addition
Requirements from Issue
- Deploy Kyverno as the Kubernetes-native policy engine
- Implement validation policies for security (Pod Security Standards)
- Implement mutation policies for automatic standardization
- Implement generation policies for namespace defaults
- Enable policy reporting for audit and compliance
- Avoid duplication with CI security scans (SonarQube)
Decision
We will deploy Kyverno as the policy-as-code engine for the Fawkes platform, providing validation, mutation, and generation capabilities through native Kubernetes admission control.
Why Kyverno over OPA/Gatekeeper
| Criteria | Kyverno | OPA/Gatekeeper |
|---|---|---|
| Learning Curve | YAML-based, familiar | Rego language, steep |
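| Mutation Support | Native | Limited (separate mutation CRDs such as Assign and AssignMetadata) |
| Generation Support | Native | Not supported |
| Kubernetes Native | Yes, policies are CRDs written in YAML | Rego embedded in ConstraintTemplate CRDs |
| Policy Testing | kyverno CLI (kyverno test) | opa test, gator CLI |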
| Community | Growing rapidly | Established |
We chose Kyverno because:
- Lower barrier to entry: Platform teams can write policies in YAML
- Mutation capabilities: Critical for automatic Vault integration
- Generation capabilities: Essential for namespace standardization
- Kubernetes-native: Better integration with ArgoCD and GitOps workflows
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ Kyverno Policy Engine │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Admission Controllers │ │
│ │ │ │
│ │ ValidatingWebhook ──► Validate ──► Allow/Deny │ │
│ │ MutatingWebhook ──► Mutate ──► Modified Resource │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Background Controllers │ │
│ │ │ │
│ │ GenerateController ──► Watch Resources ──► Create Generated │ │
│ │ ReportsController ──► Collect Results ──► PolicyReport │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────┬────────────────────────────────────────────┐ │
│ │ ClusterPolicy │ Policy (Namespace-scoped) │ │
│ │ • Security │ • Team-specific │ │
│ │ • Platform standards │ • Application exceptions │ │
│ └─────────────────────────┴────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
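The admission path in the diagram can be modeled in a few lines of plain Python. This is a conceptual sketch, not Kyverno code. It assumes resources are dicts, each mutation returns the patched object, and each validation returns an error message or None. The helper names (`admit`, `set_ingress_class`, `require_ingress_class`) are hypothetical. The sketch shows one consequence of Kubernetes webhook ordering: mutating webhooks run before validating webhooks, so a default that a mutation injects is already present when validation runs.

```python
import copy
from typing import Callable, Dict, List, Optional, Tuple

Resource = Dict
Mutator = Callable[[Resource], Resource]
Validator = Callable[[Resource], Optional[str]]  # error message, or None if valid


def admit(resource: Resource, mutators: List[Mutator],
          validators: List[Validator]) -> Tuple[bool, Resource, List[str]]:
    """Apply every mutation first, then validate the mutated object.
    Any validation error denies the request."""
    patched = copy.deepcopy(resource)  # leave the caller's object untouched
    for mutate in mutators:
        patched = mutate(patched)
    errors = [msg for validate in validators
              if (msg := validate(patched)) is not None]
    return (not errors, patched, errors)


def set_ingress_class(obj: Resource) -> Resource:
    # Mirrors set-ingress-class: default the class only when it is missing.
    if obj.get("kind") == "Ingress":
        obj.setdefault("spec", {}).setdefault("ingressClassName", "nginx")
    return obj


def require_ingress_class(obj: Resource) -> Optional[str]:
    # Hypothetical validation rule, used only to show the ordering.
    if obj.get("kind") == "Ingress" and not obj.get("spec", {}).get("ingressClassName"):
        return "Ingress must set spec.ingressClassName"
    return None


ingress = {"kind": "Ingress", "metadata": {"name": "web"}, "spec": {}}
allowed, patched, errors = admit(ingress, [set_ingress_class], [require_ingress_class])
print(allowed, patched["spec"])  # True {'ingressClassName': 'nginx'}
```

If you remove the mutator from the call, the same Ingress is denied. This is why defaults belong in mutation policies rather than in validation messages that ask users to add them.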
Policy Categories
1. Mandatory Security Policies (Enforce Mode)
These policies DENY non-compliant resources:
| Policy | Description | Enforcement |
|---|---|---|
| `require-run-as-non-root` | Containers must run as non-root | Enforce |
| `disallow-privileged-containers` | No privileged containers | Enforce |
| `restrict-host-namespaces` | No hostNetwork/hostPID/hostIPC | Enforce |
| `disallow-host-ports` | No host port bindings | Enforce |
| `disallow-capabilities` | Must drop ALL capabilities | Enforce |
| `require-resource-limits` | CPU/memory limits required | Enforce |
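The six validation rules can be expressed as a single plain-Python function over a Pod spec. This is useful for reasoning about edge cases, but it is an illustrative model only: Kyverno evaluates declarative YAML patterns, not Python. `check_pod` is a hypothetical name, and the sketch makes three assumptions about interpretation:

- A container-level `runAsNonRoot` overrides the pod-level value.
- Any non-zero `hostPort` is a violation.
- Both `cpu` and `memory` limits must be present.

```python
from typing import Dict, List


def _containers(pod_spec: Dict) -> List[Dict]:
    # Init containers are subject to the same rules as regular containers.
    return pod_spec.get("initContainers", []) + pod_spec.get("containers", [])


def check_pod(pod_spec: Dict) -> List[str]:
    """Return one message per violated rule; an empty list means compliant."""
    violations: List[str] = []
    pod_ctx = pod_spec.get("securityContext", {})

    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag):
            violations.append(f"restrict-host-namespaces: {flag} is true")

    for c in _containers(pod_spec):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})

        # A container-level runAsNonRoot overrides the pod-level value.
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            violations.append(f"require-run-as-non-root: {name}")
        if ctx.get("privileged"):
            violations.append(f"disallow-privileged-containers: {name}")
        if any(p.get("hostPort") for p in c.get("ports", [])):
            violations.append(f"disallow-host-ports: {name}")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            violations.append(f"disallow-capabilities: {name}")
        limits = c.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            violations.append(f"require-resource-limits: {name}")

    return violations


compliant = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{
        "name": "app",
        "securityContext": {"capabilities": {"drop": ["ALL"]}},
        "ports": [{"containerPort": 8080}],
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
    }],
}
print(check_pod(compliant))  # []
```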
2. Standardization Policies (Mutate Rules)
These policies automatically modify resources:
| Policy | Mutation | Purpose |
|---|---|---|
| `add-platform-labels` | Add `app.fawkes.idp/*` labels | Consistent labeling |
| `add-vault-annotations` | Add Vault Agent annotations | Secret injection |
| `set-ingress-class` | Set `ingressClassName: nginx` | Traffic routing |
| `set-default-security-context` | Add secure defaults | Security baseline |
| `add-default-resources` | Add default requests | Scheduling |
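These mutations share one behavior: add a value only when it is absent. The Python sketch below models that behavior. In Kyverno it is expressed declaratively with the `+(key)` add anchor in a `patchStrategicMerge` overlay; the functions here only model those semantics. The sketch covers four of the five policies and omits Vault annotations, because the ADR does not list which annotations are added. The default values (`100m` CPU, `128Mi` memory, the security-context fields) and the label key in the usage line are hypothetical, since the ADR does not specify them.

```python
import copy
from typing import Dict

# Hypothetical defaults: the ADR names these policies but not their values.
DEFAULT_REQUESTS = {"cpu": "100m", "memory": "128Mi"}
DEFAULT_SECURITY_CONTEXT = {"allowPrivilegeEscalation": False, "runAsNonRoot": True}


def add_if_absent(target: Dict, defaults: Dict) -> None:
    """Set each key only when missing, like Kyverno's +(key) add anchor."""
    for key, value in defaults.items():
        target.setdefault(key, copy.deepcopy(value))


def mutate_pod(pod: Dict, platform_labels: Dict[str, str]) -> Dict:
    """Model of add-platform-labels, set-default-security-context and
    add-default-resources applied to one Pod."""
    pod = copy.deepcopy(pod)  # never modify the caller's object
    meta = pod.setdefault("metadata", {})
    add_if_absent(meta.setdefault("labels", {}), platform_labels)
    for container in pod.get("spec", {}).get("containers", []):
        add_if_absent(container.setdefault("securityContext", {}),
                      DEFAULT_SECURITY_CONTEXT)
        requests = container.setdefault("resources", {}).setdefault("requests", {})
        add_if_absent(requests, DEFAULT_REQUESTS)
    return pod


def mutate_ingress(ingress: Dict) -> Dict:
    """Model of set-ingress-class: default the class, never override it."""
    ingress = copy.deepcopy(ingress)
    add_if_absent(ingress.setdefault("spec", {}), {"ingressClassName": "nginx"})
    return ingress


pod = {
    "metadata": {"labels": {"team": "payments"}},
    "spec": {"containers": [{"name": "app", "resources": {"requests": {"cpu": "250m"}}}]},
}
# Hypothetical label key; the ADR only says app.fawkes.idp/* labels are added.
out = mutate_pod(pod, {"app.fawkes.idp/managed": "true"})
print(out["spec"]["containers"][0]["resources"]["requests"])
# {'cpu': '250m', 'memory': '128Mi'}
```

Because existing values always win, a team that sets its own CPU request or ingress class keeps it. Only missing fields are filled in.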
3. Generation Policies (Generate Rules)
These policies create resources automatically:
| Policy | Generated Resources | Trigger |
|---|---|---|
| `generate-namespace-network-policy` | NetworkPolicy | New Namespace |
| `generate-namespace-resource-quota` | ResourceQuota | New Namespace |
| `generate-namespace-limit-range` | LimitRange | New Namespace |
| `generate-namespace-service-account` | ServiceAccount | New Namespace |
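Each generation policy has the same shape: match new `Namespace` objects and generate one resource into them. The Python helper below builds that shape as a `kyverno.io/v1` `ClusterPolicy` dict and prints it as JSON, which `kubectl apply -f` also accepts. This is a sketch that assumes Kyverno 1.10 or later. The helper name, the rule name, and the default-deny `NetworkPolicy` contents are hypothetical, since the ADR does not say what the generated resources contain.

```python
import json
from typing import Dict


def generate_policy(policy_name: str, api_version: str, kind: str,
                    resource_name: str, data: Dict) -> Dict:
    """Build a ClusterPolicy whose single rule creates one resource
    in every newly created Namespace."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": policy_name},
        "spec": {
            "rules": [{
                "name": f"generate-{kind.lower()}",
                "match": {"any": [{"resources": {"kinds": ["Namespace"]}}]},
                "generate": {
                    "apiVersion": api_version,
                    "kind": kind,
                    "name": resource_name,
                    # Kyverno substitutes the new Namespace's name at admission time.
                    "namespace": "{{request.object.metadata.name}}",
                    # Keep the generated resource consistent with `data`.
                    "synchronize": True,
                    "data": data,
                },
            }],
        },
    }


network_policy = generate_policy(
    "generate-namespace-network-policy",
    "networking.k8s.io/v1", "NetworkPolicy", "default-deny",
    # Hypothetical default: deny all ingress and egress until teams open traffic.
    {"spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]}},
)
print(json.dumps(network_policy, indent=2))
```

The same helper covers the other three policies. Pass `v1` with `ResourceQuota`, `LimitRange`, or `ServiceAccount`, along with the appropriate `data`.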
Deployment Configuration
High Availability
- Admission Controller: 3 replicas with pod anti-affinity
- Background Controller: 2 replicas
- Reports Controller: 1 replica
- Cleanup Controller: 1 replica
Excluded Namespaces
System namespaces are excluded from policy enforcement:
- `kube-system`
- `kube-public`
- `kube-node-lease`
- `kyverno`
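One way to apply this exclusion list is to add an `exclude` block to each policy rule. The sketch below adds that block to a `ClusterPolicy` dict and includes a simplified check of its effect. It assumes per-rule exclusion. Kyverno also offers global options, such as the `resourceFilters` setting in its ConfigMap, and the ADR does not say which mechanism Fawkes uses. The function names and the rule name are hypothetical.

```python
import copy
from typing import Dict, List

EXCLUDED_NAMESPACES = ["kube-system", "kube-public", "kube-node-lease", "kyverno"]


def with_namespace_exclusions(policy: Dict, namespaces: List[str]) -> Dict:
    """Return a copy of a ClusterPolicy whose rules all exclude `namespaces`.
    Any existing exclude block on a rule is replaced."""
    policy = copy.deepcopy(policy)
    for rule in policy["spec"]["rules"]:
        rule["exclude"] = {"any": [{"resources": {"namespaces": list(namespaces)}}]}
    return policy


def is_excluded(rule: Dict, namespace: str) -> bool:
    """Simplified: True if an exclude entry lists this namespace.
    Ignores wildcards and every other exclude criterion."""
    return any(namespace in entry.get("resources", {}).get("namespaces", [])
               for entry in rule.get("exclude", {}).get("any", []))


# Skeleton only: a real rule also needs its validate block.
skeleton = {
    "apiVersion": "kyverno.io/v1",
    "kind": "ClusterPolicy",
    "metadata": {"name": "disallow-privileged-containers"},
    "spec": {"rules": [{"name": "privileged-containers",
                        "match": {"any": [{"resources": {"kinds": ["Pod"]}}]}}]},
}
policy = with_namespace_exclusions(skeleton, EXCLUDED_NAMESPACES)
rule = policy["spec"]["rules"][0]
print(is_excluded(rule, "kube-system"), is_excluded(rule, "team-a"))  # True False
```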
CI/CD Integration (Avoiding Duplication)
Kyverno complements rather than duplicates CI security scanning:
| Layer | Tool | Purpose |
|---|---|---|
| Source Code | SonarQube | SAST, code quality, security hotspots |
| Dependencies | OWASP Dependency-Check | Known vulnerabilities in libraries |
| Container Image | Trivy | Image vulnerabilities, SBOM |
| Admission | Kyverno | Runtime policy enforcement |
SonarQube analyzes source code, while Kyverno evaluates Kubernetes resource configuration at admission time, so the two checks do not overlap.
Consequences
Positive
- Consistent Enforcement: All resources validated at admission time
- Automatic Standardization: Platform labels and configurations applied automatically
- Self-Service Namespaces: Standard resources generated for new namespaces
- Audit Trail: PolicyReports provide compliance evidence
- GitOps Compatible: Policies stored in Git, deployed via ArgoCD
- Low Learning Curve: YAML-based policies familiar to Kubernetes users
Negative
- Webhook Overhead: Every API request matched by a policy adds a webhook round trip, increasing API server latency
- Policy Complexity: Complex policies may be hard to debug
- False Positives: Overly strict policies may block legitimate workloads
- Operational Burden: The Kyverno controllers require monitoring and maintenance
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Webhook unavailable blocks deployments | failurePolicy: Ignore for non-critical policies; 3 admission controller replicas |
| Policy blocks legitimate workload | Start with Audit mode, transition to Enforce |
| Complex policies hard to maintain | Policy testing in CI, clear documentation |
| Performance impact | Adequate resources, caching, excluded namespaces |
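The first two mitigations correspond to two policy-level fields:

- `validationFailureAction`: `Audit` records violations in PolicyReports, while `Enforce` blocks the request.
- `failurePolicy`: controls what the API server does when the Kyverno webhook cannot be reached.

The Python sketch below sets both fields on a `ClusterPolicy` dict. The helper name and the idea of a "critical" policy flag are assumptions. Newer Kyverno releases also support a per-rule `failureAction` field, so check the field names against the deployed version.

```python
import copy
from typing import Dict


def rollout_settings(policy: Dict, enforce: bool, critical: bool) -> Dict:
    """Return a copy of a ClusterPolicy with its rollout fields set."""
    policy = copy.deepcopy(policy)
    spec = policy.setdefault("spec", {})
    # Audit: admit the request and record the violation; Enforce: deny it.
    spec["validationFailureAction"] = "Enforce" if enforce else "Audit"
    # If the webhook is unreachable, Fail rejects the request (fail closed)
    # and Ignore admits it without evaluation (fail open).
    spec["failurePolicy"] = "Fail" if critical else "Ignore"
    return policy


base = {"apiVersion": "kyverno.io/v1", "kind": "ClusterPolicy",
        "metadata": {"name": "require-resource-limits"}, "spec": {"rules": []}}

# Audit first, then enforce once PolicyReports show no false positives.
audit = rollout_settings(base, enforce=False, critical=True)
enforced = rollout_settings(base, enforce=True, critical=True)
print(audit["spec"]["validationFailureAction"], enforced["spec"]["validationFailureAction"])
# Audit Enforce
```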
Alternatives Considered
1. OPA/Gatekeeper
Rejected because: Steeper learning curve (Rego), only limited mutation support (through separate mutation CRDs), no generation support, and a less Kubernetes-native authoring model.
2. Pod Security Admission (PSA)
Rejected because: Only provides predefined security levels (privileged, baseline, restricted), no custom policies, no mutation or generation.
3. Custom Admission Webhooks
Rejected because: High development and maintenance burden, requires custom code for each policy type.
4. CI-Only Enforcement
Rejected because: Can be bypassed, no runtime protection, no automatic standardization.
Implementation Plan
Phase 1: Core Deployment (Week 1)
- [x] Deploy Kyverno via ArgoCD Application
- [x] Create mandatory security policies
- [x] Create mutation policies for standardization
- [x] Create generation policies for namespaces
Phase 2: Testing & Validation (Week 2)
- [ ] Test all policies in development environment
- [ ] Create BDD acceptance tests
- [ ] Document policy exceptions process
- [ ] Set up Grafana dashboard for metrics
Phase 3: Production Rollout (Week 3)
- [ ] Enable Audit mode in production
- [ ] Review PolicyReports for false positives
- [ ] Transition to Enforce mode
- [ ] Developer documentation and training
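For the PolicyReport review in Phase 3, a small script can rank policies by failure count so that likely false positives are reviewed first. The sketch below assumes reports in the `wgpolicyk8s.io/v1alpha2` PolicyReport format. In that format, each entry in `results` carries `policy`, `rule`, and a `result` of `pass`, `fail`, `warn`, `error`, or `skip`. The function name and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def failures_by_policy(reports: Iterable[Dict]) -> Counter:
    """Count `fail` results per policy across PolicyReport objects."""
    counts: Counter = Counter()
    for report in reports:
        for result in report.get("results", []):
            if result.get("result") == "fail":
                counts[result.get("policy", "<unknown>")] += 1
    return counts


# Hypothetical sample shaped like PolicyReport objects.
sample = [{"results": [
    {"policy": "require-resource-limits", "rule": "check-limits", "result": "fail"},
    {"policy": "require-resource-limits", "rule": "check-limits", "result": "fail"},
    {"policy": "disallow-host-ports", "rule": "host-ports", "result": "pass"},
]}]
print(failures_by_policy(sample).most_common())
# [('require-resource-limits', 2)]
```

Against a live cluster, the input would be the `items` list from `kubectl get policyreports -A -o json`. Results for cluster-scoped resources appear separately in `ClusterPolicyReport` objects.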