AT-E1-006 Validation Test Coverage
This document maps the AT-E1-006 acceptance criteria to the validation tests and scripts.
Acceptance Criteria Coverage
Primary Validation Script: scripts/validate-at-e1-006.sh
This script performs comprehensive validation of the observability stack deployment.
| Acceptance Criterion | Validation Function | Test Name | Status |
|---|---|---|---|
| Prometheus Operator (kube-prometheus-stack) deployed | validate_prometheus_operator() |
prometheus_operator |
✅ Covered |
| Grafana deployed with pre-configured datasources | validate_grafana() |
grafana |
✅ Covered |
| ServiceMonitors for all platform components | validate_servicemonitors() |
servicemonitors |
✅ Covered |
| OpenTelemetry Collector deployed as DaemonSet | validate_servicemonitors() |
servicemonitors (otel-collector-metrics) |
✅ Covered |
| Grafana dashboards for Kubernetes cluster health | validate_grafana_ingress() |
grafana_ingress |
✅ Covered |
| Grafana dashboards for DORA metrics (4 key metrics) | BDD feature test | prometheus-stack-deployment.feature |
✅ Covered |
| Grafana dashboards for Platform component health | BDD feature test | prometheus-stack-deployment.feature |
✅ Covered |
| Grafana dashboards for Application metrics | BDD feature test | prometheus-stack-deployment.feature |
✅ Covered |
| Alerting rules configured and firing test alerts | validate_alertmanager() |
alertmanager |
✅ Covered |
| Log retention policy: 30 days | validate_prometheus_storage() |
prometheus_storage |
✅ Covered |
| Metrics retention: 90 days | validate_prometheus_storage() |
prometheus_storage |
✅ Covered |
| Dashboard load time <2 seconds | tests/integration/grafana-dashboards.sh |
test_dashboard_load_time() |
✅ Covered |
Additional Components Validated
| Component | Validation Function | Test Name | Purpose |
|---|---|---|---|
| Monitoring namespace | validate_namespace() |
namespace_exists |
Ensure namespace exists and is active |
| ArgoCD Application | validate_argocd_application() |
argocd_application |
Ensure GitOps deployment is healthy |
| Prometheus Server | validate_prometheus_server() |
prometheus_server |
Validate StatefulSet is running |
| Node Exporter | validate_node_exporter() |
node_exporter |
DaemonSet on all nodes |
| Kube State Metrics | validate_kube_state_metrics() |
kube_state_metrics |
Cluster metrics collection |
| Prometheus Ingress | validate_prometheus_ingress() |
prometheus_ingress |
External access configured |
| Resource Limits | validate_resource_limits() |
resource_limits |
All components have resource constraints |
| Pod Health | validate_pods_health() |
pods_health |
Overall monitoring namespace health |
Integration Tests
pytest Integration Test: tests/integration/test_at_e1_006_validation.py
This pytest-based integration test runs the validation script and verifies all acceptance criteria are met.
Test Methods:
test_validation_script_runs()- Verifies script executes without crashingtest_validation_report_generated()- Ensures JSON report is createdtest_all_acceptance_criteria_pass()- Overall pass/fail validationtest_namespace_exists()- Monitoring namespace validationtest_argocd_application_healthy()- ArgoCD Application statustest_prometheus_operator_running()- Prometheus Operator healthtest_prometheus_server_running()- Prometheus Server healthtest_grafana_deployed()- Grafana deployment statustest_alertmanager_running()- Alertmanager healthtest_node_exporter_running()- Node Exporter on all nodestest_kube_state_metrics_running()- Kube State Metrics healthtest_servicemonitors_configured()- ServiceMonitor configurationtest_prometheus_storage_configured()- Persistent storagetest_grafana_ingress_configured()- Grafana ingresstest_prometheus_ingress_configured()- Prometheus ingresstest_resource_limits_configured()- Resource limits definedtest_pods_health()- Overall pod health
Usage:
# Run all tests
pytest tests/integration/test_at_e1_006_validation.py -v
# Run smoke tests only
pytest tests/integration/test_at_e1_006_validation.py -v -m smoke
# Run with custom namespace
pytest tests/integration/test_at_e1_006_validation.py -v --namespace monitoring
Automation Scripts
1. tests/integration/prometheus-metrics.sh
Validates Prometheus metrics collection via API queries.
Tests:
- Prometheus health endpoint responding
- Active scrape targets configured
upmetric availability- DORA metrics collection (optional)
Usage:
./tests/integration/prometheus-metrics.sh
./tests/integration/prometheus-metrics.sh http://prometheus.custom.domain
2. tests/integration/grafana-dashboards.sh
Validates Grafana dashboards via API.
Tests:
- Grafana health endpoint responding
- Organization configured
- Datasources configured (Prometheus)
- Dashboards imported (Kubernetes, DORA)
- Dashboard load time (<2 seconds)
Usage:
./tests/integration/grafana-dashboards.sh
./tests/integration/grafana-dashboards.sh http://grafana.custom.domain
BDD Feature Tests
tests/bdd/features/prometheus-stack-deployment.feature
Comprehensive BDD tests covering all observability stack scenarios.
Key Scenarios:
- Prometheus deployed in monitoring namespace
- Prometheus ArgoCD Application exists and is healthy
- Prometheus Operator pods are running
- Prometheus is scraping metrics
- Prometheus persistent storage configured
- Grafana UI accessible via ingress
- Grafana admin login works
- Prometheus datasource configured in Grafana
- Default Kubernetes dashboards imported
- Alertmanager UI accessible
- Alertmanager configuration valid
- ServiceMonitors for platform components (ArgoCD, Jenkins, PostgreSQL, OpenTelemetry)
- Node Exporter DaemonSet running
- Kube State Metrics collecting data
- Prometheus API functional
- Alert rules loaded
- Platform components monitored
- Resource limits configured
- Alertmanager persistent storage
- Remote write for OpenTelemetry
Usage:
behave tests/bdd/features/prometheus-stack-deployment.feature
tests/bdd/features/opentelemetry-deployment.feature
Tests OpenTelemetry Collector deployment as part of observability stack.
Key Scenarios:
- OpenTelemetry deployed in monitoring namespace
- OpenTelemetry ArgoCD Application healthy
- Collector deployed as DaemonSet
- OTLP receivers configured
- Prometheus exporter configured
- Metrics/logs/traces pipelines configured
- Health check endpoints accessible
- Kubernetes attributes enrichment
- Self-metrics exposed
- Resource limits configured
- Security context configured
Usage:
behave tests/bdd/features/opentelemetry-deployment.feature
tests/bdd/features/centralized-logging.feature
Tests centralized logging with OpenTelemetry (part of observability).
Key Scenarios:
- Log forwarding to OpenSearch
- Kubernetes context enrichment
- Trace correlation
- Log searchability within SLA
- Failure handling
- Structured logging
- Multi-tenancy isolation
Usage:
behave tests/bdd/features/centralized-logging.feature
Validation Commands
The issue specifies these validation commands, which are all covered:
# Prometheus API query - covered by prometheus-metrics.sh
curl -f http://prometheus.local/api/v1/query?query=up
# Grafana health check - covered by grafana-dashboards.sh
curl -f http://grafana.local/api/health
# Prometheus rules validation - covered by validate_alertmanager() in validation script
promtool check rules platform/apps/prometheus/rules/*.yaml
Running All Validations
Quick Start
# Run primary validation (recommended)
make validate-at-e1-006
# Or run directly
./scripts/validate-at-e1-006.sh --namespace monitoring --argocd-namespace fawkes
Comprehensive Validation
# 1. Run primary validation script
./scripts/validate-at-e1-006.sh --namespace monitoring --argocd-namespace fawkes --verbose
# 2. Run pytest integration tests
pytest tests/integration/test_at_e1_006_validation.py -v
# 3. Run automation scripts
./tests/integration/prometheus-metrics.sh http://prometheus.127.0.0.1.nip.io
./tests/integration/grafana-dashboards.sh http://grafana.127.0.0.1.nip.io
# 4. Run BDD feature tests
behave tests/bdd/features/prometheus-stack-deployment.feature
behave tests/bdd/features/opentelemetry-deployment.feature
behave tests/bdd/features/centralized-logging.feature
CI/CD Integration
Example GitHub Actions workflow:
name: AT-E1-006 Validation
on:
push:
branches: [main]
pull_request:
jobs:
validate-observability:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Install dependencies
run: pip install -r requirements-dev.txt
- name: Run AT-E1-006 validation
run: make validate-at-e1-006
env:
NAMESPACE: monitoring
ARGO_NAMESPACE: fawkes
- name: Run integration tests
run: pytest tests/integration/test_at_e1_006_validation.py -v
- name: Upload test report
if: always()
uses: actions/upload-artifact@v3
with:
name: at-e1-006-report
path: reports/at-e1-006-validation-*.json
Report Format
All validations generate JSON reports with this structure:
{
"test_suite": "AT-E1-006: Observability Stack Validation",
"timestamp": "2024-12-15T15:45:00Z",
"namespace": "monitoring",
"argocd_namespace": "fawkes",
"summary": {
"total_tests": 14,
"passed": 14,
"failed": 0,
"pass_percentage": 100
},
"results": [
{
"test": "namespace_exists",
"status": "PASS",
"message": "Namespace monitoring exists and is Active"
},
...
]
}
Success Criteria
AT-E1-006 is considered PASSED when:
- ✅ Primary validation script exits with code 0
- ✅ All 14 validation tests pass (100% pass rate)
- ✅ JSON report generated in
reports/directory - ✅ Integration tests pass (pytest)
- ✅ Automation scripts pass (prometheus-metrics.sh, grafana-dashboards.sh)
- ✅ BDD feature tests pass (optional but recommended)
Dependencies
This test depends on the following issues being completed:
- paruff/fawkes#24 - Infrastructure setup
- paruff/fawkes#25 - GitOps/ArgoCD deployment
- paruff/fawkes#26 - Prometheus Operator deployment
- paruff/fawkes#27 - Grafana configuration
All dependencies must be satisfied before AT-E1-006 validation can pass.
Troubleshooting
See tests/integration/README.md for detailed troubleshooting steps for:
- Validation script failures
- Missing namespaces
- ArgoCD Application issues
- Pod readiness problems
- Prometheus/Grafana connectivity issues