Azure AKS Setup Validation Checklist (AT-E1-001)
This checklist validates that the Azure AKS setup meets all acceptance criteria for issue #1.
Pre-Deployment Validation
Prerequisites Check
- [ ] Azure CLI installed (`az --version`)
- [ ] Terraform installed (`terraform --version`)
- [ ] kubectl installed (`kubectl version --client`)
- [ ] Authenticated to Azure (`az account show`)
- [ ] Correct subscription selected
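The tool checks above can be scripted. The following is a minimal sketch, assuming a POSIX shell; `require_tools` is a hypothetical helper name chosen for illustration and is not part of the repository.

```shell
# require_tools (hypothetical helper): report which of the named commands
# are on PATH. Returns 0 when all are present, 1 if any is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found    %s\n' "$tool"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_tools az terraform kubectl && az account show --output table
```

This only confirms that the binaries exist; authentication and subscription selection still need the `az account show` check.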
Configuration Validation
- [ ] Copied `terraform.tfvars.example` to `terraform.tfvars`
- [ ] Updated resource names to be globally unique (ACR, Key Vault, Storage Account)
- [ ] Reviewed and adjusted VM sizes for budget
- [ ] Reviewed network CIDR ranges (no conflicts)
- [ ] Set appropriate environment value (dev/stage/prod)
- [ ] Configured Key Vault security settings for environment
Terraform Validation
cd infra/azure
terraform init
terraform validate
terraform fmt -check
- [ ] Terraform init succeeds
- [ ] Terraform validate passes
- [ ] Terraform formatting is correct
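The three Terraform checks can be run together so that each one reports its own result. This is a sketch under stated assumptions: `run_check` and `validate_terraform` are hypothetical helper names, and the script is assumed to run from the repository root.

```shell
# run_check (hypothetical helper): run a command quietly and print one
# PASS/FAIL line for it. Returns 1 when the command fails.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# validate_terraform: the three checks from this section. It is only
# defined here; call it explicitly once Terraform is installed.
validate_terraform() {
  (
    cd infra/azure || exit 1
    status=0
    run_check 'terraform init'       terraform init -input=false || status=1
    run_check 'terraform validate'   terraform validate          || status=1
    run_check 'terraform fmt -check' terraform fmt -check        || status=1
    exit "$status"
  )
}
```

The subshell keeps the `cd` from leaking into the calling shell, and all three checks run even if an earlier one fails.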
Deployment Validation
Infrastructure Deployment
# Method 1: Using ignite.sh
./scripts/ignite.sh --provider azure --only-cluster dev
# Method 2: Direct Terraform
cd infra/azure
terraform plan -out=tfplan
terraform apply tfplan
- [ ] Deployment completes without errors (10-15 minutes)
- [ ] Resource Group created
- [ ] Virtual Network created with correct CIDR
- [ ] AKS cluster created
- [ ] System node pool created (2 nodes)
- [ ] User node pool created (2+ nodes with auto-scaling)
- [ ] Azure Container Registry created
- [ ] Azure Key Vault created
- [ ] Storage Account created
- [ ] Log Analytics workspace created
Cluster Access
az aks get-credentials \
--resource-group fawkes-rg \
--name fawkes-aks \
--overwrite-existing
kubectl cluster-info
kubectl get nodes
- [ ] kubectl credentials retrieved
- [ ] Cluster info displays correctly
- [ ] All nodes visible and in Ready state
- [ ] At least 4 nodes present (2 system + 2 user minimum)
Acceptance Criteria Validation (AT-E1-001)
1. AKS cluster deployed in Azure
az aks show \
--resource-group fawkes-rg \
--name fawkes-aks \
--output table
- [ ] Cluster exists
- [ ] Provisioning state is "Succeeded"
- [ ] Power state is "Running"
2. 3-5 nodes running and schedulable
kubectl get nodes
kubectl get nodes -o json | jq '.items | length'
- [ ] Total node count is 4+ (2 system + 2+ user)
- [ ] All nodes show STATUS=Ready
- [ ] Nodes are schedulable (not cordoned)
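`kubectl get nodes` reports a cordoned node as `Ready,SchedulingDisabled`, so the Ready and schedulable checks can be combined by counting only nodes whose STATUS is exactly `Ready`. A minimal sketch, with `count_ready_schedulable` as a hypothetical helper name:

```shell
# count_ready_schedulable (hypothetical helper): read the output of
# `kubectl get nodes --no-headers` on stdin and print how many nodes have
# STATUS exactly "Ready". Cordoned nodes ("Ready,SchedulingDisabled") and
# NotReady nodes are not counted.
count_ready_schedulable() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example against a live cluster:
#   ready=$(kubectl get nodes --no-headers | count_ready_schedulable)
#   [ "$ready" -ge 4 ] && echo "PASS  $ready schedulable nodes"
```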
3. Azure CNI networking configured
az aks show \
--resource-group fawkes-rg \
--name fawkes-aks \
--query "networkProfile.networkPlugin" \
--output tsv
- [ ] Network plugin is "azure"
- [ ] Network policy is configured (azure or calico)
- [ ] Service CIDR is 10.1.0.0/16
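The query above returns only the network plugin. All three items can be checked with one query, as sketched below. `check_network_profile` is a hypothetical helper name. The sketch assumes the `networkPolicy` and `serviceCidr` fields under `networkProfile` in the `az aks show` output. It converts tabs to newlines, so it accepts the three values on one line or on three.

```shell
# check_network_profile (hypothetical helper): read plugin, policy and
# service CIDR (in that order, tab- or newline-separated) on stdin and
# succeed only if they match the checklist. $1 is the expected service CIDR.
check_network_profile() {
  tr '\t' '\n' | awk -v cidr="$1" '
    NR == 1 { plugin = $0 }
    NR == 2 { policy = $0 }
    NR == 3 { svc = $0 }
    END {
      ok = (plugin == "azure") && (policy == "azure" || policy == "calico") && (svc == cidr)
      exit (ok ? 0 : 1)
    }
  '
}

# Example against a live cluster:
#   az aks show --resource-group fawkes-rg --name fawkes-aks \
#     --query "networkProfile.[networkPlugin, networkPolicy, serviceCidr]" \
#     --output tsv | check_network_profile 10.1.0.0/16 && echo "PASS  network profile"
```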
4. System node pool and user node pool separated
kubectl get nodes --show-labels | grep nodepool-type
az aks nodepool list \
--resource-group fawkes-rg \
--cluster-name fawkes-aks \
--output table
- [ ] System pool exists with mode=System
- [ ] User pool exists with mode=User
- [ ] Nodes have appropriate labels (nodepool-type)
- [ ] System pool has 2 nodes (fixed)
- [ ] User pool has auto-scaling enabled
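The pool checks can also be evaluated from the `az aks nodepool list` output instead of by eye. This is a sketch, not the project's own tooling: `check_node_pools` is a hypothetical helper name, and the sketch assumes the `mode`, `enableAutoScaling` and `count` fields of each pool. Booleans are compared case-insensitively because their capitalisation in tsv output is not assumed here.

```shell
# check_node_pools (hypothetical helper): read rows of
# "<name>\t<mode>\t<enableAutoScaling>\t<count>" on stdin. Succeeds when a
# System pool with exactly 2 nodes and a User pool with autoscaling exist.
check_node_pools() {
  awk -F '\t' '
    $2 == "System" && $4 == 2             { system_ok = 1 }
    $2 == "User" && tolower($3) == "true" { user_ok = 1 }
    END { exit ((system_ok && user_ok) ? 0 : 1) }
  '
}

# Example against a live cluster:
#   az aks nodepool list --resource-group fawkes-rg --cluster-name fawkes-aks \
#     --query "[].[name, mode, enableAutoScaling, count]" --output tsv \
#     | check_node_pools && echo "PASS  node pools separated"
```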
5. kubectl configured and working
kubectl get pods -A
kubectl get services -A
kubectl cluster-info
- [ ] Can list all pods
- [ ] Can list all services
- [ ] API server is reachable
- [ ] System pods are Running
6. Cluster metrics available via Azure Monitor
az aks show \
--resource-group fawkes-rg \
--name fawkes-aks \
--query "addonProfiles.omsagent.enabled" \
--output tsv
- [ ] OMS agent addon is enabled
- [ ] Log Analytics workspace exists
- [ ] Container insights collecting data
- [ ] Can view metrics in Azure Portal
7. Azure AD integration configured
az aks show \
--resource-group fawkes-rg \
--name fawkes-aks \
--query "aadProfile" \
--output json
- [ ] AAD profile exists
- [ ] Managed AAD is enabled (`managed` is true)
- [ ] Azure RBAC for Kubernetes authorization is enabled (`enableAzureRbac` is true)
- [ ] Kubernetes RBAC is enabled on the cluster
8. Cluster passes AT-E1-001
This criterion is the comprehensive validation: it passes only when every check for criteria 1-7 above passes.
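One way to produce a single overall verdict is to have each check print a line beginning with `PASS` or `FAIL` and tally the lines at the end. The sketch below assumes that convention, which is chosen for illustration only; `summarize_checks` is a hypothetical helper name.

```shell
# summarize_checks (hypothetical helper): read "PASS ..." / "FAIL ..."
# lines on stdin, echo them through, print a tally, and return 1 if any
# line starts with FAIL.
summarize_checks() {
  awk '
    { print }
    /^PASS/ { pass++ }
    /^FAIL/ { fail++ }
    END {
      printf "%d passed, %d failed\n", pass, fail
      exit (fail > 0 ? 1 : 0)
    }
  '
}

# Example against a live cluster (one check shown; add the others the same way):
#   {
#     if az aks show --resource-group fawkes-rg --name fawkes-aks \
#          --query provisioningState --output tsv | grep -qx Succeeded
#     then echo "PASS  1. cluster provisioned"
#     else echo "FAIL  1. cluster provisioned"
#     fi
#   } | summarize_checks
```

The non-zero exit status on any failure makes the summary usable as a gate in a CI job.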
Resource Integration Validation
Azure Container Registry Integration
az aks check-acr \
--resource-group fawkes-rg \
--name fawkes-aks \
--acr fawkesacr.azurecr.io
az role assignment list \
--scope "$(az acr show -n fawkesacr --query id -o tsv)" \
--output table
- [ ] ACR check passes
- [ ] AcrPull role assigned to AKS kubelet identity
- [ ] Can pull images from ACR
Key Vault Integration
az keyvault show --name fawkes-kv --output table
az keyvault show --name fawkes-kv \
--query "properties.enableSoftDelete" \
--output tsv
- [ ] Key Vault exists
- [ ] Soft delete is enabled
- [ ] Access policies configured for deployer
- [ ] Access policies configured for AKS
- [ ] Network ACLs configured appropriately
Storage Account
az storage account show \
--name fawkestfstate \
--resource-group fawkes-rg \
--output table
az storage container list \
--account-name fawkestfstate \
--output table
- [ ] Storage account exists
- [ ] Container "tfstate" exists
- [ ] Replication type is correct (LRS/GRS)
- [ ] Terraform state is stored there
Log Analytics
az monitor log-analytics workspace show \
--resource-group fawkes-rg \
--workspace-name fawkes-aks-logs \
--output table
- [ ] Log Analytics workspace exists
- [ ] Retention period is configured
- [ ] Linked to AKS cluster
- [ ] Receiving container logs
Compliance Testing
InSpec Tests
# Install InSpec and Azure plugin if needed
# curl https://omnitruck.chef.io/install.sh | sudo bash -s -- -P inspec
# inspec plugin install inspec-azure
cd /path/to/fawkes
inspec exec infra/azure/inspec/ \
-t azure:// \
--input resource_group=fawkes-rg \
--input cluster_name=fawkes-aks \
--reporter cli json:reports/aks-inspec.json
Critical controls that must pass:
- [ ] aks-cluster-exists: Cluster exists and running
- [ ] aks-node-count: At least 2 nodes
- [ ] aks-node-pool-separation: System and user pools separated
- [ ] aks-azure-cni: Azure CNI configured
- [ ] aks-managed-identity: Managed identity enabled
- [ ] aks-rbac-enabled: RBAC enabled
- [ ] k8s-nodes-ready: All nodes Ready
- [ ] k8s-system-pods-running: System pods Running
BDD Tests
# From repository root
pytest tests/bdd/features/azure_aks_provisioning.feature -v
# Or run specific scenarios
pytest tests/bdd/features/azure_aks_provisioning.feature -k "AT-E1-001"
- [ ] All test scenarios pass
- [ ] No skipped tests (if authenticated)
Cost Validation
Cost Estimation
./scripts/azure-cost-estimate.sh
- [ ] Script completes successfully
- [ ] Cost breakdown displayed
- [ ] Total cost within budget expectations
- [ ] Optimization suggestions reviewed (if over budget)
Actual Cost Check
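# Note: `date -d` is GNU date syntax; on macOS/BSD use `date -v-1d +%Y-%m-%d` for the start date
az consumption usage list \
--start-date "$(date -d '1 day ago' +%Y-%m-%d)" \
--end-date "$(date +%Y-%m-%d)" \
--output table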
- [ ] Can view usage data
- [ ] Costs align with estimates
- [ ] Resource tags present for cost tracking
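The usage query needs yesterday's date, and the syntax for computing it differs between GNU and BSD/macOS `date`. A minimal portable sketch follows; `yesterday` is a hypothetical helper name.

```shell
# yesterday (hypothetical helper): print yesterday's date as YYYY-MM-DD.
# Tries the GNU date syntax first, then falls back to the BSD/macOS form.
yesterday() {
  date -d '1 day ago' +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d
}

# Example:
#   az consumption usage list \
#     --start-date "$(yesterday)" \
#     --end-date "$(date +%Y-%m-%d)" \
#     --output table
```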
Operational Validation
Scaling Tests
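# Scale user pool manually.
# Note: Azure rejects a manual scale while the cluster autoscaler is enabled on the pool;
# in that case adjust the limits with `az aks nodepool update --update-cluster-autoscaler --min-count <min> --max-count <max>`.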
az aks nodepool scale \
--resource-group fawkes-rg \
--cluster-name fawkes-aks \
--name user \
--node-count 3
kubectl get nodes
- [ ] User pool scales up successfully
- [ ] New nodes become Ready
- [ ] Auto-scaler works within min/max limits
Pod Scheduling
# Deploy a test workload
kubectl create deployment nginx --image=nginx --replicas=3
kubectl get pods -o wide
- [ ] Pods schedule across nodes
- [ ] Pods run successfully
- [ ] Can access pod logs
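Whether the pods actually spread across nodes can be checked by counting the distinct nodes they landed on. This is a sketch under stated assumptions: `count_distinct_nodes` is a hypothetical helper name, and the `app=nginx` selector relies on the label that `kubectl create deployment nginx` applies to its pods.

```shell
# count_distinct_nodes (hypothetical helper): read one node name per line
# on stdin and print how many different nodes appear. "<none>" marks a pod
# that has not been scheduled yet and is ignored.
count_distinct_nodes() {
  awk 'NF && $1 != "<none>" && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Example against a live cluster:
#   kubectl get pods -l app=nginx --no-headers \
#     -o custom-columns=NODE:.spec.nodeName | count_distinct_nodes
#   kubectl logs deployment/nginx --tail=5
#   kubectl delete deployment nginx   # remove the test workload afterwards
```

A result of 1 with three replicas is not necessarily a failure, since the scheduler may legitimately pack small pods onto one node, but a result of 2 or more confirms scheduling works on multiple nodes.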
Network Connectivity
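# Test DNS resolution and API reachability from a temporary pod
kubectl run test-pod --image=busybox -it --rm -- /bin/sh
# In pod: nslookup kubernetes.default
# In pod: wget -qO- --no-check-certificate https://kubernetes.default/version
# (the API service listens on HTTPS port 443; an HTTP 401/403 reply still shows it is reachable)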
- [ ] DNS resolution works
- [ ] Pod can reach Kubernetes API
- [ ] Network policy allows expected traffic
Monitoring
# Check metrics
kubectl top nodes
kubectl top pods -A
- [ ] Metrics server working
- [ ] Node metrics available
- [ ] Pod metrics available
Security Validation
RBAC
kubectl auth can-i list pods --as=system:anonymous
kubectl auth can-i list pods --as=system:serviceaccount:default:default
- [ ] Anonymous access properly restricted
- [ ] Service account permissions appropriate
- [ ] Azure RBAC enforced
Network Security
kubectl get networkpolicies -A
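# The node resource group (MC_...) cannot be matched with a wildcard; look its name up from the cluster
az network nsg list --resource-group "$(az aks show --resource-group fawkes-rg --name fawkes-aks --query nodeResourceGroup --output tsv)"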
- [ ] Network policies configured (if using)
- [ ] NSGs created by AKS
- [ ] Only required ports open
Secrets Management
kubectl get secrets -A
az keyvault secret list --vault-name fawkes-kv
- [ ] No plaintext secrets in cluster
- [ ] Secrets stored in Key Vault
- [ ] CSI driver configured (optional)
Documentation Validation
Runbook
- [ ] Read through `docs/runbooks/azure-aks-setup.md`
- [ ] All commands work as documented
- [ ] Architecture diagram matches deployment
- [ ] Troubleshooting section helpful
Terraform Documentation
- [ ] All variables documented
- [ ] Outputs are useful
- [ ] Examples are clear
- [ ] Comments explain complex logic
Cleanup (Optional for Dev Environments)
If you need to tear down the environment:
# Option 1: Terraform destroy
cd infra/azure
terraform destroy
# Option 2: Delete resource group
az group delete --name fawkes-rg --yes --no-wait
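WARNING: Both options permanently delete all resources. Deleting the resource group also removes the fawkestfstate storage account, and with it the Terraform state. Ensure you have backups if needed.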
Sign-Off
Validated by: ________________
Date: ________________
Environment: [ ] Dev [ ] Stage [ ] Prod
All critical checks passed: [ ] Yes [ ] No
Notes:
References
- Issue: https://github.com/paruff/fawkes/issues/1
- PR: https://github.com/paruff/fawkes/pull/[NUMBER]
- Runbook: docs/runbooks/azure-aks-setup.md
- InSpec Tests: infra/azure/inspec/
- BDD Tests: tests/bdd/features/azure_aks_provisioning.feature