GitOps Strategy: From Push to Pull
Context
In traditional CI/CD pipelines, the build system (Jenkins, for example) holds credentials to the production Kubernetes cluster and pushes changes directly. This "push-based" model has long been the norm, but it carries significant security and operational risks in cloud-native environments.
GitOps inverts this model: instead of CI pushing to production, production pulls its desired state from Git. The cluster itself watches a Git repository and continuously reconciles reality with the declared state. This seemingly small shift has profound implications for security, auditability, and reliability.
Fawkes adopted GitOps as a core architectural principle because it directly supports the behaviors that enable elite DORA performance: high deployment frequency, low change failure rate, and fast recovery time.
The Problem: Why Push-Based CD Falls Short
Security: The Credential Problem
The Traditional Model:
graph LR
A[Jenkins] -->|Has K8s credentials| B[Production Cluster]
C[Developer] -->|Triggers Build| A
style A fill:#ff6b6b
In push-based CI/CD:
- Jenkins holds production credentials - If Jenkins is compromised, so is production
- Credentials must be distributed - Every build agent needs cluster access
- Blast radius is massive - A CI security breach means a production breach
- Credential rotation is painful - Update all CI configs when rotating
Drift: Configuration Divergence
When humans have kubectl access and deadlines loom, they make "quick fixes" directly in the cluster:
# "Just for now" - Famous last words
kubectl scale deployment critical-app --replicas=10
kubectl set env deployment/api LOG_LEVEL=debug
The Result:
- Production state diverges from Git
- Git is no longer the source of truth
- Disaster recovery becomes guesswork: "What was the actual config?"
- Compliance audits fail: "Show us the change approval for that scaling event"
Auditability: The Black Box Deployment
Push-based deployments create an audit gap:
- Git shows code changes
- Jenkins shows build execution
- But the actual deployment? Often just a line in Jenkins logs
- Question: "Who changed the replica count from 3 to 5?"
- Answer: ¯\_(ツ)_/¯ "Could be anyone with kubectl access"
Rollback: The Manual Scramble
When a bad deployment hits production in a push model:
- Find the previous good version (search through Jenkins jobs)
- Trigger a rebuild (hope dependencies haven't changed)
- Push again (cross fingers)
- Repeat if it fails again
Time to Restore: 15-45 minutes of manual, error-prone work
The Solution: GitOps Pull Model
How ArgoCD Changes the Game
graph TB
A[Git Repository] -->|Single Source of Truth| B[ArgoCD]
B -->|Watches & Syncs| C[Production Cluster]
B -->|Watches & Syncs| D[Staging Cluster]
B -->|Watches & Syncs| E[Dev Cluster]
F[Developer] -->|Git Commit| A
G[Jenkins CI] -->|Builds Image, Updates Manifest| A
style A fill:#4CAF50
style B fill:#2196F3
Key Principles:
- Git as Single Source of Truth
  - Entire desired state lives in Git (applications, configs, infrastructure)
  - Want to know production state? Read Git
  - Want to change production? Open a PR
- Declarative Configuration
  - Describe what you want, not how to get there
  - ArgoCD figures out the kubectl apply dance
  - Idempotent: Apply the same config 100 times = same result
- Automated Sync
  - ArgoCD polls Git every 3 minutes (or use webhooks for instant sync)
  - Detects drift and auto-corrects
  - No manual intervention needed
- Self-Healing
  - Someone does kubectl edit to "fix" something?
  - ArgoCD reverts it back to Git state
  - Want it to stick? Commit to Git
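The automated sync and self-healing principles map onto an Application's sync policy. The following is a minimal sketch, not the Fawkes configuration: the service name, source path, project, and namespace are all hypothetical.

```yaml
# Hypothetical Application illustrating automated sync and self-healing.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service            # assumed name, not from the Fawkes repo
  namespace: argocd
spec:
  project: default                 # assumed project
  source:
    repoURL: https://github.com/paruff/fawkes
    path: services/example-service # assumed path
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: example-service     # assumed namespace
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true   # create the target namespace if it is missing
```

With automated sync but without selfHeal, ArgoCD still reports a manual change as OutOfSync but leaves it in place until the next sync.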
The App-of-Apps Pattern
A single ArgoCD "Application" manages other Applications - like a meta-deployment:
# bootstrap/app-of-apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-apps
  namespace: argocd
spec:
  project: default  # required field; "default" is assumed here
  source:
    repoURL: https://github.com/paruff/fawkes
    path: platform/apps
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
What This Enables:
- Deploy the entire platform with one Application
- Add a new service? Just add a directory in platform/apps/
- ArgoCD discovers and deploys it automatically
- Remove a service? Delete its directory and it's undeployed
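Under the App-of-Apps pattern, what the root Application finds in platform/apps/ is typically more Application manifests. Here is a sketch of what one such child might look like, assuming a hypothetical file location, manifest path, project, and namespace. None of these names are confirmed by the Fawkes repository.

```yaml
# platform/apps/prometheus/application.yaml (hypothetical location)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
  finalizers:
    # Cascade deletion: removing this Application also removes its resources.
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default                       # assumed project
  source:
    repoURL: https://github.com/paruff/fawkes
    path: platform/manifests/prometheus  # assumed path
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring                # assumed namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Two caveats apply to this sketch. If child manifests sit in subdirectories, the root Application also needs spec.source.directory.recurse set to true, because directory-type sources read only the top level by default. And for "delete its directory and it's undeployed" to hold, the root Application needs automated sync with prune enabled. Otherwise the removed child is only flagged as out of sync.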
The Bootstrap Process:
- Install ArgoCD itself (one-time manual step)
- Create the root App-of-Apps
- Everything else (Jenkins, Backstage, Prometheus) deploys automatically
- Never touch kubectl again (well, rarely)
Security Model Shift
Before (Push Model):
CI System → Stores credentials → Deploys to cluster
After (Pull Model):
CI System → Updates Git → ArgoCD (running IN cluster) syncs
Benefits:
- ✅ No credentials in CI - Jenkins doesn't need cluster access
- ✅ Credentials never leave cluster - ArgoCD uses in-cluster service account
- ✅ Reduced attack surface - Compromise CI ≠ Compromise Production
- ✅ Easier credential rotation - Only ArgoCD service account to manage
Trade-Offs: What You Gain and Lose
What GitOps Gives You
| Benefit | Impact | DORA Metric |
|---|---|---|
| Audit Trail | Complete Git history of who changed what, when, why | All metrics (compliance reduces incidents) |
| Fast Rollback | git revert + auto-sync = 30-second rollback | ⬇️ Time to Restore Service |
| Configuration as Code | No more "snowflake servers" or tribal knowledge | ⬇️ Change Failure Rate |
| Self-Healing | Drift auto-corrected, no manual fixes | ⬇️ Time to Restore Service |
| Preview Deployments | Every PR can have ephemeral environment | ⬆️ Deployment Frequency |
| Disaster Recovery | Cluster dies? Rebuild and re-sync from Git | ⬇️ Time to Restore Service |
What GitOps Costs You
| Challenge | Mitigation |
|---|---|
| Learning Curve | ArgoCD concepts (Applications, Sync Policies) require training. Mitigation: Fawkes Dojo has GitOps module |
| Git as Bottleneck | Emergency fixes must go through Git/PR process. Mitigation: "Break-glass" procedure for true emergencies, webhooks for instant sync |
| Secret Management | Can't commit secrets to Git. Mitigation: External Secrets Operator fetches from Vault (see ADR-015) |
| Sync Delays | Default 3-minute poll interval. Mitigation: Use webhooks for instant notification, or manual refresh for urgent changes |
| Initial Complexity | Setting up multi-cluster, RBAC, ApplicationSets. Mitigation: Fawkes provides bootstrap templates and onboarding guide |
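The Secret Management row can be made concrete with an ExternalSecret, which is safe to commit because it holds only a reference to the secret. This is a sketch under assumptions: the resource names, the store name, and the Vault path are hypothetical, and the apiVersion should match whatever your installed operator serves.

```yaml
# Hypothetical ExternalSecret: contains a pointer to Vault, never the value.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-db-credentials        # assumed name
  namespace: example-service      # assumed namespace
spec:
  refreshInterval: 1h             # how often to re-read the value from Vault
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend           # assumed store, configured separately
  target:
    name: api-db-credentials      # Kubernetes Secret the operator creates
  data:
    - secretKey: password         # key inside the generated Secret
      remoteRef:
        key: example-service/db   # assumed Vault path, relative to the store's mount
        property: password
```

ArgoCD syncs this manifest like any other resource. The operator then creates the actual Secret inside the cluster.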
The Cultural Shift
GitOps requires a mindset change:
| Old Habit | New Behavior |
|---|---|
| "I'll just kubectl edit this quickly" | "I'll open a PR with the change" |
| "Let me SSH and debug in prod" | "I'll check Git to see the config" |
| "The deployment failed, let me retry in Jenkins" | "The sync failed, let me check ArgoCD UI for the diff" |
| "What's running in production?" | "What's in the main branch?" |
This is a feature, not a bug. Forcing changes through Git creates the paper trail that auditors love and future-you will thank.
Why ArgoCD Over Alternatives
The Landscape
When Fawkes was designed, we evaluated:
- Flux CD - CNCF Graduated, excellent tool, more modular
- Spinnaker - Formerly used by Fawkes, powerful but heavyweight
- Jenkins X - Opinionated, too tightly coupled to Jenkins
- Helm alone - Not GitOps, manual processes
Why ArgoCD Won
- The UI Advantage
  - Visual application topology (see dependencies at a glance)
  - Diff view (Git vs. cluster side-by-side)
  - Real-time sync status
  - Developer Experience: Devs troubleshoot faster with visuals
- Proven at Scale
  - Used by Intuit, Red Hat, IBM, Adobe
  - 15,000+ GitHub stars
  - CNCF Graduated (highest maturity)
- App-of-Apps Pattern
  - Elegant solution for managing platform
  - Mirrors Fawkes' "platform of platforms" architecture
- Argo Ecosystem
  - Argo Rollouts (canary deployments)
  - Argo Events (event-driven automation)
  - Argo Workflows (complex pipelines)
  - Integrated, not duct-taped together
- Backstage Integration
  - Official ArgoCD plugin shows deployment status in Backstage
  - Developers see sync state in their service catalog
  - Unified developer portal experience
What About Flux?
Flux is excellent and a valid choice. The decision was not dogmatic:
Choose ArgoCD if: You value built-in UI, visual topology, and simpler mental model
Choose Flux if: You prefer CLI-first, lower resource usage, GitOps Toolkit modularity
Fawkes prioritized Developer Experience, and ArgoCD's UI is a significant UX advantage for troubleshooting and understanding system state.
Historical Context: From Spinnaker to ArgoCD
Fawkes 1.0 (2022-2023): Used Spinnaker for deployment orchestration
Why We Left Spinnaker:
- Operational Burden: 10+ microservices just to run Spinnaker itself
- Resource Hungry: 4GB+ RAM for control plane
- Configuration Complexity: Pipelines defined in UI or JSON, hard to version
- Not GitOps-Native: Push model, required cloud credentials
- Overkill: Multi-cloud deployment orchestration when we only needed Kubernetes
What We Learned: Spinnaker is phenomenal if you're deploying to VMs, cloud functions, AND Kubernetes across multiple clouds. But for a Kubernetes-focused platform, ArgoCD gives 80% of the value with 20% of the operational cost.
Practical Implications
For Developers
Q: How do I deploy my application?
A: Merge your PR. ArgoCD syncs automatically.
Q: How do I roll back a bad deployment?
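A: git revert <commit> and push. ArgoCD rolls the change back in ~30 seconds when webhooks are configured; with polling alone it can take up to the 3-minute poll interval.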
Q: Can I still use kubectl for debugging?
A: Yes! Read-only commands are fine. But changes won't persist (ArgoCD reverts them).
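How quickly a merge or revert reaches the cluster depends on how soon ArgoCD notices the new commit. When no webhook is configured, the polling interval is controlled by a setting in the argocd-cm ConfigMap. This is a sketch, assuming ArgoCD is installed in the argocd namespace; the 180s value simply restates the 3-minute interval this document cites.

```yaml
# Sketch of the polling interval setting; adjust to your installation.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  # How often ArgoCD polls Git for changes when no webhook arrives.
  timeout.reconciliation: 180s
```

Lowering the interval shortens the wait but adds load on the Git provider. A webhook from the repository is usually the better way to get near-instant syncs.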
For Platform Engineers
Q: How do I onboard a new service?
A: Add a directory in platform/apps/<service>/. See the onboarding guide.
Q: How do I handle secrets?
A: Use External Secrets Operator to fetch from Vault. Never commit raw secrets to Git.
Q: What about emergency fixes in production?
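A: Merge the fix, then run argocd app sync <app-name> to trigger an immediate sync instead of waiting for the next poll. The --force flag performs a force apply and is not needed just to sync sooner. If the change cannot wait for Git, apply it manually with kubectl under a documented "break-glass" procedure, then commit the same change so Git matches the cluster.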
For Consultants
When a client asks "Why can't we just use Jenkins for deployments?":
- Security: Show the credential diagram (push vs. pull)
- Audit Trail: Demonstrate Git commit history as deployment log
- Rollback Speed: git revert vs. "search Jenkins for old build"
- Drift Prevention: Show how GitOps auto-corrects manual changes
- DORA Alignment: GitOps directly improves all four key metrics
Objection Handling:
- "But we have Jenkins already!" → Jenkins still does CI (build/test), ArgoCD does CD (deploy). Separation of concerns.
- "GitOps is too slow!" → Webhooks give instant sync. Plus, slow and safe beats fast and fragile.
- "We need manual approval for production!" → ArgoCD supports manual sync policies. Automation doesn't mean no governance.
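The manual-approval answer can be illustrated with an Application that has no automated sync policy. This is a sketch only: the application name, project, overlay path, and namespace are hypothetical.

```yaml
# Hypothetical production Application that waits for a human to sync.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service-prod       # assumed name
  namespace: argocd
spec:
  project: production              # assumed project
  source:
    repoURL: https://github.com/paruff/fawkes
    path: services/example-service/overlays/prod   # assumed path
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: example-service     # assumed namespace
  # No syncPolicy.automated block: ArgoCD reports the diff as OutOfSync
  # and applies nothing until someone triggers a sync.
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```

An approver reviews the diff in the ArgoCD UI and then syncs from there or with argocd app sync example-service-prod. The Git history still records what was changed and by whom.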
Related Reading
- How-To: Onboard a Service to ArgoCD
- How-To: Sync an ArgoCD Application
- ADR: ADR-003: ArgoCD for GitOps (detailed decision rationale)
- Tutorial: Module 9: GitOps with ArgoCD
- Reference: ArgoCD Best Practices
Conclusion
GitOps is not just a deployment tool; it's an operational philosophy. By making Git the single source of truth and inverting the deployment model from push to pull, we gain security, auditability, and reliability.
The shift from Spinnaker to ArgoCD wasn't about rejecting a tool; it was about right-sizing our architecture. ArgoCD gives us true GitOps in a Kubernetes-native way without the operational overhead of more general-purpose solutions.
The Golden Path: Git → ArgoCD → Production. No sidesteps, no shortcuts, no SSH into production servers at 2am.