The first time I saw Flux solve a real problem was on a Monday morning after a weekend of silent drift. Someone had run kubectl edit on a ConfigMap in staging on Friday to debug something, forgot to roll it back, and by Monday two services were misbehaving because the live config no longer matched what Git said it should be. Nobody remembered who edited what. There was no diff. The “source of truth” was whatever the API server happened to hold that morning.
That’s the drift problem, and it’s the whole reason GitOps exists. You don’t bolt on GitOps because it’s fashionable — you bolt it on because imperative kubectl apply from engineers’ laptops is an audit black hole, and because human memory is a lousy reconciliation loop.
This post walks through getting Flux running on a local Kind cluster and installing ingress-nginx through it. But I’m not writing yet another “run these commands” tutorial. Every decision below has a reason, and if you skip the reasons you’ll end up with a repo layout you regret in six months. If you want the more mature multi-environment pattern after you’ve got the basics, I wrote that up separately in From Manual Deployments to GitOps with Flux.
The Threat Model (Yes, a Getting-Started Post Needs One)
Before any commands, name what you’re defending against. These are the failure modes GitOps actually addresses:
| Threat | What GitOps does about it |
|---|---|
| Cluster state drifts from declared intent | Continuous reconciliation reverts out-of-band changes |
| Unauthorized mutations (“who changed prod?”) | Every change is a git commit with an author and a PR |
| Broken rollbacks under pressure | git revert + reconcile, same as any code rollback |
| Secrets accidentally committed | External secret management (sealed-secrets, external-secrets), scanning, never plain tokens in git |
| Long-lived bootstrap credentials leaking | Short-lived PATs scoped to a single repo, or deploy keys generated by flux bootstrap |
If a control you’re thinking about adding doesn’t answer one of these, question it. GitOps is not free — it adds a controller, a reconciliation loop, a repository to maintain. It earns its keep by killing drift and producing a git log you can actually trust.
Why Pull-Based Reconciliation Beats Push
A CI pipeline that runs kubectl apply at the end of a build is a push model. It works, but it has two structural problems:
- The cluster can drift between deploys. If someone edits a resource by hand, nothing notices until the next CI run touches that manifest.
- CI needs cluster credentials. Every build agent that can deploy is a potential attack vector with production API access.
Flux inverts both. An agent runs inside the cluster and pulls from git on an interval. The cluster only needs outbound network to git — no inbound API exposure to CI, no credentials handed out to build runners. And the reconciliation loop runs continuously, so drift gets corrected in minutes rather than whenever the next deploy happens to overlap with it.
This is what people mean when they say “pull-based GitOps.” It’s not a dogmatic preference — it’s a smaller blast radius and a tighter feedback loop on drift.
What You Need
- A Kubernetes cluster (1.26+ for a recent Flux). I’ll use Kind locally.
- `kubectl` configured to talk to it.
- The `flux` CLI installed.
- A GitHub account and a PAT or fine-grained token (more on this below).
Spin Up a Local Cluster With Kind
If you already have a cluster, skip ahead. Otherwise, Kind gives you a throwaway multi-node cluster in ~30 seconds.
# kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
node-labels: "ingress-ready=true"
extraPortMappings:
- containerPort: 80
hostPort: 80
protocol: TCP
- containerPort: 443
hostPort: 443
protocol: TCP
- role: worker
- role: worker
kind create cluster --name flux-demo --config=kind.yaml
The ingress-ready=true label and port mappings are there because we’ll install ingress-nginx later and want it reachable on localhost:80/443. On a real cloud cluster you’d use a LoadBalancer service instead — I’ll note the switch when we get there.
One Kind default worth overriding on an untrusted network (coffee shop wifi, shared LAN): Kind binds its API server to 0.0.0.0 by default, which exposes the control plane to anyone who can reach your host. Add networking.apiServerAddress: 127.0.0.1 to the Cluster spec to scope it to loopback only. Safe to skip on a trusted home network, non-negotiable anywhere else.
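Here is where that field goes, as an excerpt of the same kind.yaml:

```yaml
# kind.yaml (excerpt): bind the API server to loopback only
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: "127.0.0.1"
nodes:
  - role: control-plane
    # ... rest of the config from above, unchanged
```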
Token Hygiene Before You Run Anything
The Flux docs give you this:
export GITHUB_TOKEN=<your token>
export GITHUB_USER=<your username>
export GITHUB_REPO=flux-infra
This is fine for a local demo on your own machine. It is not fine for anything you’d call a real environment. A few rules I hold to:
- Never commit the token. Not to `.env`, not to a dotfile that might get synced, not to a shell history file that ends up in a backup. Use a password manager or a secrets tool like `pass`, `1password-cli`, or `op run` to inject it into your shell.
- Use the narrowest scope possible. For `flux bootstrap github`, you need `repo` scope (or, with fine-grained tokens, read/write to exactly one repository — Contents, Administration if you want it to create the repo, and Metadata). Nothing else. No `workflow`, no `admin:org`, no `delete_repo`. Note that fine-grained “Administration: Write” also grants repo deletion — only enable it if you need bootstrap to create the repo, and revoke immediately afterward.
- Prefer short-lived tokens. Fine-grained GitHub tokens with a 7-day expiry are plenty for bootstrap. You only need the token during the bootstrap call; after that Flux uses a deploy key generated during bootstrap, and the PAT can be revoked immediately.
- Never ship a bootstrap token to CI. Bootstrap is a one-time human action. If you’re automating it in CI, you’re probably doing something you’ll regret. Bootstrap from a workstation or a short-lived sandbox environment.
- Revoke after bootstrap, and scrub the shell. The PAT’s job is over the moment `flux bootstrap` finishes. Delete it on GitHub, then `unset GITHUB_TOKEN` in your shell so it doesn’t linger in the process environment or get picked up by a later `env` dump. Flux will keep syncing via the deploy key it just configured.
The export pattern I actually use on a workstation:
# Pull from a secrets manager at invocation time. Never written to disk.
export GITHUB_TOKEN="$(op read 'op://Personal/github-flux-bootstrap/token')"
export GITHUB_USER="miguelpinto"
export GITHUB_REPO="flux-infra"
If op (1Password CLI) isn’t your tool, pass, bw get, or gh auth token for a throwaway gh session all work.
Bootstrap Flux
flux bootstrap github \
--owner="$GITHUB_USER" \
--repository="$GITHUB_REPO" \
--branch=main \
--path=./clusters/local \
--personal \
--private \
--read-write-key=false
What this actually does:
- Creates the repository if it doesn’t exist (requires `repo` scope, or Administration write with fine-grained).
- Generates an SSH deploy key, uploads the public half to the repo, and stores the private half as a Kubernetes Secret (`flux-system/flux-system`).
- Installs the four Flux controllers into the `flux-system` namespace: source, kustomize, helm, notification.
- Commits the controller manifests into `clusters/local/flux-system/` in the repo.
- Creates a `GitRepository` and `Kustomization` that tell Flux: “sync this branch/path into this cluster, forever.”
What it does not do:
- Set up any application or infrastructure beyond Flux itself.
- Configure image automation (you need `--components-extra=image-reflector-controller,image-automation-controller` for that).
- Set up RBAC for tenant isolation. You get cluster-admin by default. Tighten this before you hand the cluster to multiple teams.
- Rotate the deploy key. That’s your problem on a cadence you define (a rotation sketch follows this list).
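On that last point, the simplest rotation path I know of is to regenerate the key by deleting the bootstrap Secret and re-running bootstrap. A sketch, assuming the same flags used above and a fresh short-lived PAT in `GITHUB_TOKEN`:

```bash
# Sketch: rotate the bootstrap deploy key by regenerating it.
# Requires a fresh short-lived PAT in GITHUB_TOKEN, same hygiene rules as before.
kubectl -n flux-system delete secret flux-system
flux bootstrap github \
  --owner="$GITHUB_USER" \
  --repository="$GITHUB_REPO" \
  --branch=main \
  --path=./clusters/local \
  --personal --private
# Then delete the old deploy key from the repository settings on GitHub
# and revoke the PAT again.
```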
--read-write-key=false is the default, but I pass it explicitly as documentation: a read-only deploy key is almost always what you want. Flux only needs to read from git — unless you enable image automation, which writes tag updates back. For this demo we don’t need write, so we stay read-only.
One more thing to internalise before moving on: the bootstrap you just ran wrote the deploy key’s private half into the flux-system/flux-system Secret. That Secret is now the cluster’s credential back to your git repo. Anyone with get secrets in flux-system effectively has git access — read-only today, read-write the moment you enable image automation. Worse, the same namespace holds the kustomize-controller ServiceAccount token, which is bound to cluster-admin and reconciles everything Flux applies; a token grab there is a full cluster takeover and a lever to modify what the cluster pulls next. Lock down RBAC on the flux-system namespace accordingly; this is not a Secret — or a ServiceAccount — you want in a “developers can exec into everything” cluster.
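A quick way to audit that exposure, using read-only checks (the subject names below are hypothetical placeholders):

```bash
# Who currently holds the lever? Impersonate suspects and ask the API server.
kubectl auth can-i get secrets -n flux-system --as=jane.doe --as-group=developers
kubectl auth can-i get secrets -n flux-system \
  --as=system:serviceaccount:ci:build-runner
# And list what grants exist in the namespace
kubectl get rolebindings -n flux-system -o wide
```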
After bootstrap, clone the repo locally so you can add manifests:
git clone "git@github.com:$GITHUB_USER/$GITHUB_REPO.git"
cd "$GITHUB_REPO"
Verify the Controllers Are Healthy
flux check
flux get all -A
flux check verifies all four controllers are running. flux get all -A shows every Flux-managed resource across all namespaces and their last-reconciled revision. Get comfortable with these — they’re the equivalent of kubectl get pods for your GitOps layer.
Repository Structure: What I’d Actually Choose
This is where most tutorials hand-wave and where I’ve seen the most regret. You have essentially two axes of choice:
Axis 1: flat vs hierarchical. A flat repo dumps everything into one directory. It’s fine for one cluster and two services. It scales to nothing. Go hierarchical from day one — the cost is ten minutes of thinking, and the alternative is a migration later.
Axis 2: per-env folders vs Kustomize overlays. Per-env folders (dev/, staging/, prod/) duplicate manifests. Overlays keep a single base and patch per environment. I strongly prefer overlays, because the alternative is “which env has the fix?” debugging and three drifted copies of what should be the same manifest.
The structure I recommend:
├── clusters/
│ └── local/
│ ├── flux-system/ # Managed by flux bootstrap, don't hand-edit
│ ├── infrastructure.yaml # Points at ./infrastructure/overlays/local
│ └── apps.yaml # Points at ./apps/overlays/local, dependsOn: infrastructure
├── infrastructure/
│ ├── base/ # Reusable definitions — ingress, cert-manager, monitoring
│ │ └── ingress-nginx/
│ └── overlays/
│ └── local/ # Kind-specific patches
│ └── ingress-nginx/
└── apps/
├── base/
└── overlays/
└── local/
For this post we only fill in the infrastructure/ side. Apps come after you’ve got the muscle memory for the reconciliation loop.
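For reference, the apps.yaml in that tree is just a second Flux Kustomization gated on the first. A sketch of what it will contain once apps/ has content (name and path mirror the tree above):

```yaml
# clusters/local/apps.yaml (sketch; add once apps/ exists)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/overlays/local
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure   # wait for ingress, CRDs, etc. before applying apps
```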
Why I Don’t Use Branch-Based Promotion
A common pattern is one branch per environment: develop syncs to dev, staging to staging, main to prod. It sounds tidy. It is not.
The pain:
- Merge conflicts on every promotion. The same file diverges across branches because dev gets fixes first. Resolving conflicts on every cherry-pick is tedious and error-prone.
- No single view of “what is deployed where.” Answering “is this fix in prod yet?” requires looking at three branches.
- Accidental backward promotions. A hotfix merged to `main` that doesn’t get propagated back to `staging` and `develop` means your lower environments are now behind prod. The next PR from develop to main will re-revert the hotfix. I’ve seen this happen twice. It’s always ugly.
- Environment-specific config has to live somewhere. With branch-based you end up with per-branch values files that diverge silently.
Path-based promotion — one branch (main), different paths per environment under clusters/ and overlays/ — gives you one timeline, one git log, one merge workflow. Promotion is a PR that copies a value from overlays/staging/values.yaml to overlays/production/values.yaml. You can see exactly what changed because it’s a diff.
For a single-cluster getting-started repo like this one, it doesn’t matter — you only have clusters/local/. But adopt the pattern now so you don’t have to untangle it later.
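To make “promotion is a diff” concrete, the whole PR for a hypothetical version bump looks like this (file path and tags are illustrative):

```diff
--- a/overlays/production/values.yaml
+++ b/overlays/production/values.yaml
 controller:
   image:
-    tag: "1.4.2"
+    tag: "1.5.0"  # the tag staging has been running since Tuesday
```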
Install ingress-nginx Through Flux
Now the payoff. Instead of helm install, we declare the helm chart and let Flux reconcile it.
Step 1: declare the helm repository
# infrastructure/base/sources/ingress-nginx-helm.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: ingress-nginx
namespace: flux-system
spec:
interval: 1h
url: https://kubernetes.github.io/ingress-nginx
The interval is how often Flux polls the helm repo for new chart versions. One hour is fine for a helm repo — charts don’t change often, and polling too aggressively is a waste of API calls against the chart host.
This HTTPS HelmRepository trusts whatever TLS-terminated chart host answers — fine for upstream ingress-nginx on a learning cluster, not fine for production third-party charts. For production chart sources, switch to OCI (spec.type: oci, pointing at an OCI-compliant registry like GHCR or ECR) and add spec.verify with a cosign keyless or public-key reference so unsigned charts fail the pull. Same threat model as container images: you don’t want to reconcile arbitrary templated YAML that nobody signed.
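A hedged sketch of that setup, with a placeholder registry and chart; the exact verify fields are worth confirming against the Flux docs for your version:

```yaml
# Sketch: OCI chart source plus cosign verification on the chart template
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: example-charts
  namespace: flux-system
spec:
  type: oci
  interval: 1h
  url: oci://ghcr.io/example-org/charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example
  namespace: example
spec:
  interval: 1h
  chart:
    spec:
      chart: example
      version: "1.2.3"
      sourceRef:
        kind: HelmRepository
        name: example-charts
        namespace: flux-system
      verify:
        provider: cosign   # keyless by default; add secretRef for a public key
```

Back on the plain HTTPS source for this demo, the HelmRepository gets wired into a Kustomize resource list: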
# infrastructure/base/sources/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
resources:
- ingress-nginx-helm.yaml
Step 2: declare the namespace
Helm releases land in a namespace; create it explicitly rather than relying on createNamespace: true in the HelmRelease. Explicit is easier to reason about.
# infrastructure/base/ingress-nginx/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: ingress-nginx
labels:
pod-security.kubernetes.io/enforce: baseline
The Pod Security label is habit. baseline is the minimum I’d accept for an ingress controller — it still allows the hostPort binding ingress-nginx needs, but blocks the worst privilege escalations. On an existing cluster where you’re retrofitting this, start with pod-security.kubernetes.io/warn: baseline (and/or audit) first, watch for violations in the API server logs, then flip to enforce. Going straight to enforce on a live namespace is how you break workloads you didn’t know were running as root.
Step 3: declare the HelmRelease
# infrastructure/base/ingress-nginx/release.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: ingress-nginx
namespace: ingress-nginx
spec:
releaseName: ingress-nginx
interval: 1h
chart:
spec:
chart: ingress-nginx
version: "4.11.3" # Pin the version. Never use "*" or ">=x".
sourceRef:
kind: HelmRepository
name: ingress-nginx
namespace: flux-system
interval: 12h # How often to check the HelmRepository cache
install:
remediation:
retries: 3
upgrade:
remediation:
retries: 3
remediateLastFailure: true
values:
controller:
replicaCount: 1
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
metrics:
enabled: true
admissionWebhooks:
enabled: true
failurePolicy: Fail
# Kind-specific bits get overridden in the overlay
service:
type: NodePort
A few things worth calling out:
Resource limits are not optional. I’ve seen the vanilla ingress-nginx chart deployed without limits, get OOMKilled under traffic, and take out the whole ingress path for a cluster. Requests and limits, always. Tune them to your traffic; these are a floor.
Pin the chart version. Not "*", not ">=4.0.0", not omitted. A chart upgrade you didn’t review will eventually ship a breaking change — a template rename, a values key that moved, a default that flipped. Pin, test the upgrade in a lower env, bump deliberately. Renovate or Dependabot can open the PR for you.
remediation on install and upgrade. If the release fails, Flux retries. If it still fails, without remediateLastFailure an upgrade leaves you in a “failed” state until you intervene. With it, Flux rolls back to the last good release automatically. Production-grade default.
metrics.enabled: true. Wire Prometheus to these from day one. The reason you’ll eventually care: if ingress-nginx is the bottleneck, you want the data already flowing.
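If you run the Prometheus Operator, the chart can also render a ServiceMonitor so scraping stays declarative. A sketch of the extra values, assuming kube-prometheus-stack (or equivalent CRDs) is in the cluster:

```yaml
# Additional HelmRelease values (sketch; requires Prometheus Operator CRDs)
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack   # match your Prometheus's serviceMonitorSelector
```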
Step 4: tie the base together
# infrastructure/base/ingress-nginx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ingress-nginx
resources:
- namespace.yaml
- release.yaml
Step 5: the local overlay
This is where Kind-specific quirks live. On Kind we want hostPort so requests on localhost:80 reach the controller. On a real cluster, you’d leave this out and use a LoadBalancer service.
# infrastructure/overlays/local/ingress-nginx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ingress-nginx
resources:
- ../../../base/ingress-nginx
patches:
- path: release-patch.yaml
target:
kind: HelmRelease
name: ingress-nginx
# infrastructure/overlays/local/ingress-nginx/release-patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: ingress-nginx
namespace: ingress-nginx
spec:
values:
controller:
hostPort:
enabled: true
service:
type: NodePort
nodeSelector:
ingress-ready: "true"
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Equal
effect: NoSchedule
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 256Mi
Note I also shrink the resource envelope for local — no need to reserve production-sized CPU on a laptop.
# infrastructure/overlays/local/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base/sources
- ingress-nginx
Step 6: the Flux Kustomization that points at all this
# clusters/local/infrastructure.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: infrastructure
namespace: flux-system
spec:
interval: 10m
path: ./infrastructure/overlays/local
prune: true
sourceRef:
kind: GitRepository
name: flux-system
healthChecks:
- apiVersion: apps/v1
kind: Deployment
name: ingress-nginx-controller
namespace: ingress-nginx
timeout: 5m
The healthChecks field is what stops Flux from reporting “applied” before the thing is actually ready. Without it you get false green. With it, downstream Kustomizations that depend on infrastructure will correctly wait.
Warning: this Kustomization inherits cluster-admin. There’s no spec.serviceAccountName set, so Flux applies these manifests using the kustomize-controller’s own ServiceAccount, which is bound to cluster-admin out of bootstrap. Safe for a single-operator learning cluster; do not copy this into a multi-tenant repo without setting spec.serviceAccountName: flux-infrastructure (and creating a scoped ServiceAccount + RoleBinding that only grants the permissions this Kustomization actually needs). The RBAC walkthrough is in the more mature GitOps post.
Commit, push, and Flux picks it up within the reconciliation interval:
git add .
git commit -m "Add ingress-nginx via Flux"
git push origin main
# Or trigger immediately:
flux reconcile source git flux-system
flux reconcile kustomization infrastructure
Reconciliation Intervals: The Tradeoff
Every Flux object has an interval. The temptation is to set everything to 1m so changes happen fast. Resist it.
- GitRepository interval (how often Flux polls git): 1-5 min is reasonable. Shorter adds load to your git host; longer delays deploys.
- HelmRepository interval (how often Flux polls the chart host): 1h is fine. Helm charts don’t change often.
- Kustomization interval (how often Flux reconciles desired → actual state): 5-10 min for most things. This is the drift-correction heartbeat. Shorter means faster drift correction but more API load on your control plane.
- HelmRelease interval (how often Flux checks if the release needs a chart upgrade): 1h.
The rule: shorter intervals on sources (git, helm) are cheap; shorter intervals on Kustomizations and HelmReleases hit the Kubernetes API server. If you’re seeing API server CPU spikes, your Kustomization intervals are probably too aggressive.
You can always trigger an immediate reconcile with flux reconcile when you need to. Don’t set intervals as if that’s your only lever.
Turn On Commit Signature Verification
Flux will happily apply whatever commits are on the branch it’s watching. If an attacker steals a PAT, exfiltrates a deploy key, or phishes a maintainer with push access, they can commit arbitrary manifests and Flux reconciles them inside the cluster. Branch protection helps, but the cryptographic fix is telling Flux to refuse commits that aren’t signed by a key you trust.
This is non-optional the moment you enable image automation (a bot now has write access) or you onboard a second committer (you can no longer eyeball every push). Turn it on for a single-operator learning cluster too — it’s ten minutes of work and it closes the largest unreviewed trust gap in the default bootstrap.
Generate or export your public signing key (GPG or SSH-signing, whichever you already use for git commit -S) and load it into the cluster:
# Export your public key (GPG example)
gpg --export --armor your-email@example.com > signing-key.pub
# Or, for SSH commit signing:
# cp ~/.ssh/id_ed25519.pub signing-key.pub
kubectl create secret generic flux-signing-keys \
--namespace=flux-system \
--from-file=your-email@example.com=signing-key.pub
# Key inside the Secret must be the signer identity; add one entry per trusted signer.
Then patch the bootstrap-generated GitRepository (in clusters/local/flux-system/gotk-sync.yaml) to require verification:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: flux-system
namespace: flux-system
spec:
interval: 1m
url: ssh://git@github.com/miguelpinto/flux-infra
ref:
branch: main
secretRef:
name: flux-system
verify:
mode: HEAD
secretRef:
name: flux-signing-keys
mode: HEAD verifies the tip commit of the branch on every fetch; any unsigned commit (or one signed by a key not in flux-signing-keys) fails the source reconciliation and nothing downstream applies. Commit this change via PR, merge it signed, and watch flux get sources git -A — a red source is your smoke test that verification works.
One gotcha: if you enable this before your team sets up signing, everyone’s next git push breaks the cluster. Announce, set up signing locally first, then enable verification.
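Setting up signing locally is a handful of git config lines. The SSH variant, since most people already have a key (paths are the usual defaults, adjust as needed):

```bash
# Sign commits with an existing SSH key (git >= 2.34)
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true
# Smoke test: the next commit should carry a signature
git commit --allow-empty -m "signing smoke test"
git log --show-signature -1   # local verification also needs gpg.ssh.allowedSignersFile
```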
Image Automation: When To Use It, When Not
Flux can watch a container registry, match tags against a semver or regex policy, update the manifest in git, commit, and reconcile. Fully hands-off deploys.
This is great for lower environments. Dev and staging should track main automatically — push an image, it lands. Feedback loop tightens, PRs get reviewed faster.
This is not great for production, at least not without ceremony. Production deploys should be explicit, reviewed, and auditable beyond a bot commit. What I do:
- Image automation → dev/staging automatically.
- Prod deploys go through a PR that bumps the image tag manually (or via a promotion script that opens the PR). A human approves.
Enable it per-environment using ImageRepository + ImagePolicy + ImageUpdateAutomation — and be aware that enabling it requires the bootstrap deploy key to be read-write, because Flux writes tag updates back to git.
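A sketch of that trio for a hypothetical staging app; the registry, image name, semver range, and paths are all placeholders, and the API versions are the v1beta2 ones current in recent Flux releases (check yours):

```yaml
# Sketch: image automation for a hypothetical staging app
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  image: ghcr.io/example-org/podinfo
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: podinfo
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: podinfo
  policy:
    semver:
      range: ">=1.0.0-0"   # pre-releases allowed, fine for staging
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: staging-automation
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot
        email: fluxcd@users.noreply.github.com
      messageTemplate: "chore: update staging images"
    push:
      branch: main
  update:
    path: ./apps/overlays/staging
    strategy: Setters
```

The Setters strategy only rewrites lines that carry an `$imagepolicy` marker comment in the target manifests, so nothing outside the tags you annotate gets touched.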
That upgrade from read-only to read-write is the biggest blast-radius change in all of Flux. A read-write deploy key turns a cluster compromise into a supply-chain compromise: an attacker with get secrets in flux-system now has a credential that can commit arbitrary manifests back to the source of truth, and the next reconcile applies them. A few mitigations I’d consider non-optional before flipping that switch:
- Scope the key to a single repo. Never reuse a bootstrap key across infra and app repos.
- Protect the branch Flux writes to. Branch protection + required signed commits means a key-compromise commit stands out, or fails, rather than silently shipping.
- Consider an image-automation branch, not main. Flux writes to a dedicated branch; a human PR promotes it to main. This re-introduces a review step without losing the automation benefit for lower envs.
- Tighten RBAC on `flux-system`. `get secrets` there is now git-write. Audit who has it.
- Verify image signatures at admission time. A signed commit is only as good as the image it points at — if image-automation bumps a tag to a malicious digest pushed by a compromised CI, the signed commit just smuggles the bad image in. Enforce cosign/Sigstore signature verification via Flux’s OCI `spec.verify` on `HelmRepository`/`OCIRepository` sources, or via Kyverno `verifyImages` / Gatekeeper policy at the admission webhook. A minimal Kyverno sketch follows this list.
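The Kyverno flavour of that last mitigation, as a sketch: keyless verification of images from a hypothetical org, signed by that org’s GitHub Actions identity. Adjust the image glob, subject, and issuer to your setup:

```yaml
# Sketch: require cosign keyless signatures on images from our registry
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: require-signed-images
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "ghcr.io/example-org/*"
          attestors:
            - entries:
                - keyless:
                    subject: "https://github.com/example-org/*"
                    issuer: "https://token.actions.githubusercontent.com"
```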
Common Beginner Mistakes
These are the ones I see first-time Flux users hit.
prune: true deleting things you didn’t expect
prune: true means Flux deletes any resource it previously applied that no longer exists in git. This is what you want eventually — it’s the whole drift-correction story. But when you’re first learning, prune: true on a Kustomization that points at a misconfigured path will happily delete resources you didn’t mean to touch.
Start with prune: false. Flip it to true once you trust your manifest structure. It’s a one-line change and it will save you a reconcile-induced outage.
Namespace scoping surprises
A Kustomization with namespace: foo set will force all its resources into namespace foo, overriding the metadata.namespace on individual manifests. This trips people up when they set namespace: flux-system at the Kustomization level and then wonder why their app’s ConfigMap ended up in flux-system instead of apps.
Rule of thumb: don’t set namespace: on the Flux Kustomization unless you want every resource underneath it scoped to that namespace. Set metadata.namespace on each manifest and let Kustomize respect it.
RBAC: Flux is cluster-admin by default
Out of bootstrap, Flux runs as cluster-admin. Fine for a learning cluster, dangerous for multi-tenant. If you’re going to let multiple teams commit to the same Flux repo, you want tenant Kustomizations running under scoped ServiceAccounts so team A can’t accidentally (or deliberately) deploy into team B’s namespace.
This is a larger topic — see the multi-tenancy section of the more mature GitOps post — but be aware that the default is wide-open and you should tighten it before shipping anything shared.
Forgetting the HelmRelease chart interval
The HelmRelease has two intervals: spec.interval (how often to reconcile the release) and spec.chart.spec.interval (how often to check the referenced HelmRepository for a new chart version matching the version spec). Omit the inner one and you may wait longer than you expect for a chart version update to be noticed. Set both.
Admission webhooks that block reconciliation
If you install an admission webhook via Flux and the webhook fails closed, and the webhook pod itself isn’t running yet, you can wedge the cluster — the webhook blocks its own deployment. Solve this with failurePolicy: Ignore on the webhook or exclude the webhook’s namespace from the webhook’s own selector. Learned this one the hard way.
(Separate concept: Flux’s own notification-controller Receiver — the inbound webhook endpoint that lets GitHub/GitLab/Harbor poke Flux to reconcile immediately — must use spec.type: generic-hmac (not plain generic, which is unauthenticated and will accept any POST from the internet). The HMAC Secret it references needs stringData.token set to at least 32 random bytes (e.g., openssl rand -hex 32); the notification-controller verifies the X-Signature header against that token on every request. Not in scope for this getting-started post, but if you wire one up, don’t expose an unauthenticated /hook endpoint; anyone on the internet can trigger reconciles and amplify your polling into a DoS vector.)
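For completeness, a sketch of what an authenticated Receiver looks like; the Secret name is arbitrary and the token is generated as described above:

```yaml
# Sketch: HMAC-authenticated webhook receiver for push-triggered reconciles
# Secret created with:
#   kubectl create secret generic webhook-token -n flux-system \
#     --from-literal=token="$(openssl rand -hex 32)"
apiVersion: notification.toolkit.fluxcd.io/v1
kind: Receiver
metadata:
  name: flux-infra
  namespace: flux-system
spec:
  type: generic-hmac
  secretRef:
    name: webhook-token
  resources:
    - apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      name: flux-system
```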
Flux vs ArgoCD: What I’d Actually Choose
The honest answer: pick either and move on. Both are solid. But here’s how I’d decide.
Pick Flux if you want GitOps as infrastructure — controllers in the cluster, configured via CRDs, no extra UI to run. It composes cleanly with Kustomize and Helm. Operators love it. It’s a CNCF graduated project. If your team is comfortable in YAML and kubectl, Flux disappears into the cluster and works.
Pick ArgoCD if you want a UI that non-operators can use to see what’s deployed where, roll back visually, and trigger syncs. ArgoCD’s app-of-apps pattern is nice for managing dozens of apps. It’s opinionated about Application CRDs and has a stronger UX for people who aren’t going to kubectl into the cluster.
For a small ops team managing infrastructure and a handful of apps, Flux. For a larger org with developers who need visibility without cluster access, ArgoCD. Don’t agonize — either is a better answer than no GitOps.
One concrete behavioural difference worth knowing: ArgoCD separates sync from self-heal — you can detect drift without auto-correcting, and decide per-Application whether out-of-band edits get reverted. Flux always corrects drift on reconcile; there is no detect-only mode on a Kustomization. If you want to inspect drift without Flux silently undoing it, suspend the Kustomization (flux suspend kustomization <name>) and use flux diff to compare. Fine once you know it; surprising if you don’t.
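The inspect-without-correcting workflow, using the names from this post:

```bash
# Pause reconciliation, inspect drift against git, resume when done
flux suspend kustomization infrastructure
flux diff kustomization infrastructure --path ./infrastructure/overlays/local
flux resume kustomization infrastructure
```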
When to Graduate Past Flux-the-Tool
Flux is a reconciliation engine. It’s not the whole GitOps practice. Once you’ve got the loop working, the next questions are:
- Secrets management. Plain Kubernetes Secrets in git are base64, which is not encryption. Use `sealed-secrets`, `external-secrets-operator` with a backing secret manager (Vault, AWS Secrets Manager, GCP Secret Manager), or SOPS-encrypted manifests. Pick one and make it the standard before anyone tries to commit a plain secret.
- Policy. Kyverno or OPA/Gatekeeper enforce rules at admission time (required labels, no `:latest` tags, no privileged containers). GitOps reconciles what’s in git; policy enforces what can be in git.
- Progressive delivery. Flagger (which pairs with Flux) does canary and blue/green rollouts driven by metrics. Plain Flux does “apply and hope.” For anything user-facing, you eventually want progressive delivery.
- Multi-cluster fleet management. When you have more than three clusters, managing each one’s Flux bootstrap by hand gets old. Look at Flux’s fleet patterns or tools like Cluster API + Flux for scalable cluster provisioning.
These are all “after you’ve got Flux humming” concerns. Don’t try to solve them in your first week.
Conclusion
Getting Flux running locally is a couple of commands. Getting the repo structure, token hygiene, pruning behavior, and reconciliation intervals right is where the actual work lives. The value isn’t in having a controller running — it’s in having an audit trail, a reconciliation loop that corrects drift, and a single workflow (PR → merge → deploy) for every change.
The drift problem that kicked off this post? It doesn’t go away because you installed Flux. It goes away because the team commits to never running kubectl edit on a live cluster again, and because the reconciliation loop catches them when they do. The tool is half of it. The discipline is the other half.
If you want the next level — multi-environment promotion, image automation for real, policy enforcement, monitoring the reconciliation pipeline itself — read From Manual Deployments to GitOps with Flux. It picks up roughly where this one ends.