Getting Started with Flux and GitOps

A practitioner's quickstart for Flux — bootstrap, repo layout, token hygiene, and the beginner mistakes I see most often.


The first time I saw Flux solve a real problem was on a Monday morning after a weekend of silent drift. Someone had run kubectl edit on a ConfigMap in staging on Friday to debug something, forgot to roll it back, and by Monday two services were misbehaving because the live config no longer matched what Git said it should be. Nobody remembered who edited what. There was no diff. The “source of truth” was whatever the API server happened to hold that morning.

That’s the drift problem, and it’s the whole reason GitOps exists. You don’t bolt on GitOps because it’s fashionable — you bolt it on because imperative kubectl apply from engineers’ laptops is an audit black hole, and because human memory is a lousy reconciliation loop.

This post walks through getting Flux running on a local Kind cluster and installing ingress-nginx through it. But I’m not writing yet another “run these commands” tutorial. Every decision below has a reason, and if you skip the reasons you’ll end up with a repo layout you regret in six months. If you want the more mature multi-environment pattern after you’ve got the basics, I wrote that up separately in From Manual Deployments to GitOps with Flux.

The Threat Model (Yes, a Getting-Started Post Needs One)

Before any commands, name what you’re defending against. These are the failure modes GitOps actually addresses:

  • Cluster state drifts from declared intent → continuous reconciliation reverts out-of-band changes.
  • Unauthorized mutations (“who changed prod?”) → every change is a git commit with an author and a PR.
  • Broken rollbacks under pressure → git revert + reconcile, same as any code rollback.
  • Secrets accidentally committed → external secret management (sealed-secrets, external-secrets), scanning, never plain tokens in git.
  • Long-lived bootstrap credentials leaking → short-lived PATs scoped to a single repo, or deploy keys generated by flux bootstrap.

If a control you’re thinking about adding doesn’t answer one of these, question it. GitOps is not free — it adds a controller, a reconciliation loop, a repository to maintain. It earns its keep by killing drift and producing a git log you can actually trust.

Why Pull-Based Reconciliation Beats Push

A CI pipeline that runs kubectl apply at the end of a build is a push model. It works, but it has two structural problems:

  1. The cluster can drift between deploys. If someone edits a resource by hand, nothing notices until the next CI run touches that manifest.
  2. CI needs cluster credentials. Every build agent that can deploy is a potential attack vector with production API access.

Flux inverts both. An agent runs inside the cluster and pulls from git on an interval. The cluster only needs outbound network to git — no inbound API exposure to CI, no credentials handed out to build runners. And the reconciliation loop runs continuously, so drift gets corrected in minutes rather than whenever the next deploy happens to overlap with it.

This is what people mean when they say “pull-based GitOps.” It’s not a dogmatic preference — it’s a smaller blast radius and a tighter feedback loop on drift.

What You Need

  • A Kubernetes cluster (1.26+ for a recent Flux). I’ll use Kind locally.
  • kubectl configured to talk to it.
  • The flux CLI installed.
  • A GitHub account and a PAT or fine-grained token (more on this below).
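Before bootstrap, a thirty-second toolchain check saves debugging later — flux check --pre also validates that the cluster is compatible with the controllers you’re about to install:

```shell
# Confirm the tools are present and pointing at the right cluster.
kind version
kubectl config current-context   # expect kind-flux-demo once the cluster exists
flux --version
# Pre-flight: verifies client/cluster version compatibility before installing anything.
flux check --pre
```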

Spin Up a Local Cluster With Kind

If you already have a cluster, skip ahead. Otherwise, Kind gives you a throwaway multi-node cluster in ~30 seconds.

# kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
  - role: worker
  - role: worker

kind create cluster --name flux-demo --config=kind.yaml

The ingress-ready=true label and port mappings are there because we’ll install ingress-nginx later and want it reachable on localhost:80/443. On a real cloud cluster you’d use a LoadBalancer service instead — I’ll note the switch when we get there.

One Kind networking default worth knowing about: Kind binds its API server to 127.0.0.1 by default, which keeps the control plane loopback-only. Some tutorials set networking.apiServerAddress: 0.0.0.0 to make the cluster reachable from other machines — on an untrusted network (coffee shop wifi, shared LAN) that exposes your control plane to anyone who can reach your host. Leave the default alone, or pin networking.apiServerAddress: 127.0.0.1 in the Cluster spec explicitly as documentation, and only widen it on a network you fully trust.
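Pinning it explicitly in the Cluster spec looks like this (same kind.yaml as above, networking block added; nodes trimmed for brevity):

```yaml
# kind.yaml — explicit loopback-only API server
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: 127.0.0.1   # control plane reachable from this host only
nodes:
  - role: control-plane
```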

Token Hygiene Before You Run Anything

The Flux docs give you this:

export GITHUB_TOKEN=<your token>
export GITHUB_USER=<your username>
export GITHUB_REPO=flux-infra

This is fine for a local demo on your own machine. It is not fine for anything you’d call a real environment. A few rules I hold to:

  • Never commit the token. Not to .env, not to a dotfile that might get synced, not to a shell history file that ends up in a backup. Use a password manager or a secrets tool like pass, 1password-cli, or op run to inject it into your shell.
  • Use the narrowest scope possible. For flux bootstrap github, you need repo scope (or, with fine-grained tokens, read/write to exactly one repository — Contents, Administration if you want it to create the repo, and Metadata). Nothing else. No workflow, no admin:org, no delete_repo. Note that fine-grained “Administration: Write” also grants repo deletion — only enable it if you need bootstrap to create the repo, and revoke immediately afterward.
  • Prefer short-lived tokens. Fine-grained GitHub tokens with a 7-day expiry are plenty for bootstrap. You only need the token during the bootstrap call; after that Flux uses a deploy key generated during the bootstrap, and the PAT can be revoked immediately.
  • Never ship a bootstrap token to CI. Bootstrap is a one-time human action. If you’re automating it in CI, you’re probably doing something you’ll regret. Bootstrap from a workstation or a short-lived sandbox environment.
  • Revoke after bootstrap, and scrub the shell. The PAT’s job is over the moment flux bootstrap finishes. Delete it on GitHub, then unset GITHUB_TOKEN in your shell so it doesn’t linger in process environment or get picked up by a later env dump. Flux will keep syncing via the deploy key it just configured.
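The post-bootstrap cleanup is two steps, one local and one on GitHub:

```shell
# The moment bootstrap succeeds, the PAT's job is done — scrub it from this shell:
unset GITHUB_TOKEN
# Then delete the token on GitHub (Settings → Developer settings →
# Personal access tokens). Flux keeps syncing via the deploy key it configured.
```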

The export pattern I actually use on a workstation:

# Pull from a secrets manager at invocation time. Never written to disk.
export GITHUB_TOKEN="$(op read 'op://Personal/github-flux-bootstrap/token')"
export GITHUB_USER="miguelpinto"
export GITHUB_REPO="flux-infra"

If op (1Password CLI) isn’t your tool, pass, bw get, or gh auth token for a throwaway gh session all work.

Bootstrap Flux

flux bootstrap github \
  --owner="$GITHUB_USER" \
  --repository="$GITHUB_REPO" \
  --branch=main \
  --path=./clusters/local \
  --personal \
  --private \
  --read-write-key=false

What this actually does:

  1. Creates the repository if it doesn’t exist (requires repo scope, or Administration write with fine-grained).
  2. Generates an SSH deploy key, uploads the public half to the repo, and stores the private half as a Kubernetes Secret (flux-system/flux-system).
  3. Installs the four Flux controllers into the flux-system namespace: source, kustomize, helm, notification.
  4. Commits the controller manifests into clusters/local/flux-system/ in the repo.
  5. Creates a GitRepository and Kustomization that tell Flux: “sync this branch/path into this cluster, forever.”

What it does not do:

  • Set up any application or infrastructure beyond Flux itself.
  • Configure image automation (you need --components-extra=image-reflector-controller,image-automation-controller for that).
  • Set up RBAC for tenant isolation. You get cluster-admin by default. Tighten this before you hand the cluster to multiple teams.
  • Rotate the deploy key. That’s your problem on a cadence you define.

--read-write-key=false is the default, but I pass it explicitly as documentation: a read-only deploy key is almost always what you want. Flux only needs to read from git — unless you enable image automation, which writes tag updates back. For this demo we don’t need write, so we stay read-only.

One more thing to internalise before moving on: the bootstrap you just ran wrote the deploy key’s private half into the flux-system/flux-system Secret. That Secret is now the cluster’s credential back to your git repo. Anyone with get secrets in flux-system effectively has git access — read-only today, read-write the moment you enable image automation. Worse, the same namespace holds the kustomize-controller ServiceAccount token, which is bound to cluster-admin and reconciles everything Flux applies; a token grab there is a full cluster takeover and a lever to modify what the cluster pulls next. Lock down RBAC on the flux-system namespace accordingly; this is not a Secret — or a ServiceAccount — you want in a “developers can exec into everything” cluster.

After bootstrap, clone the repo locally so you can add manifests:

git clone "git@github.com:$GITHUB_USER/$GITHUB_REPO.git"
cd "$GITHUB_REPO"

Verify the Controllers Are Healthy

flux check
flux get all -A

flux check verifies all four controllers are running. flux get all -A shows every Flux-managed resource across all namespaces and their last-reconciled revision. Get comfortable with these — they’re the equivalent of kubectl get pods for your GitOps layer.
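If you’d rather see the same thing through raw kubectl (useful on a box without the flux CLI):

```shell
# The four controller Deployments should be Available in flux-system…
kubectl get deployments -n flux-system
# …and the bootstrap GitRepository should report Ready=True with a recent revision.
kubectl get gitrepositories -n flux-system
```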

Repository Structure: What I’d Actually Choose

This is where most tutorials hand-wave and where I’ve seen the most regret. You have essentially two axes of choice:

Axis 1: flat vs hierarchical. A flat repo dumps everything into one directory. It’s fine for one cluster and two services. It scales to nothing. Go hierarchical from day one — the cost is ten minutes of thinking, and the alternative is a migration later.

Axis 2: per-env folders vs Kustomize overlays. Per-env folders (dev/, staging/, prod/) duplicate manifests. Overlays keep a single base and patch per environment. I strongly prefer overlays, because the alternative is “which env has the fix?” debugging and three drifted copies of what should be the same manifest.

The structure I recommend:

├── clusters/
│   └── local/
│       ├── flux-system/           # Managed by flux bootstrap, don't hand-edit
│       ├── infrastructure.yaml    # Points at ./infrastructure/overlays/local
│       └── apps.yaml              # Points at ./apps/overlays/local, dependsOn: infrastructure
├── infrastructure/
│   ├── base/                      # Reusable definitions — ingress, cert-manager, monitoring
│   │   └── ingress-nginx/
│   └── overlays/
│       └── local/                 # Kind-specific patches
│           └── ingress-nginx/
└── apps/
    ├── base/
    └── overlays/
        └── local/

For this post we only fill in the infrastructure/ side. Apps come after you’ve got the muscle memory for the reconciliation loop.
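For reference, the apps.yaml in the tree above is just a second Flux Kustomization gated on infrastructure — a sketch of what it would hold once you add apps:

```yaml
# clusters/local/apps.yaml — sketch; dependsOn makes apps wait for infrastructure
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/overlays/local
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure   # waits until infrastructure's healthChecks pass
```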

Why I Don’t Use Branch-Based Promotion

A common pattern is one branch per environment: develop syncs to dev, staging to staging, main to prod. It sounds tidy. It is not.

The pain:

  • Merge conflicts on every promotion. The same file diverges across branches because dev gets fixes first. Resolving conflicts on every cherry-pick is tedious and error-prone.
  • No single view of “what is deployed where.” Answering “is this fix in prod yet?” requires looking at three branches.
  • Accidental backward promotions. A hotfix merged to main that doesn’t get propagated back to staging and develop means your lower environments are now behind prod. The next PR from develop to main will silently revert the hotfix. I’ve seen this happen twice. It’s always ugly.
  • Environment-specific config has to live somewhere. With branch-based you end up with per-branch values files that diverge silently.

Path-based promotion — one branch (main), different paths per environment under clusters/ and overlays/ — gives you one timeline, one git log, one merge workflow. Promotion is a PR that copies a value from overlays/staging/values.yaml to overlays/production/values.yaml. You can see exactly what changed because it’s a diff.

For a single-cluster getting-started repo like this one, it doesn’t matter — you only have clusters/local/. But adopt the pattern now so you don’t have to untangle it later.
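When you do have multiple environments, a promotion commit is deliberately boring. A hypothetical example — the overlay paths, app name, and yq-based bump are all illustrative (this repo only has clusters/local/):

```shell
# Copy the tag that survived staging into production, then open a PR for review.
yq -i '.image.tag = "1.4.2"' apps/overlays/production/values.yaml
git switch -c promote-1.4.2
git add apps/overlays/production/values.yaml
git commit -m "Promote my-app 1.4.2 to production"
git push -u origin promote-1.4.2
```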

Install ingress-nginx Through Flux

Now the payoff. Instead of helm install, we declare the helm chart and let Flux reconcile it.

Step 1: declare the helm repository

# infrastructure/base/sources/ingress-nginx-helm.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 1h
  url: https://kubernetes.github.io/ingress-nginx

The interval is how often Flux polls the helm repo for new chart versions. One hour is fine for a helm repo — charts don’t change often, and polling too aggressively is a waste of API calls against the chart host.

This HTTPS HelmRepository trusts whatever TLS-terminated chart host answers — fine for upstream ingress-nginx on a learning cluster, not fine for production third-party charts. For production chart sources, switch to OCI (an OCI-compliant registry like GHCR or ECR, via spec.type: oci or an OCIRepository) and add cosign verification — spec.verify on the OCIRepository, or spec.chart.spec.verify on the HelmRelease for OCI-hosted charts — so unsigned charts fail the pull. Same threat model as container images: you don’t want to reconcile arbitrary templated YAML that nobody signed.
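A sketch of a hardened source, shown as an OCIRepository — registry, path, and semver range are illustrative, and the API version for OCIRepository varies by Flux release:

```yaml
# Sketch: signed OCI source with cosign keyless verification.
apiVersion: source.toolkit.fluxcd.io/v1beta2   # v1 on newer Flux releases
kind: OCIRepository
metadata:
  name: my-charts
  namespace: flux-system
spec:
  interval: 1h
  url: oci://ghcr.io/my-org/charts/my-chart
  ref:
    semver: ">=1.0.0 <2.0.0"
  verify:
    provider: cosign     # keyless by default; add secretRef for a public key
```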

# infrastructure/base/sources/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
resources:
  - ingress-nginx-helm.yaml

Step 2: declare the namespace

Helm releases land in a namespace; create it explicitly rather than relying on createNamespace: true in the HelmRelease. Explicit is easier to reason about.

# infrastructure/base/ingress-nginx/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  labels:
    pod-security.kubernetes.io/enforce: baseline

The Pod Security label is habit. baseline is the minimum I’d accept for an ingress controller — it still allows the hostPort binding ingress-nginx needs, but blocks the worst privilege escalations. On an existing cluster where you’re retrofitting this, start with pod-security.kubernetes.io/warn: baseline (and/or audit) first, watch for violations in the API server logs, then flip to enforce. Going straight to enforce on a live namespace is how you break workloads you didn’t know were running as root.

Step 3: declare the HelmRelease

# infrastructure/base/ingress-nginx/release.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  releaseName: ingress-nginx
  interval: 1h
  chart:
    spec:
      chart: ingress-nginx
      version: "4.11.3"            # Pin the version. Never use "*" or ">=x".
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
      interval: 12h                # How often to check the HelmRepository cache
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: true
  values:
    controller:
      replicaCount: 1
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 512Mi
      metrics:
        enabled: true
      admissionWebhooks:
        enabled: true
        failurePolicy: Fail
      # Kind-specific bits get overridden in the overlay
      service:
        type: NodePort

A few things worth calling out:

Resource limits are not optional. I’ve seen the vanilla ingress-nginx chart deployed without limits, get OOMKilled under traffic, and take out the whole ingress path for a cluster. Requests and limits, always. Tune them to your traffic; these are a floor.

Pin the chart version. Not "*", not ">=4.0.0", not omitted. A chart upgrade you didn’t review will eventually ship a breaking change — a template rename, a values key that moved, a default that flipped. Pin, test the upgrade in a lower env, bump deliberately. Renovate or Dependabot can open the PR for you.

remediation on install and upgrade. If the release fails, Flux retries. If it still fails, without remediateLastFailure an upgrade leaves you in a “failed” state until you intervene. With it, Flux rolls back to the last good release automatically. Production-grade default.

metrics.enabled: true. Wire Prometheus to these from day one. The reason you’ll eventually care: if ingress-nginx is the bottleneck, you want the data already flowing.

Step 4: tie the base together

# infrastructure/base/ingress-nginx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ingress-nginx
resources:
  - namespace.yaml
  - release.yaml

Step 5: the local overlay

This is where Kind-specific quirks live. On Kind we want hostPort so requests on localhost:80 reach the controller. On a real cluster, you’d leave this out and use a LoadBalancer service.

# infrastructure/overlays/local/ingress-nginx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ingress-nginx
resources:
  - ../../../base/ingress-nginx
patches:
  - path: release-patch.yaml
    target:
      kind: HelmRelease
      name: ingress-nginx

# infrastructure/overlays/local/ingress-nginx/release-patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  values:
    controller:
      hostPort:
        enabled: true
      service:
        type: NodePort
      nodeSelector:
        ingress-ready: "true"
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Equal
          effect: NoSchedule
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
        limits:
          cpu: 200m
          memory: 256Mi

Note I also shrink the resource envelope for local — no need to reserve production-sized CPU on a laptop.

# infrastructure/overlays/local/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/sources
  - ingress-nginx

Step 6: the Flux Kustomization that points at all this

# clusters/local/infrastructure.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/overlays/local
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: ingress-nginx-controller
      namespace: ingress-nginx
  timeout: 5m

The healthChecks field is what stops Flux from reporting “applied” before the thing is actually ready. Without it you get false green. With it, downstream Kustomizations that depend on infrastructure will correctly wait.
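Two commands I keep in a second terminal while this reconciles:

```shell
# Live view of Kustomization status — Ready flips True only once healthChecks pass.
flux get kustomizations --watch
# Or wait on the health-checked Deployment directly:
kubectl -n ingress-nginx rollout status deployment/ingress-nginx-controller
```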

Warning: this Kustomization inherits cluster-admin. There’s no spec.serviceAccountName set, so Flux applies these manifests using the kustomize-controller’s own ServiceAccount, which is bound to cluster-admin out of bootstrap. Safe for a single-operator learning cluster; do not copy this into a multi-tenant repo without setting spec.serviceAccountName: flux-infrastructure (and creating a scoped ServiceAccount + RoleBinding that only grants the permissions this Kustomization actually needs). The RBAC walkthrough is in the more mature GitOps post.

Commit, push, and Flux picks it up within the reconciliation interval:

git add .
git commit -m "Add ingress-nginx via Flux"
git push origin main

# Or trigger immediately:
flux reconcile source git flux-system
flux reconcile kustomization infrastructure

Reconciliation Intervals: The Tradeoff

Every Flux object has an interval. The temptation is to set everything to 1m so changes happen fast. Resist it.

  • GitRepository interval (how often Flux polls git): 1-5 min is reasonable. Shorter adds load to your git host; longer delays deploys.
  • HelmRepository interval (how often Flux polls the chart host): 1h is fine. Helm charts don’t change often.
  • Kustomization interval (how often Flux reconciles desired → actual state): 5-10 min for most things. This is the drift-correction heartbeat. Shorter means faster drift correction but more API load on your control plane.
  • HelmRelease interval (how often Flux checks if the release needs a chart upgrade): 1h.

The rule: shorter intervals on sources (git, helm) are cheap; shorter intervals on Kustomizations and HelmReleases hit the Kubernetes API server. If you’re seeing API server CPU spikes, your Kustomization intervals are probably too aggressive.

You can always trigger an immediate reconcile with flux reconcile when you need to. Don’t set intervals as if that’s your only lever.

Turn On Commit Signature Verification

Flux will happily apply whatever commits are on the branch it’s watching. If an attacker steals a PAT, exfiltrates a deploy key, or phishes a maintainer with push access, they can commit arbitrary manifests and Flux reconciles them inside the cluster. Branch protection helps, but the cryptographic fix is telling Flux to refuse commits that aren’t signed by a key you trust.

This is non-optional the moment you enable image automation (a bot now has write access) or you onboard a second committer (you can no longer eyeball every push). Turn it on for a single-operator learning cluster too — it’s ten minutes of work and it closes the largest unreviewed trust gap in the default bootstrap.

Export your public signing key and load it into the cluster. GPG is the safe default; if your team signs commits with SSH keys, check your Flux version’s docs first — source-controller signature verification has historically supported PGP keys only.

# Export your public key (GPG example)
gpg --export --armor your-email@example.com > signing-key.pub

# Or, if your Flux version supports verifying SSH-signed commits:
# cp ~/.ssh/id_ed25519.pub signing-key.pub

kubectl create secret generic flux-signing-keys \
  --namespace=flux-system \
  --from-file=your-email@example.com=signing-key.pub
# Key inside the Secret must be the signer identity; add one entry per trusted signer.

Then patch the bootstrap-generated GitRepository (in clusters/local/flux-system/gotk-sync.yaml) to require verification:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/miguelpinto/flux-infra
  ref:
    branch: main
  secretRef:
    name: flux-system
  verify:
    mode: HEAD
    secretRef:
      name: flux-signing-keys

mode: HEAD verifies the tip commit of the branch on every fetch; any unsigned commit (or one signed by a key not in flux-signing-keys) fails the source reconciliation and nothing downstream applies. Commit this change via PR, merge it signed, and watch flux get sources git -A — a red source is your smoke test that verification works.

One gotcha: if you enable this before your team sets up signing, everyone’s next git push breaks the cluster. Announce, set up signing locally first, then enable verification.
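The per-developer setup is a one-time git config change — the key ID here is a placeholder for whatever `gpg --list-secret-keys` shows:

```shell
# One-time per developer: sign every commit by default (GPG shown).
git config --global user.signingkey YOUR_KEY_ID
git config --global commit.gpgsign true
```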

Image Automation: When To Use It, When Not

Flux can watch a container registry, match tags against a semver or regex policy, update the manifest in git, commit, and reconcile. Fully hands-off deploys.

This is great for lower environments. Dev and staging should track main automatically — push an image, it lands. Feedback loop tightens, PRs get reviewed faster.

This is not great for production, at least not without ceremony. Production deploys should be explicit, reviewed, and auditable beyond a bot commit. What I do:

  • Image automation → dev/staging automatically.
  • Prod deploys go through a PR that bumps the image tag manually (or via a promotion script that opens the PR). A human approves.

Enable it per-environment using ImageRepository + ImagePolicy + ImageUpdateAutomation — and be aware that enabling it requires the bootstrap deploy key to be read-write, because Flux writes tag updates back to git.

That upgrade from read-only to read-write is the biggest blast-radius change in all of Flux. A read-write deploy key turns a cluster compromise into a supply-chain compromise: an attacker with get secrets in flux-system now has a credential that can commit arbitrary manifests back to the source of truth, and the next reconcile applies them. A few mitigations I’d consider non-optional before flipping that switch:

  • Scope the key to a single repo. Never reuse a bootstrap key across infra and app repos.
  • Protect the branch Flux writes to. Branch protection + required signed commits means a key-compromise commit stands out, or fails, rather than silently shipping.
  • Consider an image-automation branch, not main. Flux writes to a dedicated branch; a human PR promotes it to main. This re-introduces a review step without losing the automation benefit for lower envs.
  • Tighten RBAC on flux-system. get secrets there is now git-write. Audit who has it.
  • Verify image signatures at admission time. A signed commit is only as good as the image it points at — if image-automation bumps a tag to a malicious digest pushed by a compromised CI, the signed commit just smuggles the bad image in. Enforce cosign/Sigstore verification via Flux’s spec.verify on OCIRepository sources (or spec.chart.spec.verify for OCI-hosted charts), or via Kyverno verifyImages / Gatekeeper policy at the admission webhook.
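For completeness, the three-object wiring for a lower environment looks roughly like this — image name, policy range, and commit author are all illustrative, and it assumes the extra image-* controllers plus a read-write deploy key:

```yaml
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  image: ghcr.io/my-org/my-app
  interval: 5m                  # registry polling — cheap, can be frequent
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app
  policy:
    semver:
      range: ">=1.0.0"          # which tags are eligible for promotion
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxbot
        email: flux@example.com
    push:
      branch: main              # consider a dedicated branch + PR promotion instead
  update:
    path: ./apps
    strategy: Setters           # updates manifests carrying the image-policy markers
```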

Common Beginner Mistakes

These are the ones I see first-time Flux users hit.

prune: true deleting things you didn’t expect

prune: true means Flux deletes any resource it previously applied that no longer exists in git. This is what you want eventually — it’s the whole drift-correction story. But when you’re first learning, prune: true on a Kustomization that points at a misconfigured path will happily delete resources you didn’t mean to touch.

Start with prune: false. Flip it to true once you trust your manifest structure. It’s a one-line change and it will save you a reconcile-induced outage.

Namespace scoping surprises

A Kustomization with namespace: foo set will force all its resources into namespace foo, overriding the metadata.namespace on individual manifests. This trips people up when they set namespace: flux-system at the Kustomization level and then wonder why their app’s ConfigMap ended up in flux-system instead of apps.

Rule of thumb: don’t set namespace: in a kustomization.yaml unless you want every resource underneath it scoped to that namespace — and the same caution applies to spec.targetNamespace on a Flux Kustomization. Set metadata.namespace on each manifest and let Kustomize respect it.

RBAC: Flux is cluster-admin by default

Out of bootstrap, Flux runs as cluster-admin. Fine for a learning cluster, dangerous for multi-tenant. If you’re going to let multiple teams commit to the same Flux repo, you want tenant Kustomizations running under scoped ServiceAccounts so team A can’t accidentally (or deliberately) deploy into team B’s namespace.

This is a larger topic — see the multi-tenancy section of the more mature GitOps post — but be aware that the default is wide-open and you should tighten it before shipping anything shared.

Forgetting the HelmRelease chart interval

The HelmRelease has two intervals: spec.interval (how often to reconcile the release) and spec.chart.spec.interval (how often to check the referenced HelmRepository for a new chart version matching the version spec). Omit the inner one and you may wait longer than you expect for a chart version update to be noticed. Set both.

Admission webhooks that block reconciliation

If you install an admission webhook via Flux and the webhook fails closed, and the webhook pod itself isn’t running yet, you can wedge the cluster — the webhook blocks its own deployment. Solve this with failurePolicy: Ignore on the webhook or exclude the webhook’s namespace from the webhook’s own selector. Learned this one the hard way.

(Separate concept: Flux’s own notification-controller Receiver — the inbound webhook endpoint that lets GitHub/GitLab/Harbor poke Flux to reconcile immediately — must use spec.type: generic-hmac (not plain generic, which is unauthenticated and will accept any POST from the internet). The HMAC Secret it references needs stringData.token set to at least 32 random bytes (e.g., openssl rand -hex 32); the notification-controller verifies the X-Signature header against that token on every request. Not in scope for this getting-started post, but if you wire one up, don’t expose an unauthenticated /hook endpoint; anyone on the internet can trigger reconciles and amplify your polling into a DoS vector.)
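If you do wire one up later, the authenticated shape is roughly this (names illustrative):

```yaml
# Create the HMAC token first:
#   kubectl -n flux-system create secret generic webhook-token \
#     --from-literal=token="$(openssl rand -hex 32)"
apiVersion: notification.toolkit.fluxcd.io/v1
kind: Receiver
metadata:
  name: github-push
  namespace: flux-system
spec:
  type: generic-hmac            # rejects any POST without a valid X-Signature
  secretRef:
    name: webhook-token
  resources:
    - apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      name: flux-system         # which source to reconcile on each valid ping
```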

Flux vs ArgoCD: What I’d Actually Choose

The honest answer: pick either and move on. Both are solid. But here’s how I’d decide.

Pick Flux if you want GitOps as infrastructure — controllers in the cluster, configured via CRDs, no extra UI to run. It composes cleanly with Kustomize and Helm. Operators love it. It’s a CNCF graduated project. If your team is comfortable in YAML and kubectl, Flux disappears into the cluster and works.

Pick ArgoCD if you want a UI that non-operators can use to see what’s deployed where, roll back visually, and trigger syncs. ArgoCD’s app-of-apps pattern is nice for managing dozens of apps. It’s opinionated about Application CRDs and has a stronger UX for people who aren’t going to kubectl into the cluster.

For a small ops team managing infrastructure and a handful of apps, Flux. For a larger org with developers who need visibility without cluster access, ArgoCD. Don’t agonize — either is a better answer than no GitOps.

One concrete behavioural difference worth knowing: ArgoCD separates sync from self-heal — you can detect drift without auto-correcting, and decide per-Application whether out-of-band edits get reverted. Flux always corrects drift on reconcile; there is no detect-only mode on a Kustomization. If you want to inspect drift without Flux silently undoing it, suspend the Kustomization (flux suspend kustomization <name>) and use flux diff to compare. Fine once you know it; surprising if you don’t.
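In commands, that inspect-without-correcting workflow looks like:

```shell
# Pause reconciliation so Flux stops reverting out-of-band changes…
flux suspend kustomization infrastructure
# …inspect what differs between git and the live cluster…
flux diff kustomization infrastructure --path ./infrastructure/overlays/local
# …then resume; the next reconcile reverts the drift.
flux resume kustomization infrastructure
```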

When to Graduate Past Flux-the-Tool

Flux is a reconciliation engine. It’s not the whole GitOps practice. Once you’ve got the loop working, the next questions are:

  • Secrets management. Plain Kubernetes Secrets in git are base64, which is not encryption. Use sealed-secrets, external-secrets-operator with a backing secret manager (Vault, AWS Secrets Manager, GCP Secret Manager), or SOPS-encrypted manifests. Pick one and make it the standard before anyone tries to commit a plain secret.
  • Policy. Kyverno or OPA/Gatekeeper enforce rules at admission time (required labels, no :latest tags, no privileged containers). GitOps reconciles what’s in git; policy enforces what can be in git.
  • Progressive delivery. Flagger (which pairs with Flux) does canary and blue/green rollouts driven by metrics. Plain Flux does “apply and hope.” For anything user-facing, you eventually want progressive delivery.
  • Multi-cluster fleet management. When you have more than three clusters, managing each one’s Flux bootstrap by hand gets old. Look at Flux’s fleet patterns or tools like Cluster API + Flux for scalable cluster provisioning.

These are all “after you’ve got Flux humming” concerns. Don’t try to solve them in your first week.

Conclusion

Getting Flux running locally is a couple of commands. Getting the repo structure, token hygiene, pruning behavior, and reconciliation intervals right is where the actual work lives. The value isn’t in having a controller running — it’s in having an audit trail, a reconciliation loop that corrects drift, and a single workflow (PR → merge → deploy) for every change.

The drift problem that kicked off this post? It doesn’t go away because you installed Flux. It goes away because the team commits to never running kubectl edit on a live cluster again, and because the reconciliation loop catches them when they do. The tool is half of it. The discipline is the other half.

If you want the next level — multi-environment promotion, image automation for real, policy enforcement, monitoring the reconciliation pipeline itself — read From Manual Deployments to GitOps with Flux. It picks up roughly where this one ends.
