Sync Phases and Hooks: Run Your Own Code Around a Sync
What Happens if a Hook Fails
Where a hook fails tells you what state your cluster is in.
A failed PreSync hook stops the operation before the Sync phase starts, so none of your manifests apply and the old version keeps running. That is the safe outcome, and the reason a database migration belongs in PreSync.
A failed Sync is the messy case: some resources applied and some did not, leaving the cluster matching neither the old Git revision nor the new one.
A failed PostSync is the misleading case. PostSync runs after the deploy has already happened, so your manifests are applied, the pods are live and serving traffic, and the operation is still marked Failed: the hook detected a problem but did not undo it.
A failed SyncFail changes nothing, since SyncFail runs only when the operation has already failed; the operation stays Failed, but the cleanup that hook was meant to do is silently lost.
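Across all four, Argo CD ends the operation as Failed. After a failed PreSync, Sync, or PostSync it also runs your SyncFail hooks if you defined any. What happens next depends on your sync policy: with a retry strategy configured, the operation is retried, which recovers a transient failure and keeps failing on a real one until you fix it; without one, automated sync does not reattempt the failed sync against the same commit.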
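As a rough sketch of the two pieces mentioned above, the manifests below show a SyncFail cleanup hook and an Application with an explicit retry strategy. The names (sync-fail-cleanup-, my-app), the repository URL, the busybox image, and the retry numbers are placeholders chosen for illustration, not values from a real setup.

```yaml
# Hypothetical SyncFail hook: runs only after the sync operation has failed.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sync-fail-cleanup-   # placeholder name prefix
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cleanup
          image: busybox:1.36        # placeholder image
          command: ["/bin/sh", "-c", "echo 'sync failed, running cleanup'"]
---
# Hypothetical Application with automated sync and a bounded retry strategy.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                       # placeholder
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder
    targetRevision: main
    path: deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      limit: 3                       # illustrative: give up after 3 retries
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 3m
```

A bounded limit keeps a genuinely broken hook from retrying forever while still absorbing a transient failure.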
Hook Anti-Patterns
Hooks are imperative actions bolted onto a declarative system, so every hook is state you now operate. Most hook incidents are really "this should not have been a hook". The traps below are the ones that matter most, and we have hit several of them ourselves:
The Default Delete Policy Is BeforeHookCreation, Not "None"
With no argocd.argoproj.io/hook-delete-policy, Argo CD applies BeforeHookCreation: it deletes the previous hook instance just before the next sync recreates it. That bounds accumulation to the latest run, so you do not get fifty stale migration Jobs by default. What it does leave behind is the most recent run, including a failed one, sitting in the namespace until the next sync clears it, and it never deletes anything on success. If you want a clean namespace right after a hook succeeds (smoke tests, notifications), set HookSucceeded explicitly. Be deliberate about the policy instead of assuming "no annotation" means "no cleanup" or "keep everything."
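A minimal sketch of the explicit HookSucceeded case for a smoke test might look like this. The Job name prefix, the curl image tag, and the health URL are assumptions made for the example; substitute whatever your application actually exposes.

```yaml
# Hypothetical PostSync smoke test that is removed as soon as it succeeds.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: smoke-test-          # placeholder name prefix
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 0                    # fail fast; a failed Job stays for inspection
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: smoke
          image: curlimages/curl:8.8.0   # placeholder image and tag
          # Placeholder URL; --fail makes curl exit non-zero on HTTP errors.
          command: ["curl", "--fail", "--silent", "http://my-app.my-app.svc/healthz"]
```

With this policy a passing test leaves nothing behind, while a failing one remains in the namespace so you can read its logs.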
Static Hook Names That Survive Into the Next Sync
A named hook is fine when it gets deleted between runs, which the default BeforeHookCreation handles for you. The trap is a static metadata.name paired with a policy that can leave the object behind. The classic case is a named Job with hook-delete-policy: HookSucceeded that fails: HookSucceeded only deletes on success, so the failed Job persists, and the next sync tries to apply the same name over an existing Job whose spec.template is immutable. That conflict wedges the sync. Use generateName: db-migrate- so every run is a fresh object and the problem cannot occur.
metadata:
  generateName: db-migrate-  # fresh object each sync, no name collision
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
Forgetting That Self-Heal Reruns Your Hooks
Hooks run as part of a sync operation, and self-heal turns detected drift into a sync. So on an app with automated sync and selfHeal enabled, every drift correction re-executes the full phase sequence, including your PostSync hooks. Self-heal is rate-limited (the gap between self-heal syncs defaults to 5 seconds) and will not reattempt a sync that already failed against the same commit, so the reruns are throttled, not instant.
A cheap smoke test rerunning on each reconcile is fine and arguably useful, but an expensive hook (a full data validation, a slow external call) runs far more often than the deploy count suggests. Budget for that, or move the expensive work out of the hook.
Watch for a drift loop driving the reruns. If a mutating webhook keeps injecting a field that self-heal keeps reverting, your hooks fire on every cycle for no real change. Fix the loop with ignoreDifferences, so the injected field no longer counts as drift.
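One way to break such a loop, sketched under assumptions: the fragment below tells Argo CD to ignore a single annotation that a webhook injects into Deployments. The annotation key example.com/injected-at is hypothetical; replace the pointer with the field your webhook actually writes.

```yaml
# Fragment of an Application spec (not a complete manifest).
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        # Hypothetical injected annotation; "/" inside a key is escaped as "~1".
        - /spec/template/metadata/annotations/example.com~1injected-at
  syncPolicy:
    syncOptions:
      # Also leave the ignored field alone when a sync does run.
      - RespectIgnoreDifferences=true
```

Keep the rule as narrow as possible: ignoring a whole object or a broad path can hide drift you do care about.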
