ArgoCD Issues
Recovery steps can affect shared environments. Prefer reversible or inspect-only steps first, then escalate to stronger actions only when the evidence supports it.
Use this page when ArgoCD is the thing that is broken, not just the workload it is deploying.
Before running a fix, make sure the symptom matches. Some commands here are safe retries, while others remove ArgoCD objects or force cleanup.
1. Application Stuck in "Unknown" Status
Problem: Application shows Unknown sync status. It does not sync.
Cause: The repo-server was OOMKilled or restarted. ArgoCD cached the error and does not retry automatically.
What this means in plain language: ArgoCD is holding on to stale state instead of re-reading the repo and trying again.
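Before refreshing, it is worth confirming the suspected cause. A quick check, assuming the standard ArgoCD Helm chart labels on the repo-server pods:

```shell
# Look for restarts or an OOMKilled "Last State" on the repo-server pods.
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-repo-server
kubectl describe pod -n argocd -l app.kubernetes.io/name=argocd-repo-server | grep -A3 "Last State"
```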
Risk level: Start with the first command. The delete path is more disruptive.
Fix: Hard-refresh the application. That tells ArgoCD to throw away cached state and re-read Git:
kubectl patch application <name> -n argocd --type merge \
-p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
If the hard refresh does not resolve it, remove the finalizer and delete the Application. The root app crawbl-apps recreates it within 3 minutes:
kubectl patch application <name> -n argocd --type json \
-p '[{"op":"remove","path":"/metadata/finalizers"}]'
kubectl delete application <name> -n argocd
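After the delete, you can watch for the root app to recreate the Application rather than polling by hand:

```shell
# The Application should reappear within a few minutes; Ctrl-C to stop watching.
kubectl get application <name> -n argocd -w
```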
2. Repo-Server OOMKilled
Problem: Repo-server shows OOMKilled or high restart count. Applications show Unknown.
Cause: Large Helm charts (Bitnami PostgreSQL/Redis are 15-18MB each) consume too much memory during manifest generation.
What this means in plain language: the component that renders manifests ran out of memory while preparing deploy output.
Risk level: Safe configuration change.
Fix (immediate): Increase memory and reduce parallelism:
helm upgrade argocd argo/argo-cd --namespace argocd \
--set 'repoServer.resources.limits.memory=1Gi' \
--set 'repoServer.extraArgs={--parallelismlimit=2}' \
--reuse-values
Fix (long-term): Vendor charts into Git. Vendored charts are already on disk, eliminating the download/extract memory spike.
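A minimal sketch of vendoring, assuming a charts/ directory in your GitOps repo and the bitnami repo already added via helm repo add (chart version shown is illustrative):

```shell
# Pull and unpack the chart into the repo, then commit it.
helm pull bitnami/postgresql --version 15.5.0 --untar --untardir charts/
git add charts/postgresql
git commit -m "Vendor postgresql chart"
```

Pin the version explicitly so upgrades become deliberate Git changes rather than silent re-downloads.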
3. Application-Controller OOMKilled
Problem: Application-controller shows OOMKilled. Syncs stop across all applications.
Cause: The controller holds in-memory state for all managed resources and exceeds the memory limit on resource-constrained nodes.
What this means in plain language: ArgoCD's central controller is trying to track more live state than its memory budget allows.
Risk level: Safe configuration change.
Fix: Set GOMEMLIMIT and reduce status/operation processors:
helm upgrade argocd argo/argo-cd --namespace argocd \
--set 'controller.resources.limits.memory=512Mi' \
--set 'controller.env[0].name=GOMEMLIMIT' \
--set 'controller.env[0].value=400MiB' \
--set 'configs.params.controller\.status\.processors=5' \
--set 'configs.params.controller\.operation\.processors=3' \
--reuse-values
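After the upgrade, verify the controller stays within its budget. This assumes the standard chart labels and requires metrics-server for kubectl top:

```shell
# Restart count should stop climbing and memory should sit under 512Mi.
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-application-controller
kubectl top pod -n argocd -l app.kubernetes.io/name=argocd-application-controller
```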
4. "error reading from server: EOF"
Problem: Syncs fail with error reading from server: EOF in controller logs.
Cause: Stale gRPC connection between the controller and repo-server after a pod reschedule.
What this means in plain language: internal ArgoCD components lost their connection to each other and did not recover cleanly.
Risk level: Disruptive but routine.
Fix: Restart all ArgoCD pods, then hard-refresh affected applications:
kubectl delete pods -n argocd --all
kubectl patch application <name> -n argocd --type merge \
-p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
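If deleting every pod at once is too disruptive, a gentler alternative is a rolling restart. In the default chart layout the application-controller is a StatefulSet and the rest are Deployments:

```shell
# Restart components one at a time instead of all pods simultaneously.
kubectl rollout restart deployment -n argocd
kubectl rollout restart statefulset -n argocd
```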
5. StatefulSet Adoption Failure ("spec: Forbidden")
Problem: PostgreSQL or Redis sync fails with updates to statefulset spec for fields other than 'replicas'... are forbidden.
Cause: Immutable StatefulSet fields (like volumeClaimTemplates) differ between the old and new release. Kubernetes does not allow in-place updates.
What this means in plain language: Kubernetes refuses to edit one of the parts of the StatefulSet that must stay immutable after creation.
Risk level: ServerSideApply is the safer option. Deleting the StatefulSet is more disruptive and should be done carefully.
Fix (option 1): Enable ServerSideApply on the Application:
syncPolicy:
syncOptions:
- ServerSideApply=true
Fix (option 2): Delete the StatefulSet while preserving data:
kubectl delete statefulset <name> -n <namespace> --cascade=orphan
kubectl delete pod <name>-0 -n <namespace>
The --cascade=orphan flag preserves pods and PVCs. ArgoCD recreates the StatefulSet on next sync.
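A quick sanity check after the orphan delete: the PVCs (and their data) should still be Bound, and the pod should come back once ArgoCD recreates the StatefulSet:

```shell
# PVCs must remain Bound; watch the pod return after the next sync.
kubectl get pvc -n <namespace>
kubectl get pods -n <namespace> -w
```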
6. Orphaned CRD Resources Block Sync
Problem: Application stuck at OutOfSync/Missing with "waiting for deletion of X and N more resources".
Cause: A CRD operator was removed before its custom resources were cleaned up. Orphaned finalizers cannot be processed.
What this means in plain language: Kubernetes is waiting for cleanup code that no longer exists, so the old custom resources never finish deleting.
Risk level: Destructive for the affected custom resources.
Fix: For each orphaned CRD kind, remove finalizers and delete:
kubectl patch <kind> <name> -n <ns> --type json \
-p '[{"op":"remove","path":"/metadata/finalizers"}]'
kubectl delete <kind> <name> -n <ns> --wait=false
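When an operator leaves many instances behind, patching them one by one is tedious. A batch sketch of the same fix (substitute your CRD kind and namespace):

```shell
# Strip finalizers from every instance of the orphaned kind, then delete them.
for r in $(kubectl get <kind> -n <ns> -o name); do
  kubectl patch "$r" -n <ns> --type json \
    -p '[{"op":"remove","path":"/metadata/finalizers"}]'
  kubectl delete "$r" -n <ns> --wait=false
done
```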
Prevention: Always delete custom resources before removing their operator.
7. Envoy Gateway CRD Annotation Too Long
Problem: Envoy Gateway sync fails with metadata.annotations: Too long: must have at most 262144 bytes.
Cause: Gateway API CRDs are large enough that the last-applied-configuration annotation exceeds Kubernetes' 256KiB annotation size limit.
What this means in plain language: the generated metadata is too large for Kubernetes to accept when the legacy apply annotation is included.
Risk level: Safe config change.
Fix: Use ServerSideApply, which skips the annotation entirely, and set ignoreDifferences on the Application:
syncPolicy:
syncOptions:
- ServerSideApply=true
- RespectIgnoreDifferences=true
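For context, here is where those options sit in a full Application manifest. The name and the ignoreDifferences entry (ignoring CRD status churn) are illustrative; adjust to whatever fields actually diff in your cluster:

syncPolicy:
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: envoy-gateway
      namespace: argocd
    spec:
      syncPolicy:
        syncOptions:
          - ServerSideApply=true
          - RespectIgnoreDifferences=true
      ignoreDifferences:
        - group: apiextensions.k8s.io
          kind: CustomResourceDefinition
          jsonPointers:
            - /status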
8. "cannot re-use a name that is still in use" (Helm)
Problem: Sync fails with cannot re-use a name that is still in use.
Cause: A previous Helm install failed partway through, leaving an orphaned release secret.
What this means in plain language: Helm thinks an older failed install still owns the release name.
Risk level: Destructive for the stale Helm release metadata, but not for the workload itself if you target the correct secret.
Fix: Delete the orphaned Helm release secrets, then hard-refresh the application so ArgoCD retries the install:
kubectl delete secret -n <namespace> -l "owner=helm,name=<release-name>"
kubectl patch application <name> -n argocd --type merge \
-p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
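To identify the stale release before deleting anything, list the Helm release secrets with their status label. A release stuck in pending-install or failed is the usual culprit:

```shell
# Helm stores release state in secrets labeled owner=helm; the status label
# shows deployed, failed, pending-install, etc.
kubectl get secret -n <namespace> -l owner=helm \
  -o custom-columns=NAME:.metadata.name,STATUS:.metadata.labels.status
```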
🔗 Terms On This Page
If a term below is unfamiliar, open its glossary entry. For the full list, go to Internal Glossary.
- ArgoCD: The GitOps deployment system that keeps the cluster aligned with what is committed in Git.
- GitOps: An operations model where Git is the source of truth for live cluster state.
- Application Resource: The Kubernetes object ArgoCD uses to describe how one deployable component should be synced.
- Helm Chart: A packaged set of Kubernetes templates and values used to deploy an application.