Metacontroller

Metacontroller is the Kubernetes operator framework that manages the UserSwarm custom resource, which represents the per-user Agent Runtime lifecycle. It watches UserSwarm CRs and calls a webhook in the backend namespace; the webhook returns the desired child resources (ServiceAccount, Service, Deployment) for each workspace, and Metacontroller reconciles the cluster to match.


Connection Details

Property | Value
Metacontroller namespace | userswarm-controller
Webhook deployment | userswarm-webhook in backend namespace
Webhook port | 8080 (HTTP)
Webhook endpoint | http://userswarm-webhook.backend.svc.cluster.local:8080/sync
Health check | GET /healthz
Helm chart version | v4.13.0
ArgoCD sync wave | 3 (after namespaces and cert-manager, before orchestrator)
Resync period | 30 seconds

How It Works

Metacontroller itself is a generic Kubernetes controller. It does not contain any Crawbl logic — it just calls our webhook whenever a UserSwarm CR changes.

The reconcile loop

  1. A UserSwarm CR is created or updated (e.g., user signs up, or a workspace config changes)
  2. Metacontroller detects the change and sends a POST /sync to the webhook
  3. The webhook reads the CR spec and returns the desired child resources (ServiceAccount, Service, Deployment)
  4. Metacontroller creates, updates, or deletes children to match the desired state
  5. The webhook is called again every 30 seconds to keep status up to date
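The reconcile steps above can be sketched as a pure function from a sync request to a sync response. This is a minimal illustration in Python, not the Go handler in crawbl-backend. The `parent` and `children` request fields and the `status` and `children` response fields follow Metacontroller's CompositeController hook format. The `app` label, the `agent-runtime` container name, and the way `phase` is derived are assumptions made for the example.

```python
from typing import Any, Dict


def child_name_for(parent_name: str) -> str:
    # Children are named agent-runtime-<CR name>, per the naming convention.
    return f"agent-runtime-{parent_name}"


def sync(request: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a Metacontroller sync request into status plus desired children."""
    parent = request["parent"]
    spec = parent.get("spec", {})
    name = child_name_for(parent["metadata"]["name"])
    namespace = spec.get("placement", {}).get("runtimeNamespace", "userswarms")
    runtime = spec.get("runtime", {})
    port = runtime.get("port", 42618)
    suspended = bool(spec.get("suspend", False))
    labels = {"app": name}  # hypothetical label scheme

    def meta() -> Dict[str, Any]:
        # Fresh dict per child so the objects do not share state.
        return {"name": name, "namespace": namespace, "labels": dict(labels)}

    service_account = {"apiVersion": "v1", "kind": "ServiceAccount", "metadata": meta()}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": meta(),
        "spec": {
            "type": "ClusterIP",
            "selector": dict(labels),
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    container = {
        "name": "agent-runtime",  # hypothetical container name
        # Raises KeyError if the image is missing; a real handler would
        # report the Error phase instead.
        "image": runtime["image"],
        "ports": [{"containerPort": port}],
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": meta(),
        "spec": {
            "replicas": 0 if suspended else 1,
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"serviceAccountName": name, "containers": [container]},
            },
        },
    }

    # Observed children arrive keyed by "<Kind>.<apiVersion>", then by name.
    observed = request.get("children", {}).get("Deployment.apps/v1", {}).get(name, {})
    ready = observed.get("status", {}).get("readyReplicas", 0) > 0
    phase = "Suspended" if suspended else ("Ready" if ready else "Progressing")

    return {
        "status": {"phase": phase},
        "children": [service_account, service, deployment],
    }
```

Because the function has no side effects, calling it again on every 30-second resync is safe: the same CR spec always yields the same desired children, and only the reported status changes as the Deployment becomes ready.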

The finalize loop (deletion)

  1. A UserSwarm CR is deleted (e.g., user deletes their account)
  2. Metacontroller sends POST /sync with finalizing: true
  3. The webhook returns an empty children list
  4. Metacontroller deletes all child resources
  5. Once all children are gone, the webhook returns finalized: true
  6. Metacontroller removes the finalizer and the CR is deleted
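The finalize steps above reduce to one rule: always return no desired children, and report `finalized: true` only once Metacontroller observes none. The sketch below illustrates that rule in Python under the assumption that the request carries observed children grouped by kind and then by name, as in Metacontroller's hook format. The `Deleting` status value comes from the phase table; where it is written in the status is an assumption.

```python
from typing import Any, Dict


def finalize(request: Dict[str, Any]) -> Dict[str, Any]:
    """Handle a sync call where request["finalizing"] is true."""
    observed = request.get("children", {})
    # Count every child object that still exists, across all kinds.
    remaining = sum(len(objects) for objects in observed.values())
    return {
        "status": {"phase": "Deleting"},
        "children": [],  # desired state: nothing left
        "finalized": remaining == 0,
    }
```

While any child is still present the response keeps `finalized` false, so Metacontroller keeps the finalizer on the CR and calls the webhook again on a later sync.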

UserSwarm Custom Resource

Group: crawbl.ai | Version: v1alpha1 | Kind: UserSwarm | Short name: uswarm

Scope: Cluster (not namespaced; child resources are created in the userswarms namespace by default)

Key spec fields

Field | Description
userId | Crawbl user identifier
placement.runtimeNamespace | Target namespace (defaults to userswarms)
runtime.image | Agent runtime container image
runtime.port | gRPC port (default 42618)
runtime.resources | CPU/memory requests and limits
config.defaultProvider | LLM provider slug
config.defaultModel | Model identifier
config.envSecretRef | Kubernetes Secret reference for sensitive env vars
suspend | Boolean; when true, scales the deployment to zero
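To show how these fields fit together, here is a small Python helper that assembles a UserSwarm manifest as a dictionary. It is an illustrative sketch rather than the Go client in crawbl-backend. The field names, group, version, and defaults come from this page; the shape of `envSecretRef` (a `name` key) is an assumption, and all argument values in the usage below are placeholders.

```python
from typing import Any, Dict, Optional


def user_swarm_manifest(
    workspace_id: str,
    user_id: str,
    image: str,
    provider: str,
    model: str,
    env_secret: Optional[str] = None,
    suspend: bool = False,
) -> Dict[str, Any]:
    """Build a cluster-scoped UserSwarm object (no metadata.namespace)."""
    config: Dict[str, Any] = {"defaultProvider": provider, "defaultModel": model}
    if env_secret is not None:
        # Assumed shape: a reference to a Secret by name.
        config["envSecretRef"] = {"name": env_secret}
    return {
        "apiVersion": "crawbl.ai/v1alpha1",
        "kind": "UserSwarm",
        "metadata": {"name": f"workspace-{workspace_id}"},
        "spec": {
            "userId": user_id,
            "placement": {"runtimeNamespace": "userswarms"},
            "runtime": {"image": image, "port": 42618},
            "config": config,
            "suspend": suspend,
        },
    }


manifest = user_swarm_manifest(
    workspace_id="abc123def",
    user_id="user-1",                  # placeholder
    image="example/runtime:dev",       # placeholder
    provider="example-provider",       # placeholder
    model="example-model",             # placeholder
)
```

Serializing `manifest` to YAML or JSON gives an object that could be applied with kubectl, assuming the CRD accepts exactly these fields.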

Status phases

Phase | Meaning
Progressing | Creating child resources
Ready | Deployment has ready replicas, Ready condition is true
Suspended | Scaled to zero (suspend: true)
Error | Bootstrap or config rendering failed
Deleting | Finalization in progress
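One way to read the phase table is as a priority order. The Python sketch below encodes such an order; the precedence (Deleting, then Error, then Suspended, then Ready, then Progressing) is an assumption for illustration, and the actual webhook may resolve conflicts differently.

```python
from typing import Optional


def derive_phase(
    finalizing: bool,
    render_error: Optional[str],
    suspend: bool,
    ready_replicas: int,
) -> str:
    """Pick one status phase from the observed state (assumed precedence)."""
    if finalizing:
        return "Deleting"
    if render_error:
        return "Error"
    if suspend:
        return "Suspended"
    if ready_replicas > 0:
        return "Ready"
    return "Progressing"
```

Under this ordering, a suspended workspace that is being deleted reports Deleting, and a workspace with no ready replicas and no error reports Progressing.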

Naming convention

Resource | Pattern | Example
UserSwarm CR | workspace-{workspaceID} | workspace-abc123def
ServiceAccount | agent-runtime-workspace-{workspaceID} | agent-runtime-workspace-abc123def
Service (ClusterIP) | agent-runtime-workspace-{workspaceID} | agent-runtime-workspace-abc123def
Deployment | agent-runtime-workspace-{workspaceID} | agent-runtime-workspace-abc123def
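The naming patterns above are simple enough to express as two functions. The function names are hypothetical; the patterns are the ones in the table.

```python
def user_swarm_name(workspace_id: str) -> str:
    """Name of the UserSwarm CR for a workspace."""
    return f"workspace-{workspace_id}"


def child_resource_name(workspace_id: str) -> str:
    """Shared name of the ServiceAccount, Service, and Deployment."""
    return f"agent-runtime-{user_swarm_name(workspace_id)}"
```

All three children share one name, which works because they are different kinds; it also means the child name can always be derived from the CR name by adding the `agent-runtime-` prefix.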

Child Resources Per Workspace

For each UserSwarm CR, the webhook creates three child resources:

Resource | Metacontroller update strategy | Purpose
ServiceAccount | InPlace | Pod identity
Service (ClusterIP) | InPlace | Internal gRPC endpoint
Deployment | InPlace | Single-replica agent runtime (Deployment rollout strategy: Recreate)
Tip: No PVC is used because the runtime is stateless. All persistent state lives in shared databases (PostgreSQL, Redis, object storage). Pods can be killed and replaced without data loss.


Webhook Process

The webhook runs the same crawbl-platform binary as the orchestrator but with a different entrypoint:

/crawbl platform userswarm webhook

It is a stateless HTTP server that reads its configuration from environment variables once at startup. Changing a variable on the webhook deployment restarts the webhook pods, and every later reconcile cycle uses the new values, so no per-workspace edits are needed.

Key environment variables

Variable | Purpose
CRAWBL_AGENT_RUNTIME_IMAGE | Image tag for all workspace pods
CRAWBL_ORCHESTRATOR_ENDPOINT | Orchestrator internal address
CRAWBL_MCP_ENDPOINT | MCP server URL injected into runtime pods
CRAWBL_MCP_SIGNING_KEY | HMAC signing key (from Secret)
CRAWBL_DATABASE_* | PostgreSQL connection (passed through to pods)
CRAWBL_REDIS_ADDR | Redis address (passed through to pods)
CRAWBL_SPACES_* | DigitalOcean Spaces config (passed through to pods)
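Reading configuration once at startup can be sketched as a single function that snapshots the environment into an immutable object. This Python example is an illustration, not the Go implementation. The variable names come from the table above; the `WebhookConfig` type, its field names, and the specific `CRAWBL_DATABASE_HOST` and `CRAWBL_SPACES_BUCKET` keys in the usage are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping

PASSTHROUGH_PREFIXES = ("CRAWBL_DATABASE_", "CRAWBL_SPACES_")
PASSTHROUGH_NAMES = ("CRAWBL_REDIS_ADDR",)


@dataclass(frozen=True)
class WebhookConfig:
    agent_runtime_image: str
    orchestrator_endpoint: str
    mcp_endpoint: str
    mcp_signing_key: str
    # Variables copied unchanged into every runtime pod.
    passthrough_env: Dict[str, str] = field(default_factory=dict)


def load_config(environ: Mapping[str, str]) -> WebhookConfig:
    """Snapshot the environment; a missing required variable raises KeyError."""
    passthrough = {
        key: value
        for key, value in environ.items()
        if key.startswith(PASSTHROUGH_PREFIXES) or key in PASSTHROUGH_NAMES
    }
    return WebhookConfig(
        agent_runtime_image=environ["CRAWBL_AGENT_RUNTIME_IMAGE"],
        orchestrator_endpoint=environ["CRAWBL_ORCHESTRATOR_ENDPOINT"],
        mcp_endpoint=environ["CRAWBL_MCP_ENDPOINT"],
        mcp_signing_key=environ["CRAWBL_MCP_SIGNING_KEY"],
        passthrough_env=passthrough,
    )


config = load_config({
    "CRAWBL_AGENT_RUNTIME_IMAGE": "example/runtime:dev",
    "CRAWBL_ORCHESTRATOR_ENDPOINT": "orchestrator.example:7171",
    "CRAWBL_MCP_ENDPOINT": "http://mcp.example",
    "CRAWBL_MCP_SIGNING_KEY": "not-a-real-key",
    "CRAWBL_DATABASE_HOST": "postgres.example",   # hypothetical key
    "CRAWBL_SPACES_BUCKET": "example-bucket",     # hypothetical key
    "CRAWBL_REDIS_ADDR": "redis.example:6379",
    "UNRELATED": "ignored",
})
```

In a real process the argument would be `os.environ`. Taking a mapping as a parameter keeps the function easy to test and makes the "read once" behavior explicit.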

Runtime Pod Security

Every agent runtime pod runs with a locked-down security context:

  • User/Group: 65532 (unprivileged nonroot)
  • Read-only root filesystem
  • No privilege escalation
  • All capabilities dropped
  • Seccomp: RuntimeDefault
  • Volumes: emptyDir for cache (512Mi) + tmpfs for temp (128Mi) — no persistent storage
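The list above maps onto standard Kubernetes pod and container fields. The Python sketch below builds those fragments as dictionaries; the field names are the Kubernetes API's own, while the function names and the volume names `cache` and `tmp` are assumptions. The tmpfs volume is expressed as an emptyDir with `medium: Memory`.

```python
from typing import Any, Dict, List

RUNTIME_ID = 65532  # unprivileged nonroot user and group


def pod_security_context() -> Dict[str, Any]:
    return {
        "runAsNonRoot": True,
        "runAsUser": RUNTIME_ID,
        "runAsGroup": RUNTIME_ID,
        "seccompProfile": {"type": "RuntimeDefault"},
    }


def container_security_context() -> Dict[str, Any]:
    return {
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    }


def scratch_volumes() -> List[Dict[str, Any]]:
    # Both volumes are ephemeral and disappear with the pod.
    return [
        {"name": "cache", "emptyDir": {"sizeLimit": "512Mi"}},
        {"name": "tmp", "emptyDir": {"medium": "Memory", "sizeLimit": "128Mi"}},
    ]
```

With a read-only root filesystem, these two volumes are the only writable paths in the container, so anything the runtime writes must go to one of their mount points.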

Debugging

List all UserSwarm CRs

kubectl get userswarms -o wide
# or the short name:
kubectl get uswarm -o wide

Check a specific workspace

kubectl get uswarm workspace-<WORKSPACE_ID> -o yaml

View webhook logs

kubectl logs -n backend deploy/userswarm-webhook --tail=100

Check Metacontroller logs

kubectl logs -n userswarm-controller deploy/metacontroller --tail=100

Force a resync

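Edit any spec field to trigger an immediate sync. Deleting and recreating the CR also works, but it runs the finalize loop first, so the child resources are torn down and rebuilt. Even without a change, Metacontroller calls the webhook again within 30 seconds.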

Common issues

Symptom | Likely Cause | Fix
CR stuck in Progressing | Webhook cannot reach Postgres or the image pull fails | Check webhook logs and pod events
CR in Error | Invalid spec (missing image, bad config) | Check the CR's status conditions
Child resources not created | Webhook is down or unreachable | Check kubectl get pods -n backend -l app=userswarm-webhook
Deletion stuck | Children not fully cleaned up | Check for finalizer on the CR and orphaned resources in userswarms namespace
Pod CrashLoopBackOff | Runtime can't start (bad env, missing secret) | Check pod logs: kubectl logs -n userswarms <pod-name>
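When many workspaces exist, grouping CRs by phase can narrow down which rows of the table apply. The Python sketch below parses the output of `kubectl get uswarm -o json`. It assumes the phase is stored at `.status.phase`, which this page does not confirm; CRs without that field are grouped under `Unknown`.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_by_phase(kubectl_json: str) -> Dict[str, List[str]]:
    """Group UserSwarm names by status phase from a kubectl JSON list."""
    items = json.loads(kubectl_json).get("items", [])
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        # Assumed location of the phase; falls back to "Unknown".
        phase = item.get("status", {}).get("phase", "Unknown")
        grouped[phase].append(item["metadata"]["name"])
    return dict(grouped)
```

Any name that stays under Progressing or Error across several resync periods is a candidate for the checks in the table above.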

Source Files

File | Purpose
crawbl-argocd-apps/root/metacontroller.yaml | ArgoCD Application CR
crawbl-argocd-apps/components/metacontroller/resources/ | CRD, CompositeController, webhook deployment
crawbl-backend/internal/userswarm/webhook/ | Webhook handler (sync, finalize, resource builders)
crawbl-backend/internal/userswarm/client/ | Go client for creating/updating UserSwarm CRs
crawbl-backend/cmd/crawbl/platform/userswarm/ | CLI entrypoint for webhook process

What's next: See User Swarm Isolation for the security model, or the Agent Runtime Tools page for what runs inside each pod.