Metacontroller
Metacontroller is the Kubernetes operator framework that manages the UserSwarm custom resource — the per-user Agent Runtime lifecycle. It watches UserSwarm CRs and calls a webhook in the backend namespace to reconcile the desired state (ServiceAccount, Service, Deployment) for each workspace.
Connection Details
| Property | Value |
|---|---|
| Metacontroller namespace | userswarm-controller |
| Webhook deployment | userswarm-webhook in backend namespace |
| Webhook port | 8080 (HTTP) |
| Webhook endpoint | http://userswarm-webhook.backend.svc.cluster.local:8080/sync |
| Health check | GET /healthz |
| Helm chart version | v4.13.0 |
| ArgoCD sync wave | 3 (after namespaces and cert-manager, before orchestrator) |
| Resync period | 30 seconds |
How It Works
Metacontroller itself is a generic Kubernetes controller. It does not contain any Crawbl logic — it just calls our webhook whenever a UserSwarm CR changes.
The reconcile loop
- A UserSwarm CR is created or updated (e.g., user signs up, or a workspace config changes)
- Metacontroller detects the change and sends a POST /sync to the webhook
- The webhook reads the CR spec and returns the desired child resources (ServiceAccount, Service, Deployment)
- Metacontroller creates, updates, or deletes children to match the desired state
- The webhook is called again every 30 seconds to keep status up to date
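The steps above can be sketched as a pure function from a sync request to a sync response. This is a minimal illustration in Python, not the Go handler in crawbl-backend: it assumes the standard Metacontroller CompositeController request shape (a parent object plus observed children), and the label scheme and container name are hypothetical.

```python
from typing import Any


def sync(request: dict[str, Any]) -> dict[str, Any]:
    """Return the desired children for one UserSwarm parent."""
    parent = request["parent"]
    spec = parent.get("spec", {})
    runtime = spec.get("runtime", {})

    # Naming and defaults follow the tables on this page.
    name = f"agent-runtime-{parent['metadata']['name']}"
    namespace = spec.get("placement", {}).get("runtimeNamespace", "userswarms")
    port = runtime.get("port", 42618)
    labels = {"app": name}  # hypothetical label scheme

    def meta() -> dict[str, Any]:
        # Fresh dict per child so the objects do not share state.
        return {"name": name, "namespace": namespace, "labels": dict(labels)}

    service_account = {"apiVersion": "v1", "kind": "ServiceAccount", "metadata": meta()}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": meta(),
        "spec": {
            "type": "ClusterIP",
            "selector": dict(labels),
            "ports": [{"name": "grpc", "port": port, "targetPort": port}],
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": meta(),
        "spec": {
            # suspend: true scales the runtime to zero.
            "replicas": 0 if spec.get("suspend") else 1,
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "serviceAccountName": name,
                    "containers": [{
                        "name": "agent-runtime",  # hypothetical container name
                        "image": runtime["image"],  # required field in this sketch
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }
    # Status reporting is left out of this sketch.
    return {"status": {}, "children": [service_account, service, deployment]}
```

Because the function only maps input to output, Metacontroller can call it repeatedly and converge on the same children each time.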
The finalize loop (deletion)
- A UserSwarm CR is deleted (e.g., user deletes their account)
- Metacontroller sends POST /sync with finalizing: true
- The webhook returns an empty children list
- Metacontroller deletes all child resources
- Once all children are gone, the webhook returns finalized: true
- Metacontroller removes the finalizer and the CR is deleted
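The finalize steps can be sketched in a few lines. This is an illustrative Python version, assuming the standard Metacontroller convention that observed children arrive as a map keyed by kind and API version, each holding a map of object name to object. The function name and the status payload are hypothetical.

```python
from typing import Any


def finalize(request: dict[str, Any]) -> dict[str, Any]:
    """Ask for zero children and report whether cleanup has finished."""
    observed = request.get("children", {})
    # Count every child object Metacontroller still sees in the cluster.
    remaining = sum(len(objects) for objects in observed.values())
    return {
        "children": [],               # desired state: nothing left
        "finalized": remaining == 0,  # true only once all children are gone
        "status": {"phase": "Deleting"},
    }
```

Returning finalized: false while children remain keeps the finalizer in place, so the CR is not removed before its Deployment, Service, and ServiceAccount are.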
UserSwarm Custom Resource
Group: crawbl.ai | Version: v1alpha1 | Kind: UserSwarm | Short name: uswarm
Scope: Cluster (not namespaced; child resources are created in the userswarms namespace by default)
Key spec fields
| Field | Description |
|---|---|
| userId | Crawbl user identifier |
| placement.runtimeNamespace | Target namespace (defaults to userswarms) |
| runtime.image | Agent runtime container image |
| runtime.port | gRPC port (default 42618) |
| runtime.resources | CPU/memory requests and limits |
| config.defaultProvider | LLM provider slug |
| config.defaultModel | Model identifier |
| config.envSecretRef | Kubernetes Secret reference for sensitive env vars |
| suspend | Boolean: scale deployment to zero |
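To show how these fields fit together, here is a small Python sketch that assembles a UserSwarm manifest as a plain dictionary. The field names come from the table above. The resource quantities are placeholders, and the shape of envSecretRef (an object with a name key) is an assumption, not something this page confirms.

```python
from typing import Any, Optional


def user_swarm_manifest(
    workspace_id: str,
    user_id: str,
    image: str,
    provider: str,
    model: str,
    env_secret: Optional[str] = None,
    suspend: bool = False,
) -> dict[str, Any]:
    """Build a UserSwarm object using the spec fields listed above."""
    spec: dict[str, Any] = {
        "userId": user_id,
        "placement": {"runtimeNamespace": "userswarms"},
        "runtime": {
            "image": image,
            "port": 42618,
            "resources": {  # placeholder quantities, not real defaults
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "500m", "memory": "512Mi"},
            },
        },
        "config": {"defaultProvider": provider, "defaultModel": model},
        "suspend": suspend,
    }
    if env_secret:
        # Assumed shape: a reference to a Secret by name.
        spec["config"]["envSecretRef"] = {"name": env_secret}
    return {
        "apiVersion": "crawbl.ai/v1alpha1",
        "kind": "UserSwarm",
        # Cluster-scoped resource, so metadata carries no namespace.
        "metadata": {"name": f"workspace-{workspace_id}"},
        "spec": spec,
    }
```

Serializing the result with json.dumps gives a document that kubectl can apply, since kubectl accepts JSON as well as YAML.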
Status phases
| Phase | Meaning |
|---|---|
| Progressing | Creating child resources |
| Ready | Deployment has ready replicas, Ready condition is true |
| Suspended | Scaled to zero (suspend: true) |
| Error | Bootstrap or config rendering failed |
| Deleting | Finalization in progress |
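One way to read the table is as a priority-ordered decision. The Python sketch below is a guess at that ordering (deletion first, then errors, then suspension, then readiness). The real webhook may evaluate its conditions differently, and the function and parameter names are hypothetical.

```python
from typing import Optional


def derive_phase(
    *,
    finalizing: bool = False,
    error: Optional[str] = None,
    suspend: bool = False,
    ready_replicas: int = 0,
) -> str:
    """Map observed state to one of the phases in the table above."""
    if finalizing:
        return "Deleting"      # finalization in progress
    if error:
        return "Error"         # bootstrap or config rendering failed
    if suspend:
        return "Suspended"     # scaled to zero on request
    if ready_replicas > 0:
        return "Ready"         # Deployment has ready replicas
    return "Progressing"       # children still being created
```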
Naming convention
| Resource | Pattern | Example |
|---|---|---|
| UserSwarm CR | workspace-{workspaceID} | workspace-abc123def |
| ServiceAccount | agent-runtime-workspace-{workspaceID} | agent-runtime-workspace-abc123def |
| Service (ClusterIP) | agent-runtime-workspace-{workspaceID} | agent-runtime-workspace-abc123def |
| Deployment | agent-runtime-workspace-{workspaceID} | agent-runtime-workspace-abc123def |
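The naming rules are simple enough to capture in two helpers. The Python sketch below mirrors the patterns in the table. The runtime_address helper is an extra illustration that combines the Service name with the default namespace and gRPC port using standard Kubernetes service DNS, and all three function names are hypothetical.

```python
def cr_name(workspace_id: str) -> str:
    """Name of the UserSwarm CR for a workspace."""
    return f"workspace-{workspace_id}"


def child_name(workspace_id: str) -> str:
    """Shared name of the ServiceAccount, Service, and Deployment."""
    return f"agent-runtime-{cr_name(workspace_id)}"


def runtime_address(workspace_id: str, namespace: str = "userswarms", port: int = 42618) -> str:
    """In-cluster DNS address of the workspace's ClusterIP Service."""
    return f"{child_name(workspace_id)}.{namespace}.svc.cluster.local:{port}"
```

All three children share one name, so a single workspace ID is enough to locate every resource that belongs to it.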
Child Resources Per Workspace
For each UserSwarm CR, the webhook creates three child resources:
| Resource | Update Strategy | Purpose |
|---|---|---|
| ServiceAccount | InPlace | Pod identity |
| Service (ClusterIP) | InPlace | Internal gRPC endpoint |
| Deployment | InPlace | Single-replica agent runtime (Recreate strategy) |
No PVC — the runtime is stateless. All persistent state lives in shared databases (PostgreSQL, Redis, object storage). Pods can be killed and replaced without data loss.
Webhook Process
The webhook runs the same crawbl-platform binary as the orchestrator but with a different entrypoint:
/crawbl platform userswarm webhook
It is a stateless HTTP server that reads config from environment variables once at startup. Updating the webhook deployment updates the config for all future reconcile cycles — no per-workspace edits needed.
Key environment variables
| Variable | Purpose |
|---|---|
| CRAWBL_AGENT_RUNTIME_IMAGE | Image tag for all workspace pods |
| CRAWBL_ORCHESTRATOR_ENDPOINT | Orchestrator internal address |
| CRAWBL_MCP_ENDPOINT | MCP server URL injected into runtime pods |
| CRAWBL_MCP_SIGNING_KEY | HMAC signing key (from Secret) |
| CRAWBL_DATABASE_* | PostgreSQL connection (passed through to pods) |
| CRAWBL_REDIS_ADDR | Redis address (passed through to pods) |
| CRAWBL_SPACES_* | DigitalOcean Spaces config (passed through to pods) |
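The read-once-at-startup pattern might look like the following Python sketch. It treats the five individually named variables as required and collects the CRAWBL_DATABASE_* and CRAWBL_SPACES_* families by prefix. Both choices are assumptions for illustration, as are the WebhookConfig and load_config names. The real webhook is a Go binary and may validate its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "CRAWBL_AGENT_RUNTIME_IMAGE",
    "CRAWBL_ORCHESTRATOR_ENDPOINT",
    "CRAWBL_MCP_ENDPOINT",
    "CRAWBL_MCP_SIGNING_KEY",
    "CRAWBL_REDIS_ADDR",
)
PASSTHROUGH_PREFIXES = ("CRAWBL_DATABASE_", "CRAWBL_SPACES_")


@dataclass(frozen=True)
class WebhookConfig:
    values: Mapping[str, str]       # the individually named settings
    passthrough: Mapping[str, str]  # variables copied into runtime pods


def load_config(env: Mapping[str, str] = os.environ) -> WebhookConfig:
    """Read configuration once; fail fast if a required variable is unset."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    values = {key: env[key] for key in REQUIRED}
    # str.startswith accepts a tuple, so one pass covers both prefixes.
    passthrough = {k: v for k, v in env.items() if k.startswith(PASSTHROUGH_PREFIXES)}
    return WebhookConfig(values=values, passthrough=passthrough)
```

Calling load_config() once in the process entrypoint and passing the result to the request handlers keeps every reconcile cycle on the same configuration until the deployment is rolled.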
Runtime Pod Security
Every agent runtime pod runs with a locked-down security context:
- User/Group: 65532 (unprivileged nonroot)
- Read-only root filesystem
- No privilege escalation
- All capabilities dropped
- Seccomp: RuntimeDefault
- Volumes: emptyDir for cache (512Mi) + tmpfs for temp (128Mi), with no persistent storage
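In Kubernetes terms, the list above corresponds roughly to the pod spec fragment below, written here as a Python dictionary. The securityContext field names are standard Kubernetes API fields. The container name, volume names, and mount paths are hypothetical, and modelling the tmpfs volume as a memory-backed emptyDir is an assumption about how the webhook renders it.

```python
from typing import Any


def runtime_pod_spec_fragment() -> dict[str, Any]:
    """Security-related parts of a runtime pod spec, per the list above."""
    return {
        "securityContext": {  # pod-level settings
            "runAsUser": 65532,
            "runAsGroup": 65532,
            "runAsNonRoot": True,
            "seccompProfile": {"type": "RuntimeDefault"},
        },
        "containers": [{
            "name": "agent-runtime",  # hypothetical container name
            "securityContext": {      # container-level settings
                "readOnlyRootFilesystem": True,
                "allowPrivilegeEscalation": False,
                "capabilities": {"drop": ["ALL"]},
            },
            "volumeMounts": [         # hypothetical mount paths
                {"name": "cache", "mountPath": "/cache"},
                {"name": "tmp", "mountPath": "/tmp"},
            ],
        }],
        "volumes": [
            {"name": "cache", "emptyDir": {"sizeLimit": "512Mi"}},
            # medium: Memory backs the emptyDir with tmpfs.
            {"name": "tmp", "emptyDir": {"medium": "Memory", "sizeLimit": "128Mi"}},
        ],
    }
```

With a read-only root filesystem, the two emptyDir mounts are the only writable paths, and both are discarded when the pod is replaced.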
Debugging
List all UserSwarm CRs
kubectl get userswarms -o wide
# or the short name:
kubectl get uswarm -o wide
Check a specific workspace
kubectl get uswarm workspace-<WORKSPACE_ID> -o yaml
View webhook logs
kubectl logs -n backend deploy/userswarm-webhook --tail=100
Check Metacontroller logs
kubectl logs -n userswarm-controller deploy/metacontroller --tail=100
Force a resync
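Edit any spec field on the CR to trigger a sync right away. Deleting and recreating the CR also works, but it runs the finalize loop first and removes the child resources, so the runtime pod is torn down and recreated. Even without a change, Metacontroller calls the webhook again within 30 seconds.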
Common issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| CR stuck in Progressing | Webhook cannot reach Postgres or the image pull fails | Check webhook logs and pod events |
| CR in Error | Invalid spec (missing image, bad config) | Check the CR's status conditions |
| Child resources not created | Webhook is down or unreachable | Check kubectl get pods -n backend -l app=userswarm-webhook |
| Deletion stuck | Children not fully cleaned up | Check for finalizer on the CR and orphaned resources in userswarms namespace |
| Pod CrashLoopBackOff | Runtime can't start (bad env, missing secret) | Check pod logs: kubectl logs -n userswarms <pod-name> |
Source Files
| File | Purpose |
|---|---|
| crawbl-argocd-apps/root/metacontroller.yaml | ArgoCD Application CR |
| crawbl-argocd-apps/components/metacontroller/resources/ | CRD, CompositeController, webhook deployment |
| crawbl-backend/internal/userswarm/webhook/ | Webhook handler (sync, finalize, resource builders) |
| crawbl-backend/internal/userswarm/client/ | Go client for creating/updating UserSwarm CRs |
| crawbl-backend/cmd/crawbl/platform/userswarm/ | CLI entrypoint for webhook process |
What's next: See User Swarm Isolation for the security model, or the Agent Runtime Tools page for what runs inside each pod.