User Swarm Isolation
These pages often point at shared systems. Confirm the cluster, namespace, and ownership boundary before running mutating commands.
Each user gets a separate runtime.
In plain language, those runtimes still live together inside shared cluster infrastructure. Isolation comes from private routing, ownership checks, and internal-only access, not from a separate namespace for every user.
This page answers one practical question: "how is one user's runtime kept separate from another user's runtime?"
UserSwarm is the cluster record we use to track one user's runtime.
The actual StatefulSets, Services, and PVCs are created in the shared userswarms namespace. Swarm pods are never directly reachable from the internet.
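Because service names are derived deterministically from the workspace, the orchestrator can compute a swarm's cluster-internal address without a lookup. A minimal sketch of that idea — the `swarm-` prefix and the exact DNS shape are illustrative assumptions, not the real naming scheme:

```go
package main

import "fmt"

// clusterDNSFor builds the cluster-internal address of a workspace's
// swarm Service. The "swarm-" prefix is an assumed naming convention;
// ClusterIP Services in the shared userswarms namespace resolve as
// <service>.userswarms.svc.cluster.local only from inside the cluster.
func clusterDNSFor(workspaceID string) string {
	return fmt.Sprintf("swarm-%s.userswarms.svc.cluster.local", workspaceID)
}

func main() {
	fmt.Println(clusterDNSFor("ws-1234"))
	// → swarm-ws-1234.userswarms.svc.cluster.local
}
```

Nothing outside the cluster can resolve or reach that address, which is what makes the shared namespace workable.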
Why Swarm Runtimes Are Not Publicly Reachable
ZeroClaw pods have no public HTTPRoute. They are not exposed through the Envoy Gateway or any Ingress resource. There is no direct public path from the internet to a swarm pod.
If those terms are unfamiliar, the important point is simpler than the Kubernetes vocabulary: swarm runtimes are private internal services, not public apps.
Shared Namespace, Private Services
Swarm pods are reached through per-swarm internal Services in the shared userswarms namespace.
Those services are ClusterIP services, which means they are only reachable from inside the cluster.
The current webhook-managed runtime does not create a per-swarm NetworkPolicy.
That means the current model is "private by topology and backend mediation," not "hard isolated by one namespace and one network policy per user."
Isolation today comes from:
- No public HTTPRoute or ingress to the runtime services
- Per-workspace UserSwarm ownership and deterministic service naming
- The orchestrator resolving the target service from the authenticated workspace
- Cluster-internal DNS and service discovery instead of public exposure
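The middle two points combine into one check: the orchestrator only derives a service name after confirming the caller owns the workspace. A hedged sketch of that ownership gate — the types, field names, and `swarm-` prefix are illustrative, not the orchestrator's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// Workspace records which user owns a swarm runtime. Field names are
// illustrative, not the orchestrator's actual schema.
type Workspace struct {
	ID      string
	OwnerID string
}

var errForbidden = errors.New("workspace not owned by caller")

// resolveSwarmService enforces the ownership boundary before exposing
// any runtime address: the target service is derived from the
// authenticated workspace, never taken from the request.
func resolveSwarmService(userID string, ws Workspace) (string, error) {
	if ws.OwnerID != userID {
		return "", errForbidden
	}
	// Deterministic per-swarm service name; "swarm-" is an assumed prefix.
	return "swarm-" + ws.ID, nil
}

func main() {
	name, err := resolveSwarmService("u1", Workspace{ID: "ws-9", OwnerID: "u1"})
	fmt.Println(name, err) // → swarm-ws-9 <nil>

	_, err = resolveSwarmService("u2", Workspace{ID: "ws-9", OwnerID: "u1"})
	fmt.Println(err) // non-owner is refused before any address exists
}
```

Because the service name never comes from client input, a user cannot address another user's runtime even though both live in the same namespace.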
Orchestrator Proxy Model
All swarm traffic is proxied through the orchestrator.
The orchestrator is the component that holds the mapping between users, workspaces, and swarm endpoints.
That ensures:
- Authentication and authorization happen before any swarm access
- The backend can enforce rate limits, billing controls, and audit logging
- Cross-user agent traffic is mediated by the backend rather than by public runtime addresses
1. The app sends a message: the mobile app calls /v1/workspaces/{id}/conversations/{id}/messages.
2. The orchestrator checks identity: it authenticates the user and resolves the workspace to the correct swarm service.
3. The request stays private: the orchestrator proxies traffic to the swarm's ClusterIP service inside the cluster.
4. The result comes back: the swarm processes the request, the orchestrator receives the response, and the app gets the final result.
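The proxy hop in those steps can be sketched with Go's standard reverse proxy. This is a simplified illustration, not the real handler: the host format and port are assumptions, and the authentication step that produces the workspace ID is elided:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// swarmTarget builds the private, cluster-internal URL for a
// workspace's swarm Service. Host shape and port 8080 are assumed
// conventions for this sketch.
func swarmTarget(workspaceID string) *url.URL {
	return &url.URL{
		Scheme: "http",
		Host:   fmt.Sprintf("swarm-%s.userswarms.svc.cluster.local:8080", workspaceID),
	}
}

// swarmProxy returns a handler that forwards the request to the
// swarm's ClusterIP service. Only this in-cluster hop ever reaches a
// swarm pod; there is no public route to bypass it.
func swarmProxy(workspaceID string) http.Handler {
	return httputil.NewSingleHostReverseProxy(swarmTarget(workspaceID))
}

func main() {
	// In the real orchestrator this runs only after authentication has
	// resolved the caller's workspace; "ws-example" stands in for that.
	fmt.Println(swarmTarget("ws-example"))
	// → http://swarm-ws-example.userswarms.svc.cluster.local:8080
}
```

Keeping the proxy in the orchestrator is also what makes the rate-limit, billing, and audit hooks from the list above possible: every swarm request passes through one mediated code path.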
Lifecycle Cleanup
When a workspace is deleted, cleanup should remove the matching runtime and its supporting resources.
There are multiple layers because cluster cleanup can fail halfway through, and we do not want orphaned runtimes left behind.
The current 3-layer defense is:
| Layer | Mechanism | Description |
|---|---|---|
| 1 | Orchestrator | Deletes the UserSwarm CR when a workspace is deleted |
| 2 | Metacontroller + webhook finalize hook | Deletes the StatefulSet, Services, PVC, ConfigMap, and ServiceAccount |
| 3 | Reaper | Periodic background job that finds orphaned cluster-scoped UserSwarm CRs and deletes them |
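Layer 3's orphan check can be sketched as a set difference between existing CRs and live workspaces. This is a simplified in-memory illustration; the real reaper lists cluster-scoped UserSwarm CRs through the Kubernetes API and deletes each orphan it finds:

```go
package main

import "fmt"

// sweepOrphans returns the workspace IDs of UserSwarm CRs whose owning
// workspace no longer exists. In the real job, each returned entry
// would be deleted on this sweep.
func sweepOrphans(crWorkspaceIDs []string, liveWorkspaces map[string]bool) []string {
	var orphans []string
	for _, id := range crWorkspaceIDs {
		if !liveWorkspaces[id] {
			orphans = append(orphans, id)
		}
	}
	return orphans
}

func main() {
	crs := []string{"ws-1", "ws-2", "ws-3"}
	live := map[string]bool{"ws-1": true, "ws-3": true}
	fmt.Println(sweepOrphans(crs, live)) // → [ws-2]
}
```

Because the sweep is periodic and idempotent, it is safe for layers 1 and 2 to fail partway: any CR they leave behind is picked up on a later pass.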
Deletion Flow
1. Delete the workspace: the user deletes the workspace through the API.
2. Remove the UserSwarm record: the orchestrator deletes the cluster-scoped UserSwarm custom resource.
3. Run final cleanup: Metacontroller calls the finalize hook and tears down the runtime children in userswarms.
4. Catch leftovers: if a CR is left behind without a live owner, the reaper removes it on the next sweep.
The backend should wait for swarm verified=true, not just pod readiness, before routing user traffic.
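A sketch of that gating check, assuming a status struct carrying both signals (field names are illustrative):

```go
package main

import "fmt"

// SwarmStatus mirrors the two readiness signals: pod readiness from
// Kubernetes, and the swarm's own verified flag set once it completes
// its startup checks. Field names are assumptions for this sketch.
type SwarmStatus struct {
	PodsReady bool
	Verified  bool
}

// routable reports whether user traffic may be sent to the swarm.
// Pod readiness alone is not enough: the runtime must also have
// reported verified=true.
func routable(s SwarmStatus) bool {
	return s.PodsReady && s.Verified
}

func main() {
	fmt.Println(routable(SwarmStatus{PodsReady: true, Verified: false})) // → false
	fmt.Println(routable(SwarmStatus{PodsReady: true, Verified: true}))  // → true
}
```

A pod can pass its readiness probe before the swarm has finished startup, so routing on pod readiness alone risks sending user traffic to a runtime that cannot yet serve it.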
🔗 Terms On This Page
If a term below is unfamiliar, open its glossary entry. For the full list, go to Internal Glossary.
- UserSwarm: The Crawbl custom resource that represents one user runtime and its lifecycle.
- HTTPRoute: The routing rule that tells the gateway which hostname and path should reach which service.
- ClusterIP Service: A Kubernetes service that is reachable only from inside the cluster.
- StatefulSet: The Kubernetes workload type used when pods need stable identities and persistent storage.
- PVC: A PersistentVolumeClaim, which requests persistent storage for a workload.
- Metacontroller: A controller framework used to create and clean up user runtime resources from custom resources.