User Swarm Isolation

Before You Change Anything

These pages often point at shared systems. Confirm the cluster, namespace, and ownership boundary before running mutating commands.

Each user gets a separate runtime.

In plain language, those runtimes still live together inside shared cluster infrastructure. Isolation comes from private routing, ownership checks, and internal-only access, not from a separate namespace for every user.

This page answers one practical question: "how is one user's runtime kept separate from another user's runtime?"

UserSwarm is the cluster record we use to track one user's runtime.

The actual StatefulSets, Services, and PVCs are created in the shared userswarms namespace. Swarm pods are never directly reachable from the internet.

Why Swarm Runtimes Are Not Publicly Reachable

ZeroClaw pods have no public HTTPRoute. They are not exposed through the Envoy Gateway or any Ingress resource. There is no direct public path from the internet to a swarm pod.

If those terms are unfamiliar, the important point is simpler than the Kubernetes vocabulary: swarm runtimes are private internal services, not public apps.

Shared Namespace, Private Services

Swarm pods are reached through per-swarm internal Services in the shared userswarms namespace.

Those services are ClusterIP services, which means they are only reachable from inside the cluster.

The current webhook-managed runtime does not create a per-swarm NetworkPolicy.

That means the current model is "private by topology and backend mediation," not "hard isolated by one namespace and one network policy per user."

Isolation today comes from:

  • No public HTTPRoute or ingress to the runtime services
  • Per-workspace UserSwarm ownership and deterministic service naming
  • The orchestrator resolving the target service from the authenticated workspace
  • Cluster-internal DNS and service discovery instead of public exposure
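The second and third bullets can be sketched together: the backend derives the target service from the authenticated workspace, never from a client-supplied address. This is a minimal illustration; the `swarm-{workspace_id}` naming scheme and the port are assumptions for the example, not the real naming convention.

```python
# Hypothetical sketch of deterministic service naming and target
# resolution. Only the shared "userswarms" namespace is taken from
# this page; the service-name prefix and port are made up.

NAMESPACE = "userswarms"

def swarm_service_dns(workspace_id: str) -> str:
    """Deterministic cluster-internal DNS name for a workspace's swarm."""
    service = f"swarm-{workspace_id}"  # per-swarm ClusterIP Service
    return f"{service}.{NAMESPACE}.svc.cluster.local"

def resolve_target(authenticated_workspace_id: str, port: int = 8080) -> str:
    """Derive the proxy target from the *authenticated* workspace,
    never from anything the client supplies directly."""
    return f"http://{swarm_service_dns(authenticated_workspace_id)}:{port}"
```

Because the name is a pure function of the workspace, the orchestrator needs no per-swarm routing table: ownership checks plus deterministic naming are enough to find the right private endpoint.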

Orchestrator Proxy Model

All swarm traffic is proxied through the orchestrator.

The orchestrator is the component that holds the mapping between users, workspaces, and swarm endpoints.

That ensures:

  • Authentication and authorization happen before any swarm access
  • The backend can enforce rate limits, billing controls, and audit logging
  • Cross-user agent traffic is mediated by the backend rather than by public runtime addresses
Step 1: The app sends a message

The mobile app calls /v1/workspaces/{id}/conversations/{id}/messages.

Step 2: The orchestrator checks identity

It authenticates the user and resolves the workspace to the correct swarm service.

Step 3: The request stays private

The orchestrator proxies traffic to the swarm's ClusterIP service inside the cluster.

Step 4: The result comes back

The swarm processes the request, the orchestrator receives the response, and the app gets the final result.
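The four steps above can be sketched as one handler. Everything here is a stand-in: the token table, workspace store, and `forward` helper are hypothetical placeholders for the orchestrator's real auth layer and in-cluster HTTP hop.

```python
# Runnable sketch of the proxy flow, with in-memory stubs standing in
# for the real auth and forwarding layers. All names are illustrative.

from dataclasses import dataclass

@dataclass
class User:
    id: str

@dataclass
class Workspace:
    id: str
    owner_id: str

TOKENS = {"token-alice": User("alice")}            # stub auth table
WORKSPACES = {"ws-1": Workspace("ws-1", "alice")}  # stub workspace store

class Forbidden(Exception):
    pass

def forward(target: str, body: str) -> str:
    # Stand-in for the in-cluster HTTP hop to the ClusterIP service.
    return f"processed({body}) via {target}"

def handle_message(token: str, workspace_id: str, body: str) -> str:
    user = TOKENS[token]                    # Step 2: authenticate
    ws = WORKSPACES[workspace_id]
    if ws.owner_id != user.id:              # ownership check before routing
        raise Forbidden("workspace not owned by caller")
    target = f"http://swarm-{ws.id}.userswarms.svc.cluster.local:8080"
    return forward(target, body)            # Steps 3-4: proxy and return
```

The key property: the swarm address never appears in the request. It is computed server-side after authentication, so a client cannot point the proxy at another user's runtime.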

Lifecycle Cleanup

When a workspace is deleted, cleanup should remove the matching runtime and its supporting resources.

There are multiple layers because cluster cleanup can fail halfway through, and we do not want orphaned runtimes left behind.

The current 3-layer defense is:

| Layer | Mechanism | Description |
| --- | --- | --- |
| 1 | Orchestrator | Deletes the UserSwarm CR when a workspace is deleted |
| 2 | Metacontroller + webhook finalize hook | Deletes the StatefulSet, Services, PVC, ConfigMap, and ServiceAccount |
| 3 | Reaper | Periodic background job that finds orphaned cluster-scoped UserSwarm CRs and deletes them |
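Layer 3's core loop is simple to state: delete every UserSwarm CR whose owning workspace no longer exists. A hedged sketch, assuming the CRs carry a workspace ID and that the caller supplies the live-workspace set and a delete callback (the real job would talk to the Kubernetes API and the workspace database):

```python
# Illustrative reaper sweep. The CR shape ({"name", "workspace_id"})
# and the injected delete callback are assumptions for this sketch.

def reap_orphans(userswarm_crs, live_workspace_ids, delete_cr):
    """One sweep: delete every CR without a live owner.
    Returns the names of the CRs that were reaped."""
    reaped = []
    for cr in userswarm_crs:
        if cr["workspace_id"] not in live_workspace_ids:
            delete_cr(cr["name"])
            reaped.append(cr["name"])
    return reaped
```

Because the sweep is idempotent, it is safe to run periodically even when Layers 1 and 2 succeeded: a healthy CR with a live owner is never touched.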

Deletion Flow

Step 1: Delete the workspace

The user deletes the workspace through the API.

Step 2: Remove the UserSwarm record

The orchestrator deletes the cluster-scoped UserSwarm custom resource.

Step 3: Run final cleanup

Metacontroller calls the finalize hook and tears down the runtime children in userswarms.

Step 4: Catch leftovers

If a CR is left behind without a live owner, the reaper removes it on the next sweep.

The backend should wait for swarm verified=true, not just pod readiness, before routing user traffic.
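One way to express that gating rule as code: poll the UserSwarm status and only return once it reports verified=true. The `get_userswarm_status` accessor is hypothetical; in practice this would read the CR's status subresource.

```python
# Sketch of gating traffic on verification rather than pod readiness
# alone. The status accessor and field names are assumptions.

import time

def wait_until_verified(get_userswarm_status, timeout_s=120.0, poll_s=2.0):
    """Block until the status reports verified=true, or raise TimeoutError.
    Pod readiness is not enough: a pod can be Ready before the runtime
    has finished its own verification handshake."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_userswarm_status()
        if status.get("verified") is True:
            return status
        time.sleep(poll_s)
    raise TimeoutError("swarm never reported verified=true")
```

Routing code would call this after swarm creation and before proxying the first user request, so a Ready-but-unverified pod never receives traffic.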

🔗 Terms On This Page

If a term below is unfamiliar, open its glossary entry. For the full list, go to Internal Glossary.

  • UserSwarm: The Crawbl custom resource that represents one user runtime and its lifecycle.
  • HTTPRoute: The routing rule that tells the gateway which hostname and path should reach which service.
  • ClusterIP Service: A Kubernetes service that is reachable only from inside the cluster.
  • StatefulSet: The Kubernetes workload type used when pods need stable identities and persistent storage.
  • PVC: A PersistentVolumeClaim, which requests persistent storage for a workload.
  • Metacontroller: A controller framework used to create and clean up user runtime resources from custom resources.