Codebase Layout
The Crawbl backend lives in a single Go repository (crawbl-backend).
This page is the map to use when you know what kind of change you need to make but not yet where in the repo it belongs.
Understanding the repository's directory structure is the fastest way to orient yourself when you need to find or change something.
Repository Tree
crawbl-backend/
├── cmd/
│   ├── crawbl/                         # Unified CLI + runtime entrypoints
│   │   ├── app/                        # `crawbl app build/deploy`
│   │   ├── dev/                        # local dev commands (`start`, `lint`, `verify`, ...)
│   │   ├── infra/                      # `crawbl infra plan/update/destroy/bootstrap`
│   │   ├── platform/                   # platform subcommands
│   │   │   ├── orchestrator/           # orchestrator startup
│   │   │   └── userswarm/              # userswarm webhook
│   │   ├── setup/                      # developer machine bootstrap
│   │   └── test/                       # `crawbl test unit|e2e`
│   ├── crawbl-agent-runtime/           # Agent Runtime binary (deployed per-workspace)
│   └── envoy-auth-filter/              # Envoy ext_authz WASM filter binary
│
├── proto/agentruntime/v1/              # gRPC proto definitions (runtime.proto, memory.proto)
│
├── internal/                           # All business logic (unexported)
│   ├── orchestrator/                   # API domain core
│   │   ├── repo/                       # Repository layer (persistence)
│   │   │   └── usagerepo/              # Usage counters + quota queries
│   │   ├── memory/                     # MemPalace subsystem
│   │   │   ├── types.go                # Drawer, Entity, Triple, Identity,
│   │   │   │                           # HybridSearchResult, PipelineTier
│   │   │   │                           # constants, HeuristicKillSwitchValue
│   │   │   ├── autoingest/             # In-process pond pool (NOT River)
│   │   │   │   ├── types.go            # Service, Work, Deps, Config, Metrics
│   │   │   │   ├── service.go          # NewService: wires pond.TypedPool
│   │   │   │   ├── worker.go           # per-chunk pipeline (classify→embed→persist)
│   │   │   │   └── helpers.go          # isNoise, chunkText, buildDrawer, …
│   │   │   ├── extract/                # Heuristic + LLM classifiers
│   │   │   ├── config/                 # Embedded JSON tuning
│   │   │   ├── jobs/                   # Cold-classification business logic
│   │   │   │   ├── process.go          # RunProcess(ctx, deps)
│   │   │   │   ├── maintain.go         # RunMaintain(ctx, deps)
│   │   │   │   ├── enrich.go           # RunEnrich(ctx, deps)
│   │   │   │   └── centroids.go        # RunCentroidRecompute(ctx, deps)
│   │   │   ├── layers/                 # 4-layer retrieval stack (L0–L3)
│   │   │   └── repo/                   # All persistence
│   │   │       ├── types.go            # DrawerRepo / KGRepo / PalaceGraphRepo
│   │   │       │                       # / IdentityRepo / CentroidRepo interfaces
│   │   │       ├── drawerrepo/         # pgvector + hybrid CTE search
│   │   │       ├── centroidrepo/       # memory_type_centroids (Phase 2)
│   │   │       ├── kgrepo/             # Knowledge graph store
│   │   │       ├── palacegraphrepo/    # BFS + Redis-cached aggregation
│   │   │       └── identityrepo/       # memory_identities upsert/read
│   │   ├── queue/                      # All River background jobs (5 files)
│   │   │   ├── types.go                # Args types, Deps, queue constants,
│   │   │   │                           # InsertOpts, Worker declarations,
│   │   │   │                           # tag/metadata vars, event payloads
│   │   │   ├── config.go               # NewConfig(Deps): single entry point
│   │   │   │                           # for all 7 workers + periodic jobs
│   │   │   ├── memory_workers.go       # 4 memory River adapters
│   │   │   │                           # (process, maintain, enrich, centroid)
│   │   │   ├── orchestrator_workers.go # 3 cross-cutting River adapters
│   │   │   │                           # (usage_write, pricing_refresh,
│   │   │   │                           # message_cleanup)
│   │   │   └── publishers.go           # MemoryPublisher (NATS) + UsagePublisher
│   │   │                               # (River insert) + event stamper
│   │   ├── service/                    # Service layer (business logic)
│   │   │   ├── authservice/            # Auth + user provisioning
│   │   │   ├── chatservice/            # Message sending + gRPC streaming
│   │   │   ├── mcpservice/             # MCP tool handlers (artifacts, workflows)
│   │   │   └── workflowservice/        # Workflow execution engine
│   │   ├── server/                     # HTTP handlers + Socket.IO
│   │   │   ├── handler/                # HTTP handler functions
│   │   │   ├── dto/                    # Request/response types
│   │   │   ├── socketio/               # Socket.IO server + broadcaster
│   │   │   └── mcp/                    # Embedded MCP server endpoint
│   │   └── types.go                    # Domain types, interfaces, constants
│   ├── userswarm/                      # Runtime client, webhook, reaper
│   │   └── client/                     # gRPC client to agent runtime pods
│   ├── agentruntime/                   # Agent Runtime (deployed in workspace pods)
│   │   ├── server/                     # gRPC Converse + Memory handlers
│   │   ├── runner/                     # ADK-Go agent runner
│   │   ├── session/                    # Redis-backed session state
│   │   ├── storage/                    # DO Spaces file storage
│   │   ├── memory/                     # Postgres-backed memory store
│   │   └── proto/v1/                   # Generated gRPC bindings
│   ├── infra/                          # Pulumi IaC (cluster + ArgoCD only)
│   │   ├── cluster/                    # DOKS, VPC, DOCR
│   │   └── platform/                   # ArgoCD bootstrap
│   ├── testsuite/                      # Shared Godog e2e helpers
│   └── pkg/                            # Shared internal packages
│       ├── configenv/                  # Secret/env loading
│       ├── crawblnats/                 # NATS JetStream client
│       ├── database/                   # Postgres connection + transactions
│       ├── errors/                     # Structured error types
│       ├── firebase/                   # FCM push client
│       ├── grpc/                       # gRPC HMAC auth interceptors
│       ├── hmac/                       # Reusable HMAC helpers
│       ├── httpserver/                 # Auth middleware + response writers
│       ├── kube/                       # Kubernetes helpers
│       ├── pricing/                    # In-memory model pricing cache
│       ├── realtime/                   # Socket.IO event types + broadcaster
│       ├── redisclient/                # Redis client
│       ├── telemetry/                  # Turn metrics + observability
│       └── yamlvalues/                 # YAML stack config loading
│
├── api/v1alpha1/                       # Kubernetes CRD types for UserSwarm
├── migrations/
│   ├── orchestrator/                   # Postgres SQL migrations
│   │   │                               # 000009: pipeline_tier, entity_count,
│   │   │                               # triple_count columns + enrich index
│   │   │                               # 000010: memory_type_centroids table
│   │   │                               # 000011: centroid pgvector index
│   │   └── seed/                       # JSON seed data (tools, models, plans)
│   └── clickhouse/                     # ClickHouse DDL for analytics
├── dockerfiles/                        # Dockerfiles for all images
├── test-features/                      # Cucumber/Gherkin feature files
└── .github/workflows/                  # CI workflows
Key Directories Explained
cmd/ — Entry Points
This is where Go binaries start. The crawbl binary is both the developer CLI and the deployed platform binary. Additional standalone binaries:
crawbl-agent-runtime: The per-workspace agent pod binary, built on ADK-Go. Talks gRPC on port 42618.
envoy-auth-filter: WASM binary for the Envoy Gateway edge filter.
internal/orchestrator/ — The Heart of the Backend
This is where you will spend most of your time. It contains the API's domain core, organized into three sub-layers (server, service, repo) that follow a strict top-down dependency rule. The types.go file at the root is the central contract shared by the rest of the package.
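The layering can be pictured with a small sketch. It assumes the top-down rule means that server code calls services, services call repositories, and nothing calls upward. Every type and function name here (MessageRepo, ChatService, SendHandler, memoryRepo) is hypothetical and chosen for illustration; none is taken from the actual packages.

```go
package main

import (
	"errors"
	"fmt"
)

// Repo layer: persistence only. MessageRepo is a hypothetical interface.
type MessageRepo interface {
	Save(workspaceID, text string) (int, error)
}

// memoryRepo is an in-memory stand-in for a Postgres-backed repository.
type memoryRepo struct{ rows []string }

func (r *memoryRepo) Save(workspaceID, text string) (int, error) {
	r.rows = append(r.rows, workspaceID+":"+text)
	return len(r.rows), nil
}

// ErrEmptyMessage is a hypothetical business error raised by the service layer.
var ErrEmptyMessage = errors.New("message text is empty")

// Service layer: business rules. It knows the repo interface, never the handler.
type ChatService struct{ repo MessageRepo }

func (s *ChatService) Send(workspaceID, text string) (int, error) {
	if text == "" {
		return 0, ErrEmptyMessage
	}
	return s.repo.Save(workspaceID, text)
}

// Server layer: turns transport input into a service call and a response string.
type SendHandler struct{ svc *ChatService }

func (h *SendHandler) Handle(workspaceID, body string) string {
	id, err := h.svc.Send(workspaceID, body)
	if err != nil {
		return "400 " + err.Error()
	}
	return fmt.Sprintf("201 message %d", id)
}

func main() {
	// Wiring goes bottom-up; calls at runtime flow top-down.
	h := &SendHandler{svc: &ChatService{repo: &memoryRepo{}}}
	fmt.Println(h.Handle("ws-1", "hello")) // 201 message 1
	fmt.Println(h.Handle("ws-1", ""))      // 400 message text is empty
}
```

Because the service depends only on an interface, a test can swap the repository for an in-memory fake, as this sketch does.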
Key sub-packages:
server/handler/: HTTP handler functions
server/dto/: Request/response types with JSON tags
server/socketio/: Socket.IO server, broadcaster, and message handlers
server/mcp/: Embedded MCP server endpoint for agent runtime pods
service/chatservice/: Message sending and gRPC stream processing (the hot path)
queue/: All seven River workers and the queue.NewConfig(Deps) entry point. Five files: types.go (all static symbols), config.go (single NewConfig), memory_workers.go (4 memory cold-path adapters), orchestrator_workers.go (3 cross-cutting adapters: usage_write, pricing_refresh, message_cleanup), publishers.go (NATS + River event publishers). Auto-ingest is not a River worker; it lives in internal/orchestrator/memory/autoingest/ and runs as an in-process alitto/pond pool.
repo/usagerepo/: Usage counter and quota queries
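To make "in-process pool" concrete, here is a minimal sketch of the general pattern using only the standard library. It is a stand-in, not the alitto/pond API and not the actual autoingest code; the Work and Pool types are hypothetical, and the uppercase step is a placeholder for the classify, embed, and persist pipeline. It shows that queued work lives only in the process's memory, in contrast to jobs handed to a job queue.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Work is a hypothetical unit of ingest work.
type Work struct {
	ID   int
	Text string
}

// Pool is a fixed-size in-process worker pool. Nothing is persisted:
// queued work sits in a channel and is lost if the process exits.
type Pool struct {
	jobs chan Work
	wg   sync.WaitGroup
	mu   sync.Mutex
	done map[int]string
}

// NewPool starts the given number of workers reading from a bounded queue.
func NewPool(workers, queueSize int) *Pool {
	p := &Pool{jobs: make(chan Work, queueSize), done: make(map[int]string)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for w := range p.jobs {
				result := strings.ToUpper(w.Text) // placeholder for the real pipeline
				p.mu.Lock()
				p.done[w.ID] = result
				p.mu.Unlock()
			}
		}()
	}
	return p
}

// Submit enqueues work and blocks while the queue is full.
func (p *Pool) Submit(w Work) { p.jobs <- w }

// StopAndWait closes the queue, waits for the workers, and returns the results.
func (p *Pool) StopAndWait() map[int]string {
	close(p.jobs)
	p.wg.Wait()
	return p.done
}

func main() {
	p := NewPool(2, 8)
	p.Submit(Work{ID: 1, Text: "first chunk"})
	p.Submit(Work{ID: 2, Text: "second chunk"})
	results := p.StopAndWait()
	fmt.Println(len(results), results[1]) // 2 FIRST CHUNK
}
```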
internal/userswarm/ and internal/agentruntime/
These packages manage the runtime side. userswarm contains the Kubernetes-backed runtime client (gRPC), the Metacontroller sync/finalize webhook, and the cleanup reaper job. agentruntime holds the code behind the crawbl-agent-runtime binary, organized as:
server/: gRPC Converse and Memory service handlers
runner/: ADK-Go agent runner (drives LLM calls and tool execution)
session/: Redis-backed conversation session state
storage/: DO Spaces file storage for agent workspaces
memory/: Postgres-backed durable memory store
internal/infra/
Pulumi infrastructure code for bootstrapping the Kubernetes cluster and installing ArgoCD. This is intentionally minimal. The application workloads themselves are deployed from the separate crawbl-argocd-apps repository.
internal/pkg/ — Shared Utilities
Reusable packages used across the codebase. Notable ones include database (Postgres connections and transactions), errors (structured business vs. server errors), httpserver (middleware and auth), realtime (Socket.IO event types and broadcasting), pricing (in-memory model pricing cache), crawblnats (NATS JetStream client), and grpc (HMAC auth interceptors for runtime communication).
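As an illustration of what a reusable HMAC helper typically looks like, here is a minimal sketch built on the standard library. The function names Sign and Verify, the choice of SHA-256, the hex encoding, and the payload format are all assumptions made for this example; the actual hmac and grpc packages may differ on every point.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Sign returns the hex-encoded HMAC-SHA256 of payload under secret.
// (Hypothetical helper; the hash and encoding are assumptions.)
func Sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify recomputes the signature and compares it in constant time.
func Verify(secret, payload []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-shared-secret")
	payload := []byte("workspace=ws-1;ts=1700000000")

	sig := Sign(secret, payload)
	fmt.Println(Verify(secret, payload, sig))            // true
	fmt.Println(Verify(secret, []byte("tampered"), sig)) // false
}
```

Using hmac.Equal rather than a plain byte or string comparison avoids leaking timing information about how much of the signature matched.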
api/v1alpha1/
Kubernetes Custom Resource Definition (CRD) types for the cluster-scoped UserSwarm resource. These Go types define the schema that the runtime client and webhook work with.
migrations/orchestrator/
SQL migration files for the PostgreSQL database. Migrations run automatically on startup, so adding a new table or column means adding a new migration file here.
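The migrations referenced in the tree use six-digit numeric prefixes (000009, 000010, 000011). Assuming files are named with that prefix followed by an underscore and a description, a small helper can work out the prefix for the next migration. The file names below and the NextVersion function are hypothetical; check the existing files in the directory for the real naming pattern.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// NextVersion returns the next zero-padded migration prefix, given the
// names of existing migration files. Names without a numeric prefix
// followed by "_" are ignored.
func NextVersion(files []string) string {
	highest := 0
	for _, name := range files {
		prefix, _, ok := strings.Cut(name, "_")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(prefix)
		if err != nil {
			continue
		}
		if n > highest {
			highest = n
		}
	}
	return fmt.Sprintf("%06d", highest+1)
}

func main() {
	// Hypothetical file names; only the numeric prefixes come from the tree above.
	files := []string{
		"000009_pipeline_tier.up.sql",
		"000010_memory_type_centroids.up.sql",
		"000011_centroid_index.up.sql",
		"README.md",
	}
	fmt.Println(NextVersion(files)) // 000012
}
```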
internal/testsuite/e2e/ and test-features/
End-to-end coverage uses Godog/Cucumber. The Go test harness and step definitions live under internal/testsuite/e2e/, while the Gherkin feature files live under test-features/.
What's Next
Now that you know where code lives, see the Layered Design to understand the rules governing how these packages interact.