Codebase Layout

The Crawbl backend lives in a single Go repository (crawbl-backend).

Understanding its directory structure is the fastest way to orient yourself when you need to find or change something. Think of this page as a map: use it when you know what kind of change you need to make but do not yet know where in the repo that change belongs.

Repository Tree

crawbl-backend/
├── cmd/
│ ├── crawbl/ # Unified CLI + runtime entrypoints
│ │ ├── app/ # `crawbl app build/deploy`
│ │ ├── dev/ # local dev commands (`start`, `lint`, `verify`, ...)
│ │ ├── infra/ # `crawbl infra plan/update/destroy/bootstrap`
│ │ ├── platform/ # platform subcommands
│ │ │ ├── orchestrator/ # orchestrator startup
│ │ │ └── userswarm/ # userswarm webhook
│ │ ├── setup/ # developer machine bootstrap
│ │ └── test/ # `crawbl test unit|e2e`
│ ├── crawbl-agent-runtime/ # Agent Runtime binary (deployed per-workspace)
│ └── envoy-auth-filter/ # Envoy ext_authz WASM filter binary

├── proto/agentruntime/v1/ # gRPC proto definitions (runtime.proto, memory.proto)

├── internal/ # All business logic (not importable from other modules)
│ ├── orchestrator/ # API domain core
│ │ ├── repo/ # Repository layer (persistence)
│ │ │ └── usagerepo/ # Usage counters + quota queries
│ │ ├── memory/ # MemPalace subsystem
│ │ │ ├── types.go # Drawer, Entity, Triple, Identity,
│ │ │ │ # HybridSearchResult, PipelineTier
│ │ │ │ # constants, HeuristicKillSwitchValue
│ │ │ ├── autoingest/ # In-process pond pool (NOT River)
│ │ │ │ ├── types.go # Service, Work, Deps, Config, Metrics
│ │ │ │ ├── service.go # NewService — wires pond.TypedPool
│ │ │ │ ├── worker.go # per-chunk pipeline (classify→embed→persist)
│ │ │ │ └── helpers.go # isNoise, chunkText, buildDrawer, …
│ │ │ ├── extract/ # Heuristic + LLM classifiers
│ │ │ ├── config/ # Embedded JSON tuning
│ │ │ ├── jobs/ # Cold-classification business logic
│ │ │ │ ├── process.go # RunProcess(ctx, deps)
│ │ │ │ ├── maintain.go # RunMaintain(ctx, deps)
│ │ │ │ ├── enrich.go # RunEnrich(ctx, deps)
│ │ │ │ └── centroids.go # RunCentroidRecompute(ctx, deps)
│ │ │ ├── layers/ # 4-layer retrieval stack (L0–L3)
│ │ │ └── repo/ # All persistence
│ │ │   ├── types.go # DrawerRepo / KGRepo / PalaceGraphRepo
│ │ │   │ # / IdentityRepo / CentroidRepo interfaces
│ │ │   ├── drawerrepo/ # pgvector + hybrid CTE search
│ │ │   ├── centroidrepo/ # memory_type_centroids (Phase 2)
│ │ │   ├── kgrepo/ # Knowledge graph store
│ │ │   ├── palacegraphrepo/ # BFS + Redis-cached aggregation
│ │ │   └── identityrepo/ # memory_identities upsert/read
│ │ ├── queue/ # All River background jobs (5 files)
│ │ │ ├── types.go # Args types, Deps, queue constants,
│ │ │ │ # InsertOpts, Worker declarations,
│ │ │ │ # tag/metadata vars, event payloads
│ │ │ ├── config.go # NewConfig(Deps) — single entry point
│ │ │ │ # for all 7 workers + periodic jobs
│ │ │ ├── memory_workers.go # 4 memory River adapters
│ │ │ │ # (process, maintain, enrich, centroid)
│ │ │ ├── orchestrator_workers.go # 3 cross-cutting River adapters
│ │ │ │ # (usage_write, pricing_refresh,
│ │ │ │ # message_cleanup)
│ │ │ └── publishers.go # MemoryPublisher (NATS) + UsagePublisher
│ │ │ # (River insert) + event stamper
│ │ ├── service/ # Service layer (business logic)
│ │ │ ├── authservice/ # Auth + user provisioning
│ │ │ ├── chatservice/ # Message sending + gRPC streaming
│ │ │ ├── mcpservice/ # MCP tool handlers (artifacts, workflows)
│ │ │ └── workflowservice/ # Workflow execution engine
│ │ ├── server/ # HTTP handlers + Socket.IO
│ │ │ ├── handler/ # HTTP handler functions
│ │ │ ├── dto/ # Request/response types
│ │ │ ├── socketio/ # Socket.IO server + broadcaster
│ │ │ └── mcp/ # Embedded MCP server endpoint
│ │ └── types.go # Domain types, interfaces, constants
│ ├── userswarm/ # Runtime client, webhook, reaper
│ │ └── client/ # gRPC client to agent runtime pods
│ ├── agentruntime/ # Agent Runtime (deployed in workspace pods)
│ │ ├── server/ # gRPC Converse + Memory handlers
│ │ ├── runner/ # ADK-Go agent runner
│ │ ├── session/ # Redis-backed session state
│ │ ├── storage/ # DO Spaces file storage
│ │ ├── memory/ # Postgres-backed memory store
│ │ └── proto/v1/ # Generated gRPC bindings
│ ├── infra/ # Pulumi IaC (cluster + ArgoCD only)
│ │ ├── cluster/ # DOKS, VPC, DOCR
│ │ └── platform/ # ArgoCD bootstrap
│ ├── testsuite/ # Shared Godog e2e helpers
│ └── pkg/ # Shared internal packages
│   ├── configenv/ # Secret/env loading
│   ├── crawblnats/ # NATS JetStream client
│   ├── database/ # Postgres connection + transactions
│   ├── errors/ # Structured error types
│   ├── firebase/ # FCM push client
│   ├── grpc/ # gRPC HMAC auth interceptors
│   ├── hmac/ # Reusable HMAC helpers
│   ├── httpserver/ # Auth middleware + response writers
│   ├── kube/ # Kubernetes helpers
│   ├── pricing/ # In-memory model pricing cache
│   ├── realtime/ # Socket.IO event types + broadcaster
│   ├── redisclient/ # Redis client
│   ├── telemetry/ # Turn metrics + observability
│   └── yamlvalues/ # YAML stack config loading

├── api/v1alpha1/ # Kubernetes CRD types for UserSwarm
├── migrations/
│ ├── orchestrator/ # Postgres SQL migrations
│ │ │ # 000009: pipeline_tier, entity_count,
│ │ │ # triple_count columns + enrich index
│ │ │ # 000010: memory_type_centroids table
│ │ │ # 000011: centroid pgvector index
│ │ └── seed/ # JSON seed data (tools, models, plans)
│ └── clickhouse/ # ClickHouse DDL for analytics
├── dockerfiles/ # Dockerfiles for all images
├── test-features/ # Cucumber/Gherkin feature files
└── .github/workflows/ # CI workflows

Key Directories Explained

cmd/ — Entry Points

This is where Go binaries start. The crawbl binary is both the developer CLI and the deployed platform binary. Additional standalone binaries:

  • crawbl-agent-runtime — The per-workspace agent pod binary, built on ADK-Go. Talks gRPC on port 42618.
  • envoy-auth-filter — WASM binary for the Envoy Gateway edge filter.
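
To make the "one binary, many roles" idea concrete, here is a minimal sketch of how a single entry point could route to command groups such as app, test, and platform. It uses only the standard library; the actual CLI framework used by crawbl is not stated on this page, and the names groups, stub, and dispatch are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// command is a hypothetical handler signature for one subcommand.
type command func(args []string) error

// stub returns a placeholder handler that only reports what it would run.
func stub(name string) command {
	return func(args []string) error {
		fmt.Println("would run:", name, args)
		return nil
	}
}

// groups mirrors a few of the directories under cmd/crawbl/.
// The handlers are placeholders, not the real implementations.
var groups = map[string]map[string]command{
	"app":      {"build": stub("app build"), "deploy": stub("app deploy")},
	"test":     {"unit": stub("test unit"), "e2e": stub("test e2e")},
	"platform": {"orchestrator": stub("platform orchestrator"), "userswarm": stub("platform userswarm")},
}

// dispatch routes "<group> <command> [args...]" to the matching handler.
func dispatch(args []string) error {
	if len(args) < 2 {
		return fmt.Errorf("usage: crawbl <group> <command> [args]")
	}
	sub, ok := groups[args[0]]
	if !ok {
		return fmt.Errorf("unknown group %q", args[0])
	}
	cmd, ok := sub[args[1]]
	if !ok {
		return fmt.Errorf("unknown command %q in group %q", args[1], args[0])
	}
	return cmd(args[2:])
}

func main() {
	// Fixed demo arguments; a real entry point would pass os.Args[1:].
	if err := dispatch([]string{"test", "unit"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

In a layout like this, the developer commands and the deployed runtime commands are simply different branches of the same routing table, which is one way a single binary can serve both purposes.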

internal/orchestrator/ — The Heart of the Backend

This is where you will spend most of your time. It contains the API's domain core, organized into three sub-layers (server, service, repo) that follow a strict top-down dependency rule. The types.go file at the root is the central contract shared by the rest of the package.
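
The layering can be sketched as follows, assuming "top-down" means that server code calls services and services call repositories through interfaces, never the reverse. All type and function names here (Workspace, WorkspaceRepo, WorkspaceService, handleGet) are hypothetical and chosen only for illustration; the real contracts live in types.go.

```go
package main

import (
	"errors"
	"fmt"
)

// Workspace is a hypothetical domain type.
type Workspace struct {
	ID   string
	Name string
}

// ErrNotFound is a hypothetical sentinel error shared across layers.
var ErrNotFound = errors.New("workspace not found")

// WorkspaceRepo is the contract the service layer depends on.
type WorkspaceRepo interface {
	GetByID(id string) (Workspace, error)
}

// memRepo stands in for the repo layer (persistence).
type memRepo struct{ rows map[string]Workspace }

func (r memRepo) GetByID(id string) (Workspace, error) {
	w, ok := r.rows[id]
	if !ok {
		return Workspace{}, ErrNotFound
	}
	return w, nil
}

// WorkspaceService is the service layer. It knows the repo interface only.
type WorkspaceService struct{ repo WorkspaceRepo }

func (s WorkspaceService) DisplayName(id string) (string, error) {
	w, err := s.repo.GetByID(id)
	if err != nil {
		return "", fmt.Errorf("load workspace %q: %w", id, err)
	}
	return "workspace:" + w.Name, nil
}

// handleGet plays the role of the server layer. It calls the service
// and translates errors into a status code; it never touches the repo.
func handleGet(svc WorkspaceService, id string) (int, string) {
	name, err := svc.DisplayName(id)
	switch {
	case errors.Is(err, ErrNotFound):
		return 404, "not found"
	case err != nil:
		return 500, "internal error"
	}
	return 200, name
}

func main() {
	repo := memRepo{rows: map[string]Workspace{"w1": {ID: "w1", Name: "demo"}}}
	svc := WorkspaceService{repo: repo}

	status, body := handleGet(svc, "w1")
	fmt.Println(status, body)

	status, body = handleGet(svc, "missing")
	fmt.Println(status, body)
}
```

Because each layer depends only on the one directly below it, a change to persistence stays inside the repo implementation as long as the interface is unchanged.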

Key sub-packages:

  • server/handler/ — HTTP handler functions
  • server/dto/ — Request/response types with JSON tags
  • server/socketio/ — Socket.IO server, broadcaster, and message handlers
  • server/mcp/ — Embedded MCP server endpoint for agent runtime pods
  • service/chatservice/ — Message sending and gRPC stream processing (the hot path)
  • queue/ — All seven River workers and the queue.NewConfig(Deps) entry point. Five files: types.go (all static symbols), config.go (single NewConfig), memory_workers.go (4 memory cold-path adapters), orchestrator_workers.go (3 cross-cutting adapters: usage_write, pricing_refresh, message_cleanup), publishers.go (NATS + River event publishers). Auto-ingest is not a River worker — it lives in internal/orchestrator/memory/autoingest/ and runs as an in-process alitto/pond pool.
  • repo/usagerepo/ — Usage counter and quota queries
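
To illustrate the difference between a queued background job and an in-process pool like the one auto-ingest uses, here is a minimal bounded worker pool built from the standard library. It is a stand-in for the idea, not the alitto/pond API, and the Drawer fields, classify heuristic, and runPool function are hypothetical simplifications of the per-chunk pipeline.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Drawer is a hypothetical, heavily simplified result record.
type Drawer struct {
	Text string
	Kind string
}

// classify is a toy heuristic used only for this sketch.
func classify(chunk string) string {
	if strings.HasSuffix(chunk, "?") {
		return "question"
	}
	return "statement"
}

// runPool processes chunks with at most `workers` goroutines and
// returns one Drawer per chunk, in input order.
func runPool(chunks []string, workers int) []Drawer {
	if workers < 1 {
		workers = 1
	}
	out := make([]Drawer, len(chunks))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is delivered to exactly one worker, so
			// writes to out[i] never overlap.
			for i := range jobs {
				out[i] = Drawer{Text: chunks[i], Kind: classify(chunks[i])}
			}
		}()
	}

	for i := range chunks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	drawers := runPool([]string{"Where is the config?", "It lives in configenv."}, 2)
	for _, d := range drawers {
		fmt.Printf("%s: %s\n", d.Kind, d.Text)
	}
}
```

Work submitted to a pool like this lives only in process memory, whereas a job queue persists its jobs. That trade-off is the practical meaning of "not a River worker".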

internal/userswarm/ and internal/agentruntime/

These packages manage the runtime side. userswarm contains the Kubernetes-backed runtime client (gRPC), the Metacontroller sync/finalize webhook, and the cleanup reaper job. agentruntime is the full agent runtime binary with:

  • server/ — gRPC Converse and Memory service handlers
  • runner/ — ADK-Go agent runner (drives LLM calls and tool execution)
  • session/ — Redis-backed conversation session state
  • storage/ — DO Spaces file storage for agent workspaces
  • memory/ — Postgres-backed durable memory store

internal/infra/

Pulumi infrastructure code for bootstrapping the Kubernetes cluster and installing ArgoCD. This is intentionally minimal. The application workloads themselves are deployed from the separate crawbl-argocd-apps repository.

internal/pkg/ — Shared Utilities

Reusable packages used across the codebase. Notable ones include database (Postgres connections and transactions), errors (structured business vs. server errors), httpserver (middleware and auth), realtime (Socket.IO event types and broadcasting), pricing (in-memory model pricing cache), crawblnats (NATS JetStream client), and grpc (HMAC auth interceptors for runtime communication).
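
As a rough idea of what reusable HMAC helpers can look like, the sketch below signs and verifies a payload with Go's standard crypto/hmac package. The choice of SHA-256, the hex encoding, and the function names sign and verify are assumptions made for illustration; this page does not describe the actual scheme used by the hmac and grpc packages.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns a hex-encoded HMAC-SHA256 of payload under secret.
// Algorithm and encoding are assumptions for this sketch.
func sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, payload []byte, sigHex string) bool {
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(sig, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	payload := []byte("workspace=w1")

	sig := sign(secret, payload)
	fmt.Println("valid:", verify(secret, payload, sig))
	fmt.Println("tampered:", verify(secret, []byte("workspace=w2"), sig))
}
```

Using hmac.Equal rather than a plain byte comparison avoids leaking timing information about how much of the signature matched.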

api/v1alpha1/

Kubernetes Custom Resource Definition (CRD) types for the cluster-scoped UserSwarm resource. These Go types define the schema that the runtime client and webhook work with.

migrations/orchestrator/

SQL migration files for the PostgreSQL database. Migrations run automatically on startup, so adding a new table or column means adding a new migration file here.
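
When adding a migration, the new file needs the next number in sequence. The helper below sketches how that number could be derived from existing file names, assuming a six-digit numeric prefix followed by an underscore (for example 000011_some_name.up.sql). The prefix width matches the numbers shown in the repository tree, but the rest of the naming pattern and the function name nextVersion are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nextVersion returns the zero-padded prefix that follows the highest
// numeric prefix found in names. Names without a numeric prefix
// followed by "_" are ignored.
func nextVersion(names []string) string {
	highest := 0
	for _, n := range names {
		prefix, _, ok := strings.Cut(n, "_")
		if !ok {
			continue
		}
		v, err := strconv.Atoi(prefix)
		if err != nil {
			continue
		}
		if v > highest {
			highest = v
		}
	}
	return fmt.Sprintf("%06d", highest+1)
}

func main() {
	// Hypothetical file names; only the numeric prefixes matter here.
	existing := []string{
		"000009_pipeline_tier.up.sql",
		"000010_centroids.up.sql",
		"000011_centroid_index.up.sql",
	}
	fmt.Println(nextVersion(existing))
}
```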

internal/testsuite/e2e/ and test-features/

End-to-end coverage uses Godog/Cucumber. The Go test harness and step definitions live under internal/testsuite/e2e/, while the Gherkin feature files live under test-features/.

What's Next

Now that you know where the code lives, see the Layered Design page to understand the rules governing how these packages interact.