Table Definitions
Full column specifications for all tables in the orchestrator schema.
users
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | User identifier |
subject | TEXT | NOT NULL, UNIQUE | Auth provider subject (Firebase UID or dev token) |
email | TEXT | NOT NULL | User email |
nickname | TEXT | NOT NULL, default '' | Display nickname |
name | TEXT | NOT NULL, default '' | First name |
surname | TEXT | NOT NULL, default '' | Last name |
avatar_url | TEXT | NULL | Profile avatar URL |
country_code | TEXT | NULL | ISO country code |
date_of_birth | TIMESTAMPTZ | NULL | Date of birth |
is_banned | BOOLEAN | NOT NULL, default false | Ban flag |
has_agreed_with_terms | BOOLEAN | NOT NULL, default false | Terms acceptance |
has_agreed_with_privacy_policy | BOOLEAN | NOT NULL, default false | Privacy acceptance |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
deleted_at | TIMESTAMPTZ | NULL | Soft delete timestamp |
Indexes: idx_users_subject on subject
user_preferences
| Column | Type | Constraints | Description |
|---|---|---|---|
user_id | UUID | PK, FK -> users(id) CASCADE | User reference |
platform_theme | TEXT | NULL | UI theme preference |
platform_language | TEXT | NULL | Language code |
currency_code | TEXT | NULL | Preferred currency |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
user_push_tokens
| Column | Type | Constraints | Description |
|---|---|---|---|
user_id | UUID | PK, FK -> users(id) CASCADE | User reference |
push_token | TEXT | NOT NULL | FCM token |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
workspaces
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Workspace identifier |
user_id | UUID | NOT NULL, FK -> users(id) CASCADE | Owner |
name | TEXT | NOT NULL | Workspace display name |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
Indexes: idx_workspaces_user_id on user_id
Runtime status comes from UserSwarm.status in Kubernetes, not this table.
agents
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Agent identifier |
workspace_id | UUID | NOT NULL, FK -> workspaces(id) CASCADE | Parent workspace |
name | TEXT | NOT NULL | Agent display name |
slug | TEXT | NOT NULL | URL-safe identifier (e.g., wally, manager) |
role | TEXT | NOT NULL | Agent role (e.g., default) |
description | TEXT | NOT NULL, default '' | Agent description |
avatar_url | TEXT | NOT NULL, default '' | Agent avatar |
sort_order | INTEGER | NOT NULL, default 0 | Display order |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
Indexes: idx_agents_workspace_slug UNIQUE on (workspace_id, slug), idx_agents_workspace_id on workspace_id
conversations
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Conversation identifier |
workspace_id | UUID | NOT NULL, FK -> workspaces(id) CASCADE | Parent workspace |
agent_id | UUID | NULL, FK -> agents(id) SET NULL | Associated agent |
type | TEXT | NOT NULL | Conversation type |
title | TEXT | NOT NULL | Conversation title |
unread_count | INTEGER | NOT NULL, default 0 | Unread message count |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
Indexes: idx_conversations_workspace_id on workspace_id, idx_conversations_workspace_type on (workspace_id, type)
messages
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Message identifier |
conversation_id | UUID | NOT NULL, FK -> conversations(id) CASCADE | Parent conversation |
role | TEXT | NOT NULL | Message role (user, assistant, system) |
content | JSONB | NOT NULL | Message content (structured) |
status | TEXT | NOT NULL | Message status |
local_id | TEXT | NULL | Client-generated ID for deduplication |
agent_id | UUID | NULL, FK -> agents(id) SET NULL | Agent that sent/received |
attachments | JSONB | NOT NULL, default '[]' | File attachments |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
Indexes: idx_messages_conversation_created_at on (conversation_id, created_at DESC, id DESC), idx_messages_local_id on local_id
mcp_audit_logs
Audit log for every MCP tool call made by Agent Runtime pods. Writes are async (goroutine with 5s timeout) to avoid response latency. Required for ISO 27001 compliance.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Audit log identifier |
user_id | TEXT | NOT NULL | User who triggered the tool call |
workspace_id | TEXT | NOT NULL | Workspace context |
session_id | TEXT | NULL | MCP session identifier |
tool_name | TEXT | NOT NULL | MCP tool name (e.g., send_push_notification) |
input | JSONB | NOT NULL | Tool call input parameters |
output | JSONB | NULL | Tool call output |
api_calls | JSONB | NULL | External API calls made during execution |
success | BOOLEAN | NOT NULL | Whether the tool call succeeded |
error_message | TEXT | NULL | Error message on failure |
duration_ms | INTEGER | NOT NULL | Execution time in milliseconds |
remote_addr | TEXT | NULL | Calling pod's IP address |
created_at | TIMESTAMPTZ | NOT NULL | Timestamp of the tool call |
tools
Global tool catalog. Seeded from migrations/orchestrator/seed/tools.json on startup.
| Column | Type | Constraints | Description |
|---|---|---|---|
name | TEXT | PK | Tool identifier |
display_name | TEXT | NOT NULL | Human-readable name |
description | TEXT | NOT NULL | Tool description |
category | TEXT | NOT NULL | Category grouping |
icon_url | TEXT | NOT NULL | Tool icon URL |
sort_order | INTEGER | NOT NULL | Display order |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
agent_settings
Per-agent model and response configuration. One row per agent.
| Column | Type | Constraints | Description |
|---|---|---|---|
agent_id | UUID | PK, FK -> agents(id) CASCADE | Agent reference |
model | TEXT | NOT NULL, default 'auto' | LLM model selection |
response_length | TEXT | NOT NULL, default 'auto' | Response length preference |
allowed_tools | TEXT[] | NOT NULL, default '{}' | Tools this agent can use |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
agent_prompts
System prompt templates per agent.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Prompt identifier |
agent_id | UUID | NOT NULL, FK -> agents(id) CASCADE | Agent reference |
name | TEXT | NOT NULL | Prompt display name |
description | TEXT | NOT NULL | Prompt description |
content | TEXT | NOT NULL | System prompt text |
sort_order | INTEGER | NOT NULL | Display order |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
Indexes: idx_agent_prompts_agent_id on agent_id
agent_history
Manager-created conversation history summaries for an agent.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | History entry identifier |
agent_id | UUID | NOT NULL, FK -> agents(id) CASCADE | Agent reference |
conversation_id | UUID | NULL, FK -> conversations(id) SET NULL | Associated conversation |
title | TEXT | NOT NULL | History entry title |
subtitle | TEXT | NOT NULL | History entry subtitle |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
Indexes: idx_agent_history_agent_id on (agent_id, created_at DESC)
memory_drawers
Core vector store for the MemPalace memory system. Each drawer is a chunk of verbatim content with a pgvector embedding for semantic search. See the MemPalace Memory Pipeline for the full state machine and processing flow.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | TEXT | PK | MD5-based deterministic identifier |
workspace_id | UUID | NOT NULL | Workspace scope |
wing | TEXT | NOT NULL | Top-level topic category |
room | TEXT | NOT NULL | Subtopic within the wing |
hall | TEXT | NOT NULL, default '' | Optional granular grouping |
content | TEXT | NOT NULL | Verbatim memory text |
embedding | vector(1536) | NULL | pgvector embedding for semantic search |
importance | REAL | NOT NULL, default 3.0 | Priority score (0.0–5.0) |
memory_type | TEXT | NOT NULL, default '' | Classified type: decision, preference, milestone, problem, emotional, fact, task |
source_file | TEXT | NOT NULL, default '' | Origin file reference |
added_by | TEXT | NOT NULL, default 'system' | Who stored it (e.g., auto-ingest, mobile, agent name) |
added_by_agent | TEXT | NOT NULL, default '' | Agent UUID for affinity ranking |
state | TEXT | NOT NULL, default 'raw' | Pipeline state: raw, processed, merged, failed |
summary | TEXT | NOT NULL, default '' | LLM-generated one-line summary |
last_accessed_at | TIMESTAMPTZ | NULL | Updated on retrieval (TouchAccess) |
access_count | INT | NOT NULL, default 0 | Incremented on retrieval, drives decay reset |
superseded_by | TEXT | NULL | Points to newer contradicting drawer |
cluster_id | TEXT | NULL | Canonical drawer ID for merged clusters |
retry_count | INT | NOT NULL, default 0 | Cold worker failure counter (max 3) |
pipeline_tier | TEXT | NOT NULL, default 'llm' | Which arm of the cold pipeline labelled this drawer: heuristic, centroid, or llm. Added by migration 000010. |
entity_count | INT | NOT NULL, default 0 | Number of KG entities the enrichment worker wired up. Added by migration 000010. |
triple_count | INT | NOT NULL, default 0 | Number of KG triples the enrichment worker wired up. Added by migration 000010. |
filed_at | TIMESTAMPTZ | NOT NULL | When the memory was filed |
created_at | TIMESTAMPTZ | NOT NULL | Row creation timestamp |
Indexes:
idx_drawers_workspaceon(workspace_id)idx_drawers_workspace_wingon(workspace_id, wing)idx_drawers_workspace_roomon(workspace_id, wing, room)idx_drawers_stateon(state)WHEREstate = 'raw'(cold worker polling)idx_drawers_supersededon(superseded_by)WHEREsuperseded_by IS NOT NULLidx_drawers_enrichon(workspace_id, created_at)WHEREstate = 'processed' AND pipeline_tier <> 'llm' AND entity_count = 0 AND importance >= 3.0— drives thememory_enrichRiver sweep. Predicate must matchDrawerRepo.ListEnrichCandidatesexactly.
No vector index exists on memory_drawers.embedding today. Migration 000008 is an intentional no-op: both HNSW and IVFFlat were tried first and both caused SIGILL crashes on Digital Ocean CPUs without AVX2. Sequential scan is the current backend, which is acceptable at the 10K-drawer-per-workspace ceiling enforced at write time by DrawerRepo.Add. See the commentary in migrations/orchestrator/000008_ivfflat_index.up.sql for the three workarounds to try once we move to AVX2-capable nodes.
memory_entities
Knowledge graph entity nodes. Each entity is a named thing (person, service, concept) scoped to a workspace.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | TEXT | PK (composite) | SHA256-based stable entity ID from normalized name |
workspace_id | UUID | PK (composite) | Workspace scope |
name | TEXT | NOT NULL | Display name |
type | TEXT | NOT NULL, default 'unknown' | Entity classification (person, tool, concept, project, organization) |
properties | JSONB | NOT NULL, default empty object | Additional metadata |
embedding | vector(1536) | NULL | Embedding for semantic KG lookup fallback |
created_at | TIMESTAMPTZ | NOT NULL | Row creation timestamp |
Primary key: (workspace_id, id)
memory_triples
Knowledge graph temporal relationship edges. Each triple links two entities with a time-bounded predicate.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | TEXT | PK (composite) | Triple identifier |
workspace_id | UUID | PK (composite) | Workspace scope |
subject | TEXT | NOT NULL | Subject entity ID |
predicate | TEXT | NOT NULL | Relationship type |
object | TEXT | NOT NULL | Object entity ID |
valid_from | TEXT | NULL | When fact became true (YYYY-MM-DD) |
valid_to | TEXT | NULL | When fact ended (NULL = current) |
confidence | REAL | NOT NULL, default 1.0 | Confidence score |
source_closet | TEXT | NOT NULL, default '' | Source drawer ID |
source_file | TEXT | NOT NULL, default '' | Origin file |
extracted_at | TIMESTAMPTZ | NOT NULL | Extraction timestamp |
Primary key: (workspace_id, id)
Indexes:
idx_triples_subjecton(workspace_id, subject)idx_triples_objecton(workspace_id, object)idx_triples_predicateon(workspace_id, predicate)idx_triples_validityon(workspace_id, valid_from, valid_to)
memory_identities
Per-workspace L0 identity text. One row per workspace, used to inject workspace context at the start of every conversation.
| Column | Type | Constraints | Description |
|---|---|---|---|
workspace_id | UUID | PK | Workspace scope |
content | TEXT | NOT NULL, default '' | L0 identity text (max 2,000 chars) |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
memory_type_centroids
Seven prototype vectors (one per memory type) used by the Phase 2 nearest-centroid classifier in the autoingest worker. A chunk in the [HeuristicConfidenceLow, HeuristicConfidenceHigh) confidence band is classified by picking the closest centroid; if cosine similarity exceeds MemoryCentroidThreshold (0.85) and the centroid has at least MemoryCentroidMinSamples (50) training samples, the drawer is persisted with pipeline_tier = 'centroid' and state = 'processed', bypassing the LLM cold path entirely. Added by migration 000011.
Centroids are rebuilt by the weekly memory_centroid_recompute River worker (Sunday 03:00 UTC) from the last 90 days of pipeline_tier = 'llm' drawers — training never pulls its own predictions, which prevents a feedback loop. See the MemPalace Memory Pipeline for the full rationale.
| Column | Type | Constraints | Description |
|---|---|---|---|
memory_type | TEXT | PK | One of the seven memory types: decision, preference, milestone, problem, emotional, fact, task |
centroid | vector(1536) | NOT NULL | Element-wise average of the training-sample embeddings |
sample_count | INT | NOT NULL | Size of the training cohort. Rows with sample_count < 50 are ignored by NearestType lookups |
computed_at | TIMESTAMPTZ | NOT NULL, default NOW() | When this centroid was last recomputed |
source_hash | TEXT | NOT NULL | MD5 over the sorted training sample IDs. The upsert is a no-op when this hash has not changed since the previous run, so unchanged centroids do not churn dead tuples |
The table is empty on first deploy. Phase 2 is dormant until at least one memory type has accumulated 50 LLM-labelled samples — roughly the first week of real traffic on a new workspace. A best-effort 30-second startup seed (centroidSeedTimeout in cmd/crawbl/platform/orchestrator/orchestrator.go) warms the table once after riverClient.Start so operators do not have to wait until Sunday 03:00 UTC for the first recompute.
integration_connections
OAuth connections between users and external providers.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Connection identifier |
user_id | UUID | NOT NULL, FK -> users(id) CASCADE | User reference |
workspace_id | UUID | NOT NULL, FK -> workspaces(id) CASCADE | Workspace scope |
provider | TEXT | NOT NULL | Provider name (e.g., gmail, slack) |
status | TEXT | NOT NULL, default 'active' | Connection status |
access_token_encrypted | TEXT | NULL | Encrypted OAuth access token |
refresh_token_encrypted | TEXT | NULL | Encrypted OAuth refresh token |
token_expires_at | TIMESTAMPTZ | NULL | Token expiry |
scopes | TEXT[] | NOT NULL | Granted OAuth scopes |
provider_user_id | TEXT | NULL | User ID at provider |
provider_user_email | TEXT | NULL | Email at provider |
metadata | JSONB | NOT NULL | Additional provider metadata |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
Indexes: idx_integration_connections_unique UNIQUE on (user_id, workspace_id, provider)
model_pricing
Per-token costs for LLM models. Append-only — new rows inserted on price changes, old rows preserved for historical cost computation.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Row identifier |
provider | TEXT | NOT NULL | LLM provider (e.g., openai) |
model | TEXT | NOT NULL | Model identifier (e.g., gpt-4o) |
region | TEXT | NOT NULL, default 'global' | Pricing region |
input_cost_per_token | NUMERIC(18,12) | NOT NULL | Cost per input token (USD) |
output_cost_per_token | NUMERIC(18,12) | NOT NULL | Cost per output token (USD) |
cached_cost_per_token | NUMERIC(18,12) | NOT NULL | Cost per cached token (USD) |
source | TEXT | NOT NULL, default 'manual' | Price source |
effective_at | TIMESTAMPTZ | NOT NULL | When this price takes effect |
created_at | TIMESTAMPTZ | NOT NULL | Row creation timestamp |
Indexes: idx_model_pricing_lookup on (provider, model, region, effective_at DESC)
usage_plans
Plan definitions. Seeded from migrations/orchestrator/seed/usage_plans.json.
| Column | Type | Constraints | Description |
|---|---|---|---|
plan_id | TEXT | PK | Plan identifier (free, pro) |
name | TEXT | NOT NULL | Display name |
monthly_token_limit | BIGINT | NOT NULL, default 500000 | Tokens per month |
daily_request_limit | INT | NULL | Daily request cap (unused) |
max_tokens_per_request | INT | NULL | Per-request cap (unused) |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
usage_quotas
Per-user plan assignment. Active quota has expires_at IS NULL.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | UUID | PK | Row identifier |
user_id | UUID | NOT NULL, FK -> users(id) CASCADE | User reference |
plan_id | TEXT | NOT NULL, FK -> usage_plans DEFAULT 'free' | Plan reference |
effective_at | TIMESTAMPTZ | NOT NULL | When assignment started |
expires_at | TIMESTAMPTZ | NULL | NULL = active |
created_at | TIMESTAMPTZ | NOT NULL | Creation timestamp |
updated_at | TIMESTAMPTZ | NOT NULL | Last update timestamp |
Indexes: idx_usage_quotas_user_active UNIQUE on (user_id, plan_id) WHERE expires_at IS NULL
usage_counters
Monthly running totals per user. One row per (user_id, period).
| Column | Type | Constraints | Description |
|---|---|---|---|
user_id | UUID | PK (composite), FK -> users(id) CASCADE | User reference |
period | TEXT | PK (composite) | Billing period (YYYY-MM) |
tokens_used | BIGINT | NOT NULL, default 0 | Total tokens this month |
prompt_tokens_used | BIGINT | NOT NULL, default 0 | Input tokens |
completion_tokens_used | BIGINT | NOT NULL, default 0 | Output tokens |
cost_usd | NUMERIC(12,6) | NOT NULL, default 0 | Accumulated cost (internal only) |
request_count | INT | NOT NULL, default 0 | Number of LLM calls |
last_updated_at | TIMESTAMPTZ | NOT NULL | Last increment timestamp |
agent_usage_counters
Lifetime usage totals per agent.
| Column | Type | Constraints | Description |
|---|---|---|---|
agent_id | UUID | PK, FK -> agents(id) CASCADE | Agent reference |
workspace_id | UUID | NOT NULL, FK -> workspaces(id) CASCADE | Workspace reference |
tokens_used | BIGINT | NOT NULL, default 0 | Total tokens ever |
prompt_tokens_used | BIGINT | NOT NULL, default 0 | Lifetime input tokens |
completion_tokens_used | BIGINT | NOT NULL, default 0 | Lifetime output tokens |
cost_usd | NUMERIC(12,6) | NOT NULL, default 0 | Accumulated cost (internal only) |
request_count | INT | NOT NULL, default 0 | Number of LLM calls |
last_updated_at | TIMESTAMPTZ | NOT NULL | Last increment timestamp |
Indexes: idx_agent_usage_counters_workspace on workspace_id