
The MemPalace Memory System

Crawbl's persistent, per-workspace memory system. Agents use it to build context before every response. Covers the palace data model, the 3-phase ingestion pipeline, the 4-layer retrieval stack, the knowledge graph, and every MCP tool.


1. Overview

LLMs have no memory between conversations. MemPalace gives each workspace a persistent, searchable memory system backed by PostgreSQL (with pgvector for embeddings) and Redis (for palace-graph room aggregation caching). There is no external vector database and no separate microservice.

All agents in a workspace share one memory pool. Topic separation comes from the wing/room taxonomy, not agent boundaries. Every memory operation is scoped by workspace_id -- agents cannot read memories from other workspaces.

Codebase location

crawbl-backend/internal/orchestrator/memory/
├── types.go # Core domain types (Drawer, Entity, Triple, Identity,
│ # HybridSearchResult, TraversalResult, Tunnel,
│ # PipelineTier constants, HeuristicKillSwitchValue)
├── repo/ # All persistence (consumer-side interfaces per consumer)
│ ├── drawerrepo/ # pgvector drawers + hybrid CTE search
│ ├── centroidrepo/ # memory_type_centroids (Phase 2 k-NN)
│ ├── kgrepo/ # Knowledge graph: entities + temporal triples
│ ├── palacegraphrepo/ # BFS traversal + Redis-cached room aggregation
│ └── identityrepo/ # memory_identities upsert/read
├── layers/ # 4-layer retrieval stack (L0–L3)
│ ├── stack.go # Composes layers into WakeUp / Recall / Search
│ ├── l0_identity.go # L0: workspace identity (via identityrepo)
│ ├── l1_essential.go # L1: top memories by importance
│ ├── l2_ondemand.go # L2: filtered by wing/room
│ ├── l3_search.go # L3: fallback pgvector-only search
│ └── retrieval.go # HybridRetrieve — one CTE, no goroutines
├── autoingest/ # In-process pond pool for the hot path (NOT River)
│ ├── types.go # Service, Work, Deps, Config, Metrics interfaces
│ ├── service.go # NewService — wires pond.TypedPool; Submit + Shutdown
│ ├── worker.go # per-chunk pipeline (classify → embed → centroid? → persist)
│ └── helpers.go # isNoise, chunkText, buildDrawer, autoIngestDrawerID
├── jobs/ # Business logic for the cold pipeline (driver-agnostic)
│ ├── process.go # RunProcess — LLM reclassification
│ ├── maintain.go # RunMaintain — decay + prune
│ ├── enrich.go # RunEnrich — KG backfill
│ └── centroids.go # RunCentroidRecompute — weekly centroid rebuild
├── extract/ # Heuristic + LLM memory classifiers
│ └── classify.go # Regex-based heuristic classifier
└── config/ # Embedded JSON config (noise_patterns, classify_patterns)

River adapters for the cold pipeline live in internal/orchestrator/queue/memory_workers.go, keeping the jobs/ business logic free of River imports.

The SQL migration is at migrations/orchestrator/000005_memory_palace.up.sql.


2. Data Model

The palace metaphor

Every memory chunk is called a drawer -- a piece of verbatim text filed into a location in the palace:

[Diagram: Memory Palace Hierarchy]

Wing is the broadest category (like a department). Room is a topic within that wing. Hall is optional extra granularity -- most drawers skip it. The combination of wing + room is the primary navigation path.

Schema

memory_drawers

| Field | Type | Purpose |
| --- | --- | --- |
| id | TEXT PK | MD5-based deterministic ID |
| workspace_id | TEXT | Tenant isolation (all queries are scoped) |
| wing | TEXT | Top-level category |
| room | TEXT | Subtopic within the wing |
| hall | TEXT | Optional granular grouping |
| content | TEXT | Verbatim memory text (max 10,000 chars) |
| embedding | vector(1536) | pgvector embedding (text-embedding-3-small) |
| importance | FLOAT | Priority score 0--5 (default 3.0) |
| memory_type | TEXT | decision, preference, milestone, problem, emotional, fact, or task |
| pipeline_tier | TEXT | heuristic, centroid, or llm -- which classifier made the final call |
| state | TEXT | raw, processed, merged, or failed |
| summary | TEXT | LLM-generated one-line summary (cold path only) |
| source_file | TEXT | Where the memory originated |
| added_by | TEXT | "auto-ingest", "mobile", agent name, etc. |
| added_by_agent | TEXT | Agent UUID for affinity ranking |
| filed_at | TIMESTAMP | When it was filed |
| last_accessed_at | TIMESTAMPTZ | Updated on retrieval (TouchAccess) |
| access_count | INT | Incremented on retrieval |
| superseded_by | TEXT | Points to newer contradicting drawer |
| cluster_id | TEXT | Canonical drawer ID for merged clusters |
| retry_count | INT | Cold worker failure counter (max 3) |
| entity_count | INT | Filled by memory_enrich worker |
| triple_count | INT | Filled by memory_enrich worker |

memory_entities

| Field | Purpose |
| --- | --- |
| (workspace_id, id) | Composite PK. ID is SHA256 of normalized name |
| name | Display name (e.g., "PostgreSQL", "Alice") |
| type | Classification (e.g., "technology", "person", "service") |
| properties | JSON metadata bag |
| embedding | vector(1536) -- column exists, embedding fallback retrieval not yet implemented |

memory_triples

| Field | Purpose |
| --- | --- |
| (workspace_id, id) | Composite PK |
| subject, predicate, object | Entity IDs forming a directed relationship |
| valid_from, valid_to | Temporal range (NULL valid_to = current fact) |
| confidence | Relationship confidence score |
| source_closet | Origin drawer reference |

memory_identities

One row per workspace. Holds the L0 identity text (max 2,000 characters).

memory_type_centroids

| Field | Purpose |
| --- | --- |
| memory_type | PK -- one row per memory type |
| centroid | vector(1536) -- element-wise average of LLM-labelled embeddings |
| sample_count | Rows used; below 50 the centroid is ignored |
| computed_at | Last recompute timestamp |
| source_hash | Recompute is a no-op when hash is unchanged |

State machine

[INSERT] --> raw ----> processed ----> merged (cluster canonical absorbs this drawer)
              |
              +-- retry < 3 ---> raw (loops back)
              |
              +-- retry >= 3 --> failed
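
The retry branch of the state machine can be sketched as a small function. This is a minimal illustration, not the repository's code: the State type and the afterColdAttempt helper are hypothetical names, and it assumes the retry counter is incremented on each failure and compared against the limit afterwards. Only the state names and the limit of 3 come from this document.

```go
package main

import "fmt"

// State mirrors the values of the memory_drawers.state column.
type State string

const (
	StateRaw       State = "raw"
	StateProcessed State = "processed"
	StateMerged    State = "merged"
	StateFailed    State = "failed"
)

// maxRetries matches ColdWorkerMaxRetries (3) from the constants table.
const maxRetries = 3

// afterColdAttempt returns the state and retry count of a raw drawer after
// one cold-worker attempt. Hypothetical helper: it assumes the counter is
// bumped on failure and the drawer fails once the counter reaches the limit.
func afterColdAttempt(retryCount int, succeeded bool) (State, int) {
	if succeeded {
		return StateProcessed, retryCount
	}
	retryCount++
	if retryCount >= maxRetries {
		return StateFailed, retryCount
	}
	return StateRaw, retryCount
}

func main() {
	// Simulate a drawer whose classification keeps failing.
	state, retries := StateRaw, 0
	for state == StateRaw {
		state, retries = afterColdAttempt(retries, false)
		fmt.Println(state, retries)
	}
}
```

The merged state is reached separately (from processed, during clustering) and is not modeled here.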

3. Ingestion Pipeline

flowchart TD
A["User Message → Agent Reply\nstream.go finalize()"] --> B

subgraph HOT["HOT PATH (request goroutine)"]
B["chatservice.autoIngestConversation\nbuild exchange, trim noise"] --> C["autoingest.Pool.Submit\nnon-blocking"]
C --> D{queue full?}
D -- yes --> E["drop + warn log\n+ Dropped counter"]
D -- no --> F["queued in pond pool"]
end

subgraph POOL["AUTO-INGEST POOL (in-process alitto/pond)"]
F --> G["chunk + heuristic classify"]
G --> H{confidence >= 0.8?}
H -- yes --> I["embed + dedup + persist\nstate=processed\npipeline_tier=heuristic"]
H -- no --> J{confidence >= 0.5?}
J -- yes --> K["embed + centroid NearestType"]
K --> L{cosine > 0.85?}
L -- yes --> M["persist\nstate=processed\npipeline_tier=centroid"]
L -- no --> N["persist\nstate=raw\npipeline_tier=llm"]
J -- no --> N
I --> O["publish NATS MemoryEvent"]
M --> O
N --> O
end

subgraph RIVER["RIVER WORKERS (periodic, not on hot path)"]
P["memory_process\n1-min sweep\nclaim raw drawers\nLLM batch classify\nentity link + cluster\nstate: raw → processed"]
Q["memory_enrich\n10-min sweep\nKG backfill for\nheuristic/centroid drawers\nimportance >= 3"]
R["memory_maintain\ndaily midnight\ndecay + prune"]
S["memory_centroid_recompute\nSunday 03:00 UTC\nrebuild prototype vectors\nfrom llm-labelled drawers"]
end

N -.->|"picked up within 60s"| P
I -.->|"high-importance only"| Q
M -.->|"high-importance only"| Q

Hot path: auto-ingest pool

After stream.go finalize() returns an agent reply, chatservice.autoIngestConversation builds the exchange pair, trims noise, and calls autoingest.Pool.Submit(). The request goroutine returns immediately -- orchestrator response latency is unaffected.

The pool is backed by github.com/alitto/pond (v2) with bounded capacity and non-blocking submit. If the queue is full, the work is dropped with a metric increment and warn log. The original messages remain in the messages table for potential future replay.
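
The drop-on-full behaviour can be illustrated without pond. The sketch below uses a plain buffered channel instead of the pond library, so the Pool, Work, and NewPool names here are hypothetical stand-ins and not the autoingest package's API. It only shows the technique: a bounded queue, a non-blocking submit, and a dropped counter.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Work is a hypothetical payload for one conversation exchange.
type Work struct{ Text string }

// Pool is a bounded worker pool with a non-blocking Submit.
type Pool struct {
	queue   chan Work
	wg      sync.WaitGroup
	dropped atomic.Int64
}

// NewPool starts `workers` goroutines that drain a queue of size `capacity`.
func NewPool(workers, capacity int, handle func(Work)) *Pool {
	p := &Pool{queue: make(chan Work, capacity)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for w := range p.queue {
				handle(w)
			}
		}()
	}
	return p
}

// Submit never blocks the caller. When the queue is full it counts a drop
// and returns false, so the request goroutine can log a warning and move on.
func (p *Pool) Submit(w Work) bool {
	select {
	case p.queue <- w:
		return true
	default:
		p.dropped.Add(1)
		return false
	}
}

// Dropped reports how many submissions were rejected.
func (p *Pool) Dropped() int64 { return p.dropped.Load() }

// Shutdown stops intake and waits for the workers to drain the queue.
func (p *Pool) Shutdown() {
	close(p.queue)
	p.wg.Wait()
}

func main() {
	var processed atomic.Int64
	p := NewPool(2, 8, func(Work) { processed.Add(1) })
	for i := 0; i < 5; i++ {
		p.Submit(Work{Text: fmt.Sprintf("exchange %d", i)})
	}
	p.Shutdown()
	fmt.Printf("processed=%d dropped=%d\n", processed.Load(), p.Dropped())
}
```

The select with a default branch is what keeps Submit from ever blocking the request goroutine.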

Why not River? Every chat turn would write one river_job row on the critical path. At scale, that is O(messages/second) Postgres writes just to hand a payload to a worker in the same process. River is used for periodic/cross-pod/must-survive-restart work; pond handles hot-path fan-out inside one pod.

Per-chunk pipeline (autoingest/worker.go):

  1. Noise filter -- drop greetings, very short messages (configurable via embedded noise_patterns.json)
  2. Chunk -- split content > 800 chars at sentence boundaries with 100-char overlap
  3. Heuristic classify -- regex-based scoring in extract/classify.go
  4. Embed -- text-embedding-3-small via the configured embedding provider
  5. Dedup -- skip if cosine similarity > 0.85 against existing drawers
  6. Tier decision (pickTier):
    • confidence >= HeuristicConfidenceHigh (0.8 when enabled; the shipped default of 999.0 disables this gate, see Kill switches) → state=processed, pipeline_tier=heuristic -- done
    • confidence in [HeuristicConfidenceLow, HeuristicConfidenceHigh) AND centroid lookup finds similarity > 0.85 → state=processed, pipeline_tier=centroid -- done
    • otherwise → state=raw, pipeline_tier=llm -- cold path picks it up
  7. Persist -- idempotent insert via DrawerRepo.AddIdempotent
  8. Publish -- emit a NATS MemoryEvent
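
Step 2 (chunking) can be sketched as follows. This is an illustrative version and not the helpers.go implementation: it assumes sentences end at '.', '!' or '?', that sizes are counted in characters (runes), and that "overlap" means each new chunk starts 100 characters before the previous cut. The function name is reused from the file listing only for readability.

```go
package main

import "fmt"

const (
	chunkSize    = 800 // AutoIngestChunkSize
	chunkOverlap = 100 // AutoIngestChunkOverlap
)

// isSentenceEnd is an assumed definition of a sentence boundary.
func isSentenceEnd(r rune) bool { return r == '.' || r == '!' || r == '?' }

// chunkText splits text longer than size into overlapping chunks, cutting
// just after the last sentence terminator inside each window when one exists.
func chunkText(text string, size, overlap int) []string {
	runes := []rune(text)
	if len(runes) <= size || overlap >= size {
		return []string{text} // short input (or unusable settings): one chunk
	}
	var chunks []string
	start := 0
	for {
		end := start + size
		if end >= len(runes) {
			chunks = append(chunks, string(runes[start:]))
			return chunks
		}
		cut := end // hard cut if the window holds no sentence boundary
		for i := end - 1; i > start+overlap; i-- {
			if isSentenceEnd(runes[i]) {
				cut = i + 1
				break
			}
		}
		chunks = append(chunks, string(runes[start:cut]))
		start = cut - overlap // always advances because cut > start+overlap
	}
}

func main() {
	text := ""
	for i := 0; i < 100; i++ {
		text += "This is a sentence. "
	}
	for i, c := range chunkText(text, chunkSize, chunkOverlap) {
		fmt.Printf("chunk %d: %d chars\n", i, len(c))
	}
}
```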

Pool sizing (env-configurable):

| Env var | Default | Purpose |
| --- | --- | --- |
| CRAWBL_AUTOINGEST_WORKERS | 16 | Concurrent goroutines (sized for I/O-bound embedding calls) |
| CRAWBL_AUTOINGEST_CAPACITY | 1024 | Queue depth (~1s head-room at 1K msg/sec per pod) |

Cold path: River workers

All cold workers run as River periodic jobs inside the orchestrator binary. No separate scheduler component.

memory_process (1-minute sweep)

jobs/process.go → RunProcess. Claims state=raw drawers with FOR UPDATE SKIP LOCKED (multi-pod safe). Batch-classifies all drawers per workspace in one gpt-4o-mini structured-output call, and falls back to individual calls on parse failure.

Steps per drawer:

  1. LLM returns memory_type, importance (0--1, scaled to 0--5), entities, summary, and relationship triples
  2. Entities upserted into KG, triples create relationship edges
  3. Sets pipeline_tier = 'llm'
  4. Clustering: drawers with cosine > 0.85 are merged (canonical absorbs cluster members, others get state=merged)
  5. Conflict detection: drawers in 0.75--0.90 cosine range checked for contradiction via LLM. Older drawer gets superseded_by = new_id
  6. State transitions: raw → processed (or failed after 3 retries)

memory_enrich (10-minute sweep)

jobs/enrich.go → RunEnrich. High-confidence drawers that bypassed the cold pipeline miss entity linking. For drawers with importance >= 3, this worker runs LLM extraction to backfill KG entities and triples, then updates entity_count and triple_count on the drawer.

Query: state=processed AND pipeline_tier <> 'llm' AND entity_count=0 AND importance >= 3.0 ORDER BY created_at ASC LIMIT 100

Low-importance drawers (importance < 3) stay entity-less permanently.

memory_maintain (daily at midnight UTC)

jobs/maintain.go → RunMaintain. Only processes workspaces with activity in the last 24 hours.

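  • Decay: importance = max(importance * 0.98, 0.3) for drawers older than 30 days and not accessed within 7 days. Applied daily, a factor of 0.98 halves importance after about 34 days of decay (ln 0.5 / ln 0.98 ≈ 34.3), which is roughly two months after filing once the 30-day age gate is counted.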
  • Pruning: deletes drawers with importance < 0.5 AND access_count < 3, keeping minimum 100 per workspace.
  • Access-based reinforcement: retrieval calls TouchAccess(), resetting the decay clock.
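
The decay and prune rules can be written out as below. This is a minimal sketch under stated assumptions, not jobs/maintain.go: the Drawer struct, the function names, and the date helper are hypothetical; the numeric values come from the constants table. The minimum of 100 kept drawers per workspace is not modeled.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	decayFactor    = 0.98 // DecayFactor
	decayFloor     = 0.3  // DecayFloor
	decayAgeDays   = 30   // DecayAgeDays
	idleDays       = 7    // "not accessed within 7 days"
	pruneThreshold = 0.5  // PruneThreshold
	pruneMinAccess = 3    // PruneMinAccessCount
	day            = 24 * time.Hour
)

// Drawer holds only the fields the maintenance rules look at.
type Drawer struct {
	Importance     float64
	AccessCount    int
	FiledAt        time.Time
	LastAccessedAt time.Time
}

// decayed returns the importance after one maintenance run.
func decayed(d Drawer, now time.Time) float64 {
	if now.Sub(d.FiledAt) <= decayAgeDays*day || now.Sub(d.LastAccessedAt) <= idleDays*day {
		return d.Importance // too young or recently touched: no decay
	}
	return math.Max(d.Importance*decayFactor, decayFloor)
}

// prunable reports whether a drawer matches the deletion criteria.
func prunable(d Drawer) bool {
	return d.Importance < pruneThreshold && d.AccessCount < pruneMinAccess
}

// halfLifeDays is the number of daily decay steps needed to halve importance.
func halfLifeDays() float64 { return math.Log(0.5) / math.Log(decayFactor) }

// date is a small helper for building UTC dates in examples.
func date(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

func main() {
	now := date(2025, 6, 1)
	d := Drawer{Importance: 3.0, FiledAt: date(2025, 3, 1), LastAccessedAt: date(2025, 5, 1)}
	fmt.Printf("after one run: %.2f\n", decayed(d, now))
	fmt.Printf("half-life: %.1f days\n", halfLifeDays())
}
```

Because the floor (0.3) sits below the prune threshold (0.5), a drawer that is never accessed eventually becomes eligible for pruning.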

memory_centroid_recompute (Sunday 03:00 UTC)

jobs/centroids.go → RunCentroidRecompute. Aggregates up to 500 LLM-labelled drawers per type from the last 90 days, averages embeddings in Go, and upserts the result into memory_type_centroids. The source_hash conditional update makes the job a no-op when no new LLM-labelled drawers exist.

Feedback-loop prevention: centroids are trained only on pipeline_tier='llm' drawers. Centroid-labelled drawers are excluded from all recomputes.

Sample floor: sample_count < 50 causes NearestType to return found=false. Phase 2 falls through to the cold LLM path. New workspaces stay safe until enough LLM-labelled history accumulates.

Kill switches

Both phase gates are read once at boot from env vars. Defaults are 999.0 (disabled -- every chunk falls to LLM path).

| Env var | Default | Effect when set |
| --- | --- | --- |
| CRAWBL_MEM_HEURISTIC_HIGH | 999.0 | Set to 0.8 to enable Phase 1 (heuristic trust) |
| CRAWBL_MEM_HEURISTIC_LOW | 999.0 | Set to 0.5 to enable Phase 2 (centroid k-NN band) |

To disable Phase 2 only: set CRAWBL_MEM_HEURISTIC_LOW = CRAWBL_MEM_HEURISTIC_HIGH. The centroid band collapses to zero width. Requires pod restart.
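
How the two gates interact with the tier decision can be sketched as below. This is a hedged illustration, not the worker's pickTier: the Gates struct, envFloat, and the function signature are hypothetical, while the env var names, the 999.0 default, and the 0.85 centroid threshold come from this document.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const (
	killSwitch        = 999.0 // default for both gates: phase disabled
	centroidThreshold = 0.85  // MemoryCentroidThreshold
)

// Gates holds the two confidence thresholds read at boot.
type Gates struct{ High, Low float64 }

// envFloat parses a float env var, falling back to def when unset or invalid.
func envFloat(name string, def float64) float64 {
	if v, err := strconv.ParseFloat(os.Getenv(name), 64); err == nil {
		return v
	}
	return def
}

// loadGates reads both gates once, as the orchestrator does at boot.
func loadGates() Gates {
	return Gates{
		High: envFloat("CRAWBL_MEM_HEURISTIC_HIGH", killSwitch),
		Low:  envFloat("CRAWBL_MEM_HEURISTIC_LOW", killSwitch),
	}
}

// pickTier returns the state and pipeline_tier for one classified chunk.
func pickTier(g Gates, confidence, centroidSim float64, centroidFound bool) (state, tier string) {
	switch {
	case confidence >= g.High:
		return "processed", "heuristic"
	case confidence >= g.Low && centroidFound && centroidSim > centroidThreshold:
		return "processed", "centroid"
	default:
		return "raw", "llm"
	}
}

func main() {
	g := loadGates()
	state, tier := pickTier(g, 0.9, 0, false)
	fmt.Printf("gates=%+v -> state=%s tier=%s\n", g, state, tier)
}
```

Since confidence is capped at 1.0, a threshold of 999.0 can never be met, which is why the defaults send every chunk to the LLM path. Setting Low equal to High leaves no confidence value that passes the second case without already passing the first.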

Crash recovery

| Failure point | What happens | Recovery |
| --- | --- | --- |
| Pod crashes between Submit and worker pickup | Chunk lost | messages row exists for future replay |
| Crash after persist (state=raw) | Drawer sits raw | memory_process sweep picks it up within 60s |
| Crash after persist (state=processed) | Drawer done; entity linking pending | memory_enrich sweep picks it up within 10 min |

4. Memory Classification

The seven memory types

| Type | What it captures |
| --- | --- |
| decision | Architecture choices, technology picks, trade-offs |
| preference | Personal or team style rules |
| milestone | Achievements, breakthroughs, completed work |
| problem | Bugs, errors, root causes, and their fixes |
| emotional | Personal feelings, team morale moments |
| fact | Factual statements about the user, project, or domain |
| task | Pending or in-progress work items |

Heuristic classifier

extract/classify.go scores segments against regex marker patterns loaded from config/classify_patterns.json:

rawScore    = sum of regex marker hits across all memory types
lengthBonus = +2 if segment > 500 chars, +1 if segment > 200 chars, else 0
confidence = min(1.0, (bestTypeScore + lengthBonus) / 5.0)

The classifier also runs sentiment analysis (positive/negative word lists from config) and disambiguation logic -- e.g., a "problem" with resolution cues and positive sentiment may be reclassified as "milestone".
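
The confidence formula can be exercised with a toy classifier. The sketch below is illustrative only: the marker patterns are made up (the real ones live in config/classify_patterns.json), bestTypeScore is assumed to be the hit count of the highest-scoring type, and segments with no hits are assumed to get confidence 0. Sentiment analysis and disambiguation are not modeled.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// typeMarkers pairs a memory type with its regex markers.
type typeMarkers struct {
	memoryType string
	patterns   []*regexp.Regexp
}

// markers holds hypothetical patterns for two of the seven types.
var markers = []typeMarkers{
	{"decision", []*regexp.Regexp{
		regexp.MustCompile(`(?i)\bdecided\b`),
		regexp.MustCompile(`(?i)\bbecause\b`),
	}},
	{"problem", []*regexp.Regexp{
		regexp.MustCompile(`(?i)\bbug\b`),
		regexp.MustCompile(`(?i)\berror\b`),
	}},
}

// lengthBonus rewards longer segments: +2 above 500 chars, +1 above 200.
func lengthBonus(n int) float64 {
	switch {
	case n > 500:
		return 2
	case n > 200:
		return 1
	default:
		return 0
	}
}

// confidence applies min(1.0, (bestTypeScore + lengthBonus) / 5.0).
func confidence(bestTypeScore float64, segmentLen int) float64 {
	return math.Min(1.0, (bestTypeScore+lengthBonus(segmentLen))/5.0)
}

// classify returns the best-scoring type and its confidence.
// Assumption: a segment with no marker hits yields ("", 0).
func classify(segment string) (string, float64) {
	bestType, bestScore := "", 0.0
	for _, m := range markers {
		score := 0.0
		for _, p := range m.patterns {
			score += float64(len(p.FindAllStringIndex(segment, -1)))
		}
		if score > bestScore {
			bestType, bestScore = m.memoryType, score
		}
	}
	if bestType == "" {
		return "", 0
	}
	return bestType, confidence(bestScore, len([]rune(segment)))
}

func main() {
	t, c := classify("We decided to use Postgres because of pgvector.")
	fmt.Printf("%s %.1f\n", t, c)
}
```

Two marker hits in a short segment yield a confidence of 0.4, so such a chunk falls below both gates and goes to the cold LLM path.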

Pipeline tier column

memory_drawers.pipeline_tier records which classifier made the final type decision:

| Value | Set by | Meaning |
| --- | --- | --- |
| heuristic | Auto-ingest pool | Regex confidence >= 0.8; cold LLM skipped |
| centroid | Auto-ingest pool | Embedding nearest-centroid above cosine 0.85; cold LLM skipped |
| llm | Cold pipeline (memory_process) | Fell through both classifiers; LLM made the call |

5. Retrieval: 4-Layer Stack

When an agent needs context, the memory system provides it through a layered stack (layers/stack.go). Each layer adds progressively more detail within a total character budget.

[Diagram: Memory Retrieval Stack]

L0 -- Identity

The workspace's personality and context. Set once via memory_set_identity, injected at the start of every conversation via WakeUp().

  • Budget: 400 characters (never truncated)
  • Source: memory_identities table (one row per workspace)
  • Renderer: layers/l0_identity.go → renderL0

L1 -- Essential Story

The most important memories across the workspace, from the top 15 drawers ranked by importance.

  • Budget: 2,000 characters (truncated with "... (more in L3 search)" if exceeded)
  • Source: DrawerRepo.GetTopByImportance() -- sorted by importance descending, optionally filtered by wing
  • Grouped by: Room, sorted alphabetically for deterministic output
  • Snippet limit: 200 characters per drawer
  • Renderer: layers/l1_essential.go → renderL1

L2 -- On-Demand Recall

Retrieved when an agent explicitly asks for memories from a specific wing and room. Not injected automatically.

  • Budget: 1,200 characters
  • Source: DrawerRepo.GetByWingRoom() -- filtered retrieval
  • Default limit: 10 drawers, 300 chars per snippet
  • Renderer: layers/l2_ondemand.go → renderL2

L3 -- Search

Semantic search that combines pgvector ANN with knowledge graph entity lookup in a single Postgres CTE query (drawerrepo.SearchHybrid). Falls back to pure vector search (renderL3) if hybrid retrieval fails.

  • Budget: 14,000 characters (hard cap on total output)
  • Default limit: 5 results (max 50)
  • Ranking formula (layers/retrieval.go → rankHybridResults):
    finalScore = importance × recencyFactor × max(similarity, graphScore) + agentAffinityBoost(0.1)
    Where recencyFactor = 1.0 / (1.0 + daysSinceAccess / 30.0)
  • KG branch: query words >= 4 chars are forwarded as KG entity lookup terms
  • Access tracking: all returned drawers get TouchAccessBatch() -- updates last_accessed_at, increments access_count, keeping hot memories alive against decay
  • Renderer: layers/l3_search.go → renderL3 (pure vector) or stack.Search (hybrid)
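
The ranking formula above can be worked through in code. This is a sketch under assumptions, not rankHybridResults itself: the Result struct and helper names are hypothetical, and it assumes the 0.1 affinity boost applies when the requesting agent is the one recorded in added_by_agent.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

const agentAffinityBoost = 0.1

// Result is a hypothetical, reduced form of a hybrid search hit.
type Result struct {
	ID             string
	Importance     float64
	Similarity     float64
	GraphScore     float64
	LastAccessedAt time.Time
	AddedByAgent   string
}

// finalScore = importance × recencyFactor × max(similarity, graphScore) + boost.
func finalScore(r Result, now time.Time, agentID string) float64 {
	days := math.Max(0, now.Sub(r.LastAccessedAt).Hours()/24)
	recency := 1.0 / (1.0 + days/30.0)
	score := r.Importance * recency * math.Max(r.Similarity, r.GraphScore)
	if agentID != "" && r.AddedByAgent == agentID {
		score += agentAffinityBoost // assumed condition for the boost
	}
	return score
}

// rank returns a copy of results ordered by descending finalScore.
func rank(results []Result, now time.Time, agentID string) []Result {
	ranked := append([]Result(nil), results...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return finalScore(ranked[i], now, agentID) > finalScore(ranked[j], now, agentID)
	})
	return ranked
}

// date is a small helper for building UTC dates in examples.
func date(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

func main() {
	now := date(2025, 6, 1)
	results := []Result{
		{ID: "stale", Importance: 4, Similarity: 0.8, GraphScore: 0.2, LastAccessedAt: date(2025, 5, 2), AddedByAgent: "a1"},
		{ID: "fresh", Importance: 2, Similarity: 0.9, LastAccessedAt: now},
	}
	for _, r := range rank(results, now, "a1") {
		fmt.Printf("%s %.2f\n", r.ID, finalScore(r, now, "a1"))
	}
}
```

A drawer last accessed 30 days ago has recencyFactor 0.5, so an importance-4 drawer at similarity 0.8 scores 4 × 0.5 × 0.8 = 1.6 before any boost, below a freshly accessed importance-2 drawer at similarity 0.9 (1.8).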

Token budgets (in characters, ~4 chars per token)

| Layer | Budget | Behavior |
| --- | --- | --- |
| L0 -- Identity | 400 | Never truncated |
| L1 -- Essential Story | 2,000 | Truncated first, shows "more in L3" |
| L2 -- On-Demand | 1,200 | Returned as-is |
| L3 -- Search | 14,000 (hard cap) | Result count limited |

6. Knowledge Graph

Entities are identified by the SHA256 hash of their normalized name. Triples are temporal: valid_from and valid_to bound the period in which a fact holds.

Entities and triples

An entity is a named thing (person, service, concept, project). A triple is a temporal relationship:

[Subject] --predicate--> [Object]
with valid_from / valid_to timestamps

For example:

[Crawbl Backend] --uses--> [PostgreSQL]   valid_from: 2025-01-15, valid_to: NULL (current)
[Crawbl Backend] --uses--> [MongoDB]      valid_from: 2024-06-01, valid_to: 2025-01-14 (expired)
[Alice] --owns--> [Auth Module]           valid_from: 2025-03-01, valid_to: NULL (current)

The valid_from/valid_to fields let agents answer questions like "what database did we use before PostgreSQL?" or "who owned the auth module in Q4?". Facts expire naturally when valid_to is set -- they are not deleted.
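
An "as of" lookup over temporal triples can be sketched as below, using the example facts from this section. The code is illustrative rather than kgrepo's implementation: the Triple struct and helpers are hypothetical, triples carry display names instead of entity IDs for readability, valid_to is treated as inclusive, and name normalization is assumed to be trim plus lowercase.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"time"
)

// entityID hashes a normalized name. The normalization is an assumption.
func entityID(name string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(strings.TrimSpace(name))))
	return hex.EncodeToString(sum[:])
}

// Triple is a temporal relationship; a nil ValidTo means "still current".
type Triple struct {
	Subject, Predicate, Object string
	ValidFrom                  time.Time
	ValidTo                    *time.Time
}

// activeAt reports whether the fact held at the given time.
func (t Triple) activeAt(at time.Time) bool {
	if at.Before(t.ValidFrom) {
		return false
	}
	return t.ValidTo == nil || !at.After(*t.ValidTo)
}

// objectsAt lists the objects related to subject via predicate at a time.
func objectsAt(triples []Triple, subject, predicate string, at time.Time) []string {
	var out []string
	for _, t := range triples {
		if t.Subject == subject && t.Predicate == predicate && t.activeAt(at) {
			out = append(out, t.Object)
		}
	}
	return out
}

// date is a small helper for building UTC dates in examples.
func date(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// exampleTriples returns the three facts shown in this section.
func exampleTriples() []Triple {
	mongoEnd := date(2025, 1, 14)
	return []Triple{
		{Subject: "Crawbl Backend", Predicate: "uses", Object: "PostgreSQL", ValidFrom: date(2025, 1, 15)},
		{Subject: "Crawbl Backend", Predicate: "uses", Object: "MongoDB", ValidFrom: date(2024, 6, 1), ValidTo: &mongoEnd},
		{Subject: "Alice", Predicate: "owns", Object: "Auth Module", ValidFrom: date(2025, 3, 1)},
	}
}

func main() {
	triples := exampleTriples()
	fmt.Println(objectsAt(triples, "Crawbl Backend", "uses", date(2024, 12, 1)))
	fmt.Println(objectsAt(triples, "Crawbl Backend", "uses", date(2025, 2, 1)))
	fmt.Println(entityID("PostgreSQL")[:12])
}
```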

Palace graph navigation

The PalaceGraph layer (palacegraphrepo) adds spatial reasoning on top of drawers, with Redis-cached room aggregation via internal/pkg/redisclient:

  • Traverse -- BFS from a starting room, hopping through shared wings to find connected rooms
  • FindTunnels -- discover rooms that appear in multiple wings (cross-cutting concerns)
  • GraphStats -- room count, tunnel count, edges, rooms per wing
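
Traverse and FindTunnels can be illustrated over an in-memory wing-to-rooms map. This is a sketch of the technique only, not palacegraphrepo: the data shape, the Hop type, and both function signatures are hypothetical, two rooms are treated as connected when they share a wing, and Redis caching is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Hop is a room reached during traversal and its distance from the start.
type Hop struct {
	Room  string
	Depth int
}

// traverse runs a breadth-first search from start, moving between rooms
// that share a wing, up to maxHops steps away.
func traverse(wingRooms map[string][]string, start string, maxHops int) []Hop {
	roomWings := map[string][]string{}
	for wing, rooms := range wingRooms {
		for _, r := range rooms {
			roomWings[r] = append(roomWings[r], wing)
		}
	}
	if _, ok := roomWings[start]; !ok {
		return nil
	}
	visited := map[string]bool{start: true}
	out := []Hop{{Room: start, Depth: 0}} // doubles as the BFS queue
	for i := 0; i < len(out); i++ {
		cur := out[i]
		if cur.Depth == maxHops {
			continue
		}
		var next []string
		for _, wing := range roomWings[cur.Room] {
			for _, r := range wingRooms[wing] {
				if !visited[r] {
					visited[r] = true
					next = append(next, r)
				}
			}
		}
		sort.Strings(next) // deterministic order within one expansion
		for _, r := range next {
			out = append(out, Hop{Room: r, Depth: cur.Depth + 1})
		}
	}
	return out
}

// findTunnels returns rooms that appear in at least two wings.
func findTunnels(wingRooms map[string][]string) []string {
	count := map[string]int{}
	for _, rooms := range wingRooms {
		for _, r := range rooms {
			count[r]++
		}
	}
	var tunnels []string
	for r, n := range count {
		if n >= 2 {
			tunnels = append(tunnels, r)
		}
	}
	sort.Strings(tunnels)
	return tunnels
}

func main() {
	palace := map[string][]string{
		"backend": {"auth", "database"},
		"infra":   {"database", "deploy"},
		"mobile":  {"ui"},
	}
	fmt.Println(traverse(palace, "auth", 2))
	fmt.Println(findTunnels(palace))
}
```

Here "database" is a tunnel because it appears in both the backend and infra wings, which is also what lets the traversal reach "deploy" from "auth".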

Workspace limits

| Resource | Limit |
| --- | --- |
| Drawers per workspace | 10,000 |
| Entities per workspace | 5,000 |
| Triples per workspace | 50,000 |
| Drawer content length | 10,000 characters |
| Identity (L0) length | 2,000 characters |

7. MCP Tools

19 MCP tools registered in internal/orchestrator/server/mcp/tools_memory.go.

Read tools

| Tool | Purpose |
| --- | --- |
| memory_status | Total drawer count, number of wings and rooms |
| memory_list_wings | List all wings with drawer counts |
| memory_list_rooms | List rooms, optionally filtered by wing |
| memory_get_taxonomy | Full wing → room hierarchy with counts |
| memory_search | Semantic vector search by natural language query |
| memory_check_duplicate | Find drawers similar to a given text (threshold 0.9) |
| memory_traverse | BFS room traversal from a starting room |
| memory_find_tunnels | Find rooms bridging two wings |
| memory_graph_stats | Palace graph overview (rooms, tunnels, edges) |

Write tools

| Tool | Purpose |
| --- | --- |
| memory_add_drawer | Store a new memory with auto-classification and embedding |
| memory_delete_drawer | Remove a drawer by ID |
| memory_set_identity | Set or update the L0 identity text |

Knowledge graph tools

| Tool | Purpose |
| --- | --- |
| memory_kg_query | Query entity relationships (incoming, outgoing, or both) |
| memory_kg_add | Add a temporal triple (auto-creates entities if missing) |
| memory_kg_invalidate | Mark a relationship as ended (set valid_to) |
| memory_kg_timeline | Chronological view of all facts about an entity |
| memory_kg_stats | Entity and triple counts, relationship type list |

Diary tools

| Tool | Purpose |
| --- | --- |
| memory_diary_write | Write an agent-scoped diary entry (hall = agent name) |
| memory_diary_read | Read an agent's recent diary entries |

Diary tools are a convenience wrapper around drawers. They auto-set wing = "diary" and hall = agent name, giving each agent a private journal within the shared workspace memory.


8. Backend Wiring

The memory system is wired up in cmd/crawbl/platform/orchestrator/orchestrator.go:

// Repositories; the palace graph repo also takes the Redis client and logger.
var drawerRepo      = drawerrepo.NewPostgres()
var kgRepo          = kgrepo.NewPostgres()
var palaceGraphRepo = palacegraphrepo.NewPostgres(redisClient, logger)
var identityRepo    = identityrepo.NewPostgres()
classifier := extract.NewClassifier()

// embedder and memoryStack are only assigned when baseURL is non-empty.
if baseURL != "" {
	embedder = embed.NewProvider(...)
	memoryStack = layers.NewStack(drawerRepo, identityRepo, embedder)
}

These are passed to three services:

| Service | What it uses | Why |
| --- | --- | --- |
| ChatService | memoryStack + ingestPool | Calls WakeUp() to inject L0+L1 context; submits work to autoingest.Pool after each turn |
| AgentService | drawerRepo | Lists memories for the agent detail UI |
| MCPService | All repos + classifier + embedder | Exposes the MCP tools to agents |

[Diagram: Memory Chat Flow]

Graceful shutdown order

  1. Socket.IO teardown -- stop accepting new client connections
  2. ingestPool.Shutdown(shutdownCtx) -- drain in-flight pond tasks
  3. pkgriver.Shutdown(riverClient) -- three-phase River shutdown (20s/10s/force)
  4. DB connection close

9. Configuration

| Variable | Required | Default | Purpose |
| --- | --- | --- | --- |
| CRAWBL_EMBED_BASE_URL | Yes | -- | Embedding API endpoint |
| CRAWBL_EMBED_API_KEY | Yes | -- | Embedding API key |
| CRAWBL_EMBED_MODEL | No | text-embedding-3-small | Embedding model |
| CRAWBL_LLM_BASE_URL | No | CRAWBL_EMBED_BASE_URL | Chat completions API |
| CRAWBL_LLM_API_KEY | No | CRAWBL_EMBED_API_KEY | Chat completions key |
| CRAWBL_CLASSIFY_MODEL | No | gpt-4o-mini | Classification model |
| CRAWBL_AUTOINGEST_WORKERS | No | 16 | Pool worker count |
| CRAWBL_AUTOINGEST_CAPACITY | No | 1024 | Pool queue depth |
| CRAWBL_MEM_HEURISTIC_HIGH | No | 999.0 (disabled) | Phase 1 gate |
| CRAWBL_MEM_HEURISTIC_LOW | No | 999.0 (disabled) | Phase 2 gate |

Embedded JSON configs (config/): noise_patterns.json (noise words/patterns), classify_patterns.json (heuristic regex markers). Both loaded via go:embed -- changes require recompilation.


10. Key Constants

| Constant | Value | Location |
| --- | --- | --- |
| l1MaxDrawers | 15 | layers/l1_essential.go |
| maxSnippetLen | 200 | layers/l1_essential.go |
| l2MaxSnippetLen | 300 | layers/l2_ondemand.go |
| l3MaxSnippetLen | 300 | layers/l3_search.go |
| DefaultImportance | 3.0 | types.go |
| AutoIngestChunkSize | 800 | types.go |
| AutoIngestChunkOverlap | 100 | types.go |
| AutoIngestDupThreshold | 0.85 | types.go |
| AutoIngestMinConfidence | 0.3 | types.go |
| ColdWorkerClusterThreshold | 0.85 | types.go |
| ColdWorkerConflictLow | 0.75 | types.go |
| ColdWorkerConflictHigh | 0.90 | types.go |
| ColdWorkerMaxRetries | 3 | types.go |
| DecayFactor | 0.98 | types.go |
| DecayFloor | 0.3 | types.go |
| DecayAgeDays | 30 | types.go |
| PruneThreshold | 0.5 | types.go |
| PruneMinAccessCount | 3 | types.go |
| PruneKeepMin | 100 | types.go |
| MemoryCentroidThreshold | 0.85 | types.go |
| MemoryCentroidMinSamples | 50 | types.go |
| ReinforcementThreshold | 0.7 | types.go |
| ReinforcementBoost | 0.5 | types.go |
| MaxImportance | 5.0 | types.go |
| Embedding dimensions | 1536 | pgvector column size |

11. Known Limitations

High:

  • The IVFFlat index (migration 000008) is untested on DigitalOcean CPUs and may crash with SIGILL, like HNSW. Sequential scan is the fallback, which is acceptable at fewer than 10K drawers.
  • No cost tracking for cold path LLM calls. No per-workspace attribution.

Medium:

  • Batch classification cannot use JSON mode (array incompatibility). Falls back to N+1 calls on parse failure.
  • memory_drawers.id is TEXT PK, not composite (workspace_id, id). Cross-workspace isolation relies on code, not schema.
  • Phase 2 is dormant until the centroid table has >= 50 samples per type (roughly the first week of LLM-labelled traffic on a new deployment). This is expected behavior.

Low:

  • NATS worker migration incomplete (publisher only, no consumer).
  • KG entity embedding fallback not implemented (column exists, retrieval deferred).
  • added_by_agent field name implies slug but stores UUID.