ClickHouse

ClickHouse is the analytics database that stores detailed LLM token usage events. It runs as a single-node StatefulSet in the backend namespace alongside the orchestrator and PostgreSQL.

Connection Details

Property	Value
Service name	`backend-clickhouse`
Namespace	`backend`
HTTP port	8123
Native port	9000
Database	`default`
Username	`default`
Image	`clickhouse/clickhouse-server:24.8-alpine`
Storage	10 Gi persistent volume
CPU	100m request / no limit
Memory	256 Mi request / 1 Gi limit

Getting the password

The password is stored in AWS Secrets Manager at crawbl/dev/backend/clickhouse and synced to the Kubernetes secret backend-clickhouse-auth by External Secrets:

kubectl get secret backend-clickhouse-auth -n backend \
  -o jsonpath='{.data.password}' | base64 -d

Connecting to ClickHouse

Port-forward (recommended for development)

# Forward the native port (for clickhouse-client)
kubectl port-forward svc/backend-clickhouse 9000:9000 -n backend

# Forward the HTTP port (for curl / DBeaver / DataGrip)
kubectl port-forward svc/backend-clickhouse 8123:8123 -n backend

Using clickhouse-client (CLI)

Install the client:

# macOS
brew install clickhouse

# Or use the official Docker image
docker run -it --rm --network host clickhouse/clickhouse-client

Connect (after port-forwarding):

clickhouse-client \
  --host localhost \
  --port 9000 \
  --user default \
  --password "$(kubectl get secret backend-clickhouse-auth -n backend -o jsonpath='{.data.password}' | base64 -d)"

Using the HTTP interface (curl)

CH_PASS=$(kubectl get secret backend-clickhouse-auth -n backend \
  -o jsonpath='{.data.password}' | base64 -d)

curl "http://localhost:8123/?user=default&password=${CH_PASS}" \
  --data "SELECT count() FROM llm_usage"

Using a GUI tool (DBeaver / DataGrip)

Setting	Value
Host	`localhost`
Port	`8123` (HTTP) or `9000` (native)
Database	`default`
User	`default`
Password	(from the secret above)
Driver	ClickHouse

tip

DBeaver has a built-in ClickHouse driver. In DataGrip, select "ClickHouse" as the data source type.

Tables

llm_usage (raw events)

Every individual LLM call is recorded here. This is the detailed event log.

Column	Type	Description
`event_id`	UUID	Unique event identifier (auto-generated)
`event_time`	DateTime64(3)	When the LLM call happened (UTC, millisecond precision)
`user_id`	UUID	The platform user who triggered the call
`workspace_id`	UUID	The workspace the call was made from
`conversation_id`	UUID	The conversation context
`message_id`	String	The message that triggered the call
`agent_id`	String	Agent slug (e.g., `"wally"`, `"manager"`)
`agent_db_id`	UUID	Agent's database ID
`model`	String	LLM model used (e.g., `"gpt-4o"`, `"claude-3-sonnet"`)
`provider`	String	LLM provider (e.g., `"openai"`, `"anthropic"`)
`prompt_tokens`	Int32	Input tokens consumed
`completion_tokens`	Int32	Output tokens generated
`total_tokens`	Int32	Sum of prompt + completion
`tool_use_prompt_tokens`	Int32	Tokens used in tool call prompts
`thoughts_tokens`	Int32	Tokens used for extended thinking
`cached_tokens`	Int32	Tokens served from provider cache
`cost_usd`	Decimal64(6)	Computed cost in USD
`call_sequence`	Int32	Order within a single turn (0-indexed)
`turn_id`	String	Turn identifier
`session_id`	String	Session/conversation ID

Engine: MergeTree, partitioned by toYYYYMM(event_time), ordered by (user_id, workspace_id, event_time, event_id).

TTL: 13 months -- events older than 13 months are automatically deleted.

llm_usage_daily (aggregated rollups)

Daily summaries auto-populated by a materialized view. Use this for dashboard queries instead of scanning raw events.

Column	Type	Description
`day`	Date	The calendar day
`user_id`	UUID	The user
`workspace_id`	UUID	The workspace
`model`	String	The LLM model
`agent_id`	String	The agent slug
`total_prompt_tokens`	Int64	Sum of input tokens for the day
`total_completion_tokens`	Int64	Sum of output tokens for the day
`total_tokens`	Int64	Sum of all tokens for the day
`total_cost_usd`	Decimal128(6)	Sum of costs for the day
`request_count`	Int64	Number of LLM calls for the day

Engine: SummingMergeTree, partitioned by toYYYYMM(day), ordered by (user_id, workspace_id, day, model, agent_id).

TTL: 25 months.

info

The SummingMergeTree engine automatically merges rows with the same key by summing the numeric columns. This means you should always use sum() in queries to get accurate results, as un-merged parts may still contain duplicate partial rows.

Supporting Services

pricing_refresh River job

Model pricing is refreshed daily by the pricing_refresh River worker running inside the orchestrator process — no standalone pod or CronJob. The worker uses modelpricingrepo to insert new rows into the Postgres model_pricing table, fetching current per-token prices from LiteLLM.

Property	Value
Queue	`pricing_refresh`
Schedule	Daily periodic (River periodic job)
Runs in	Orchestrator pod
Writes to	Postgres `orchestrator.model_pricing`

Check recent runs via riverui or Postgres:

kubectl port-forward svc/backend-postgresql 5432:5432 -n backend
psql -h localhost -U crawbl -d crawbl \
  -c "SELECT id, state, attempt, finalized_at FROM orchestrator.river_job WHERE kind = 'pricing_refresh' ORDER BY finalized_at DESC LIMIT 10;"

Example Queries

Total usage for a user this month

SELECT
    sum(total_tokens) AS tokens,
    sum(cost_usd) AS cost_usd,
    count() AS requests
FROM llm_usage
WHERE user_id = '<user-uuid>'
  AND event_time >= toStartOfMonth(now());

Usage breakdown by model

SELECT
    model,
    sum(prompt_tokens) AS input_tokens,
    sum(completion_tokens) AS output_tokens,
    sum(total_tokens) AS total_tokens,
    sum(cost_usd) AS cost_usd,
    count() AS requests
FROM llm_usage
WHERE workspace_id = '<workspace-uuid>'
  AND event_time >= toStartOfMonth(now())
GROUP BY model
ORDER BY cost_usd DESC;

Daily usage trend (last 30 days)

SELECT
    day,
    sum(total_tokens) AS tokens,
    sum(total_cost_usd) AS cost_usd,
    sum(request_count) AS requests
FROM llm_usage_daily
WHERE user_id = '<user-uuid>'
  AND day >= today() - 30
GROUP BY day
ORDER BY day;

Most active agents by cost

SELECT
    agent_id,
    sum(total_tokens) AS tokens,
    sum(cost_usd) AS cost_usd,
    count() AS requests
FROM llm_usage
WHERE workspace_id = '<workspace-uuid>'
  AND event_time >= toStartOfMonth(now())
GROUP BY agent_id
ORDER BY cost_usd DESC;

Hourly request rate (spot traffic spikes)

SELECT
    toStartOfHour(event_time) AS hour,
    count() AS requests,
    sum(total_tokens) AS tokens
FROM llm_usage
WHERE event_time >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;

Check if data is flowing (health check)

SELECT
    count() AS events_last_hour,
    max(event_time) AS latest_event
FROM llm_usage
WHERE event_time >= now() - INTERVAL 1 HOUR;

tip

If events_last_hour is 0 and users are actively chatting, check the orchestrator logs for River usage_write job errors and verify ClickHouse connectivity from the orchestrator pod.

Secrets Reference

Secret	Location (AWS SM)	K8s Secret	Used by
ClickHouse password	`crawbl/dev/backend/clickhouse`	`backend-clickhouse-auth`	StatefulSet, orchestrator (usagewriter worker)
Postgres password	`crawbl/dev/backend/postgresql`	`backend-postgresql-auth`	Orchestrator (pricing_refresh River job, migrations)

Troubleshooting

"No data in ClickHouse"

Check if the orchestrator is running and healthy: kubectl get pods -n backend -l app.kubernetes.io/name=orchestrator
Check orchestrator logs for usagewriter errors: kubectl logs deployment/orchestrator -n backend | grep -i usage
Verify River job queue has no stuck usage_write jobs: query orchestrator.river_job where kind = 'usage_write' and state IN ('retryable', 'discarded')
Verify ClickHouse is healthy: kubectl exec -n backend backend-clickhouse-0 -- clickhouse-client -q "SELECT 1"

"Costs are showing as 0"

Check if model_pricing has entries: query Postgres for SELECT count(*) FROM orchestrator.model_pricing
Check if the pricing_refresh River job ran successfully: query orchestrator.river_job where kind = 'pricing_refresh' and check state and finalized_at
Verify the pricing cache loaded: check orchestrator startup logs for "pricing cache refreshed"

"Usage counters seem wrong"

Compare Postgres counters with ClickHouse raw events -- ClickHouse is the source of truth for detailed data
Check if the agent runtime is actually emitting UsageEvents: look for "llm usage" in orchestrator logs
Remember: Postgres counters are incremented in real-time, ClickHouse data may be delayed by up to the batch flush interval (500ms)

Connection Details​

Getting the password​

Connecting to ClickHouse​

Port-forward (recommended for development)​

Using clickhouse-client (CLI)​

Using the HTTP interface (curl)​

Using a GUI tool (DBeaver / DataGrip)​

Tables​

llm_usage (raw events)​

llm_usage_daily (aggregated rollups)​

Supporting Services​

pricing_refresh River job​

Example Queries​

Total usage for a user this month​

Usage breakdown by model​

Daily usage trend (last 30 days)​

Most active agents by cost​

Hourly request rate (spot traffic spikes)​

Check if data is flowing (health check)​

Secrets Reference​

Troubleshooting​

"No data in ClickHouse"​

"Costs are showing as 0"​

"Usage counters seem wrong"​