Billing, Usage & Cost Tracking

A step-by-step guide to understanding how Crawbl tracks token usage, calculates costs, enforces quotas, and stores analytics data. Written for engineers who have never seen the billing pipeline before.

Chapter 1: The Big Picture

What this system does

Every time a user sends a message to an AI agent, the agent calls an LLM (like GPT-4 or Claude). That LLM call consumes tokens -- small units of text that the model reads and writes. Tokens cost money, so Crawbl tracks every single one.

The billing system answers three questions:

How many tokens did this user consume? (usage tracking)
How much did it cost? (cost calculation)
Has this user exceeded their plan limit? (quota enforcement)

к сведению

Think of it like a phone plan. You have a monthly allowance (tokens instead of minutes). The platform tracks every call (LLM request), calculates the cost, and warns you when you are approaching your limit.

The complete data flow

Here is the path every usage event takes, from the moment an agent calls an LLM to the moment it lands in analytics:

Click diagram to zoom

подсказка

There are two storage paths running in parallel:

Postgres -- real-time counters for quota enforcement and API responses (fast writes, small data)
ClickHouse -- detailed analytics for dashboards and historical queries (batch writes, big data)

Chapter 2: Step-by-Step Walkthrough

Step 1: The agent runtime emits a UsageEvent

When an agent in a workspace pod calls an LLM, the Google GenAI SDK returns a UsageMetadata object containing token counts. The agent runtime translates this into a gRPC UsageEvent and sends it back to the orchestrator over the bidirectional stream.

Where this happens: crawbl-backend/internal/agentruntime/server/converse.go

The UsageEvent proto message contains:

Field	Type	Example	Description
`agent_id`	string	`"wally"`	Which agent made the LLM call
`model`	string	`"gpt-4o"`	Which LLM model was used
`prompt_tokens`	int32	`1250`	Tokens in the input (user message + context)
`completion_tokens`	int32	`340`	Tokens in the output (agent's response)
`total_tokens`	int32	`1590`	Sum of prompt + completion
`cached_tokens`	int32	`800`	Tokens served from cache (cheaper)
`thoughts_tokens`	int32	`0`	Tokens used for extended thinking
`tool_use_prompt_tokens`	int32	`200`	Tokens consumed by tool call prompts
`call_sequence`	int32	`0`	Which LLM call in this turn (0-indexed)

к сведению

A single user message can trigger multiple UsageEvents. If the agent uses tools, each tool-use round-trip is a separate LLM call, each emitting its own UsageEvent. The call_sequence field tracks the order.

Step 2: The orchestrator processes the event

The orchestrator's chat service receives the UsageEvent in stream.go and does the following steps in sequence:

Where this happens: crawbl-backend/internal/orchestrator/service/chatservice/stream.go (the handleUsage method)

2a. Real-time update to mobile

The orchestrator immediately emits a Socket.IO event so the mobile app can show live token usage as the agent responds:

Socket.IO event: "usage_update"
Payload: { agent_id, conversation_id, message_id, model, prompt_tokens, completion_tokens, total_tokens, call_sequence }

к сведению

The Socket.IO payload includes message_id (tied to the agent's placeholder message). The River job enqueued for ClickHouse does not include message_id -- it uses conversation_id and session_id for correlation instead.

2b. Cost calculation

The orchestrator looks up the model's per-token pricing from an in-memory cache and computes the USD cost:

cost = (prompt_tokens * input_cost_per_token)
     + (completion_tokens * output_cost_per_token)
     + (cached_tokens * cached_cost_per_token)

примечание

prompt_tokens is the full input token count reported by the LLM provider. cached_tokens is the sub-portion of those input tokens that were served from the provider's prompt cache (eligible for a lower rate). These are separate counts -- cached_tokens is not added on top of prompt_tokens, it overlaps with them but is billed at the cheaper cached_cost_per_token rate.

Where the cache lives: crawbl-backend/internal/pkg/pricing/cache.go

The cache loads pricing from the Postgres model_pricing table on startup and refreshes periodically. It can infer the provider from model name prefixes (e.g., gpt-4o is OpenAI, claude-3 is Anthropic).

2c. Increment user's monthly counter

The orchestrator atomically increments the user's usage_counters row for the current billing period (format: YYYY-MM). Uses INSERT ON CONFLICT so the first request of the month auto-creates the row.

2d. Increment agent's lifetime counter

Same pattern, but for the agent_usage_counters table -- lifetime totals per agent, not monthly.

2e. Enqueue a River job for ClickHouse write

The orchestrator calls usagepublisher.Publish(), which enqueues a River job on the usage_write queue. The in-process usagewriter.Worker picks it up and inserts a single row into ClickHouse's llm_usage table. ClickHouse's async_insert setting handles server-side batching so individual row inserts do not block and are coalesced efficiently.

Where this happens: internal/orchestrator/service/usagepublisher/publisher.go (enqueue) and internal/orchestrator/queue/orchestrator_workers.go (worker handler)

к сведению

Because River jobs are persisted in Postgres before the worker runs them, usage events survive orchestrator restarts. At-least-once delivery semantics apply — ClickHouse's event_id (UUID) can deduplicate if needed.

Step 3: ClickHouse aggregates automatically

ClickHouse stores raw events and auto-aggregates daily rollups:

llm_usage -- Every individual LLM call, partitioned by month, kept for 13 months
llm_usage_daily -- Auto-populated by a materialized view that sums tokens and costs per day/user/workspace/model/agent
llm_usage_daily_mv -- The materialized view that triggers on insert to llm_usage and populates llm_usage_daily

Chapter 3: The Database Tables

Postgres tables (real-time)

model_pricing

Stores per-token costs for every LLM model. One row per (provider, model, region, effective_at). Old rows are preserved for historical cost computation -- new rows are inserted when prices change.

Column	Type	Example	Description
`id`	UUID	`a1b2c3...`	Primary key
`provider`	TEXT	`"openai"`	LLM provider
`model`	TEXT	`"gpt-4o"`	Model identifier
`region`	TEXT	`"global"`	Pricing region
`input_cost_per_token`	NUMERIC(18,12)	`0.000005000000`	Cost per input token in USD
`output_cost_per_token`	NUMERIC(18,12)	`0.000015000000`	Cost per output token in USD
`cached_cost_per_token`	NUMERIC(18,12)	`0.000002500000`	Cost per cached token in USD
`source`	TEXT	`"manual"`	Where the price came from
`effective_at`	TIMESTAMPTZ	`2026-04-01T00:00:00Z`	When this price takes effect

к сведению

The pricing_refresh River job runs daily inside the orchestrator process and updates this Postgres table with current per-token prices from LiteLLM. See the River reference for job details.

usage_plans

Defines the available plans and their token limits. Seeded from migrations/orchestrator/seed/usage_plans.json on startup.

Column	Type	Description
`plan_id`	TEXT (PK)	Plan identifier (`"free"`, `"pro"`)
`name`	TEXT	Display name (`"Freemium"`, `"Pro"`)
`monthly_token_limit`	BIGINT	Max tokens per month
`daily_request_limit`	INT (nullable)	Max requests per day (currently unused)
`max_tokens_per_request`	INT (nullable)	Max tokens per single request (currently unused)

Current plans:

Plan	Monthly Token Limit
Freemium (`free`)	500,000 tokens
Pro (`pro`)	10,000,000 tokens

usage_quotas

Maps each user to a plan. A user can have multiple quota rows (e.g., one expired, one active). The active quota is the one where expires_at IS NULL or expires_at > NOW().

Column	Type	Description
`user_id`	UUID	References `users(id)`
`plan_id`	TEXT	References `usage_plans(plan_id)`
`effective_at`	TIMESTAMPTZ	When this assignment started
`expires_at`	TIMESTAMPTZ	NULL = active, non-NULL = expired

usage_counters

Running monthly totals per user. One row per (user_id, period). The period column is a string like "2026-04".

Column	Type	Description
`user_id`	UUID	The user
`period`	TEXT	Billing period (YYYY-MM)
`tokens_used`	BIGINT	Total tokens consumed this month
`prompt_tokens_used`	BIGINT	Input tokens
`completion_tokens_used`	BIGINT	Output tokens
`cost_usd`	NUMERIC(12,6)	Accumulated cost in USD
`request_count`	INT	Number of LLM calls
`last_updated_at`	TIMESTAMPTZ	When counters were last incremented

agent_usage_counters

Lifetime totals per agent. One row per agent_id.

Column	Type	Description
`agent_id`	UUID	The agent
`workspace_id`	UUID	The workspace the agent belongs to
`tokens_used`	BIGINT	Total tokens ever consumed by this agent
`prompt_tokens_used`	BIGINT	Lifetime input tokens
`completion_tokens_used`	BIGINT	Lifetime output tokens
`cost_usd`	NUMERIC(12,6)	Accumulated cost in USD
`request_count`	INT	Number of LLM calls
`last_updated_at`	TIMESTAMPTZ	When counters were last incremented

ClickHouse tables (analytics)

See the ClickHouse reference page for full column definitions and example queries.

Chapter 4: Quota Enforcement

How it works

When the orchestrator needs to check whether a user can make another LLM call, it calls CheckQuota():

SELECT tokens_used, monthly_token_limit
FROM usage_quotas
JOIN usage_plans ON plan_id
LEFT JOIN usage_counters ON user_id AND period
WHERE user_id = ? AND active

This returns (tokensUsed, monthlyLimit). The caller compares:

If monthlyLimit == 0 -- no quota row exists for this user (unassigned state, not "free plan with zero usage"), allow the request
If tokensUsed < monthlyLimit -- under limit, allow
If tokensUsed >= monthlyLimit -- over limit, deny

к сведению

Quota checks happen at the orchestrator level, not in the agent runtime. The runtime does not know about billing -- it just reports token counts.

Example: A user on the free plan

User signs up -- gets assigned plan_id: "free" in usage_quotas
Free plan has monthly_token_limit: 500,000
User sends messages all month, accumulating 487,000 tokens
Next message uses 15,000 tokens -- counter reaches 502,000
The counter still increments (we record actual usage), but quota check returns over-limit
On the next request, the orchestrator sees 502,000 >= 500,000 and can block

Chapter 5: API Endpoints

`GET /v1/users/usage/summary`

Returns the authenticated user's usage for the current billing period.

Response:

{
  "current_period": "2026-04",
  "tokens_used": 487000,
  "prompt_tokens_used": 312000,
  "completion_tokens_used": 175000,
  "request_count": 156,
  "token_limit": 500000,
  "plan_id": "free"
}

warning

Cost data (cost_usd) is tracked internally but not exposed to the frontend. The API response deliberately omits cost fields.

`GET /v1/workspaces/:id/usage`

Returns token usage for a specific workspace for the current billing period. Verifies the workspace belongs to the authenticated user.

Response:

{
  "period": "2026-04",
  "tokens_used": 487000,
  "prompt_tokens_used": 312000,
  "completion_tokens_used": 175000,
  "request_count": 156,
  "token_limit": 500000
}

Chapter 6: Key Source Files

File	What it does
`proto/agentruntime/v1/runtime.proto`	Defines the `UsageEvent` message
`internal/agentruntime/server/converse.go`	Runtime emits UsageEvent from LLM response
`internal/orchestrator/service/chatservice/stream.go`	Orchestrator processes usage events
`internal/orchestrator/service/usagepublisher/publisher.go`	Enqueues River `usage_write` jobs
`internal/orchestrator/queue/orchestrator_workers.go`	`UsageWriter.Work` — River worker that inserts into ClickHouse
`internal/pkg/pricing/cache.go`	In-memory pricing cache with cost computation
`internal/orchestrator/repo/usagerepo/usagerepo.go`	Postgres queries for usage counters and quotas
`internal/orchestrator/server/handler/usage.go`	HTTP handlers for usage API endpoints
`internal/orchestrator/server/dto/usage.go`	Request/response types for usage endpoints
`migrations/orchestrator/000006_token_usage.up.sql`	Postgres table definitions
`migrations/clickhouse/001_llm_usage.sql`	ClickHouse table + materialized view definitions
`migrations/orchestrator/seed/usage_plans.json`	Plan seed data (free, pro)

Chapter 7: Real-World Scenarios

"I want to see how many tokens a user consumed this month"

Query the usage_counters table in Postgres:

SELECT tokens_used, prompt_tokens_used, completion_tokens_used, cost_usd, request_count
FROM orchestrator.usage_counters
WHERE user_id = '<user-uuid>' AND period = '2026-04';

"I want to see per-model usage breakdown for a workspace"

Query ClickHouse (port-forward first -- see ClickHouse reference):

SELECT
    model,
    sum(total_tokens) AS total_tokens,
    sum(cost_usd) AS total_cost,
    count() AS requests
FROM llm_usage
WHERE workspace_id = '<workspace-uuid>'
  AND event_time >= '2026-04-01'
GROUP BY model
ORDER BY total_cost DESC;

"I want to check which plan a user is on"

SELECT uq.plan_id, up.name, up.monthly_token_limit
FROM orchestrator.usage_quotas uq
JOIN orchestrator.usage_plans up ON up.plan_id = uq.plan_id
WHERE uq.user_id = '<user-uuid>'
  AND (uq.expires_at IS NULL OR uq.expires_at > NOW())
ORDER BY uq.effective_at DESC
LIMIT 1;

"I want to see daily usage trends for a workspace"

Query the ClickHouse daily rollup:

SELECT
    day,
    sum(total_tokens) AS tokens,
    sum(total_cost_usd) AS cost_usd,
    sum(request_count) AS requests
FROM llm_usage_daily
WHERE workspace_id = '<workspace-uuid>'
  AND day >= today() - 30
GROUP BY day
ORDER BY day;

What's next: ClickHouse Reference for connection details and example queries

Chapter 1: The Big Picture​

What this system does​

The complete data flow​

Chapter 2: Step-by-Step Walkthrough​

Step 1: The agent runtime emits a UsageEvent​

Step 2: The orchestrator processes the event​

2a. Real-time update to mobile​

2b. Cost calculation​

2c. Increment user's monthly counter​

2d. Increment agent's lifetime counter​

2e. Enqueue a River job for ClickHouse write​

Step 3: ClickHouse aggregates automatically​

Chapter 3: The Database Tables​

Postgres tables (real-time)​

model_pricing​

usage_plans​

usage_quotas​

usage_counters​

agent_usage_counters​

ClickHouse tables (analytics)​

Chapter 4: Quota Enforcement​

How it works​

Example: A user on the free plan​

Chapter 5: API Endpoints​

GET /v1/users/usage/summary​

GET /v1/workspaces/:id/usage​

Chapter 6: Key Source Files​

Chapter 7: Real-World Scenarios​

"I want to see how many tokens a user consumed this month"​

"I want to see per-model usage breakdown for a workspace"​

"I want to check which plan a user is on"​

"I want to see daily usage trends for a workspace"​

Chapter 1: The Big Picture

What this system does

The complete data flow

Chapter 2: Step-by-Step Walkthrough

Step 1: The agent runtime emits a UsageEvent

Step 2: The orchestrator processes the event

2a. Real-time update to mobile

2b. Cost calculation

2c. Increment user's monthly counter

2d. Increment agent's lifetime counter

2e. Enqueue a River job for ClickHouse write

Step 3: ClickHouse aggregates automatically

Chapter 3: The Database Tables

Postgres tables (real-time)

model_pricing

usage_plans

usage_quotas

usage_counters

agent_usage_counters

ClickHouse tables (analytics)

Chapter 4: Quota Enforcement

How it works

Example: A user on the free plan

Chapter 5: API Endpoints

`GET /v1/users/usage/summary`

`GET /v1/workspaces/:id/usage`

Chapter 6: Key Source Files

Chapter 7: Real-World Scenarios

"I want to see how many tokens a user consumed this month"

"I want to see per-model usage breakdown for a workspace"

"I want to check which plan a user is on"

"I want to see daily usage trends for a workspace"