Billing, Usage & Cost Tracking
A step-by-step guide to understanding how Crawbl tracks token usage, calculates costs, enforces quotas, and stores analytics data. Written for engineers who have never seen the billing pipeline before.
Chapter 1: The Big Picture
What this system does
Every time a user sends a message to an AI agent, the agent calls an LLM (like GPT-4 or Claude). That LLM call consumes tokens -- small units of text that the model reads and writes. Tokens cost money, so Crawbl tracks every single one.
The billing system answers three questions:
- How many tokens did this user consume? (usage tracking)
- How much did it cost? (cost calculation)
- Has this user exceeded their plan limit? (quota enforcement)
Think of it like a phone plan. You have a monthly allowance (tokens instead of minutes). The platform tracks every call (LLM request), calculates the cost, and warns you when you are approaching your limit.
The complete data flow
Here is the path every usage event takes, from the moment an agent calls an LLM to the moment it lands in analytics:
There are two storage paths running in parallel:
- Postgres -- real-time counters for quota enforcement and API responses (fast writes, small data)
- ClickHouse -- detailed analytics for dashboards and historical queries (batch writes, big data)
Chapter 2: Step-by-Step Walkthrough
Step 1: The agent runtime emits a UsageEvent
When an agent in a workspace pod calls an LLM, the Google GenAI SDK returns a UsageMetadata object containing token counts. The agent runtime translates this into a gRPC UsageEvent and sends it back to the orchestrator over the bidirectional stream.
Where this happens: crawbl-backend/internal/agentruntime/server/converse.go
The UsageEvent proto message contains:
| Field | Type | Example | Description |
|---|---|---|---|
agent_id | string | "wally" | Which agent made the LLM call |
model | string | "gpt-4o" | Which LLM model was used |
prompt_tokens | int32 | 1250 | Tokens in the input (user message + context) |
completion_tokens | int32 | 340 | Tokens in the output (agent's response) |
total_tokens | int32 | 1590 | Sum of prompt + completion |
cached_tokens | int32 | 800 | Tokens served from cache (cheaper) |
thoughts_tokens | int32 | 0 | Tokens used for extended thinking |
tool_use_prompt_tokens | int32 | 200 | Tokens consumed by tool call prompts |
call_sequence | int32 | 0 | Which LLM call in this turn (0-indexed) |
A single user message can trigger multiple UsageEvents. If the agent uses tools, each tool-use round-trip is a separate LLM call, each emitting its own UsageEvent. The call_sequence field tracks the order.
Step 2: The orchestrator processes the event
The orchestrator's chat service receives the UsageEvent in stream.go and does the following steps in sequence:
Where this happens: crawbl-backend/internal/orchestrator/service/chatservice/stream.go (the handleUsage method)
2a. Real-time update to mobile
The orchestrator immediately emits a Socket.IO event so the mobile app can show live token usage as the agent responds:
Socket.IO event: "usage_update"
Payload: { agent_id, conversation_id, message_id, model, prompt_tokens, completion_tokens, total_tokens, call_sequence }
The Socket.IO payload includes message_id (tied to the agent's placeholder message). The River job enqueued for ClickHouse does not include message_id -- it uses conversation_id and session_id for correlation instead.
2b. Cost calculation
The orchestrator looks up the model's per-token pricing from an in-memory cache and computes the USD cost:
cost = (prompt_tokens * input_cost_per_token)
+ (completion_tokens * output_cost_per_token)
+ (cached_tokens * cached_cost_per_token)
prompt_tokens is the full input token count reported by the LLM provider. cached_tokens is the sub-portion of those input tokens that were served from the provider's prompt cache (eligible for a lower rate). These are separate counts -- cached_tokens is not added on top of prompt_tokens, it overlaps with them but is billed at the cheaper cached_cost_per_token rate.
Where the cache lives: crawbl-backend/internal/pkg/pricing/cache.go
The cache loads pricing from the Postgres model_pricing table on startup and refreshes periodically. It can infer the provider from model name prefixes (e.g., gpt-4o is OpenAI, claude-3 is Anthropic).
2c. Increment user's monthly counter
The orchestrator atomically increments the user's usage_counters row for the current billing period (format: YYYY-MM). Uses INSERT ON CONFLICT so the first request of the month auto-creates the row.
2d. Increment agent's lifetime counter
Same pattern, but for the agent_usage_counters table -- lifetime totals per agent, not monthly.
2e. Enqueue a River job for ClickHouse write
The orchestrator calls usagepublisher.Publish(), which enqueues a River job on the usage_write queue. The in-process usagewriter.Worker picks it up and inserts a single row into ClickHouse's llm_usage table. ClickHouse's async_insert setting handles server-side batching so individual row inserts do not block and are coalesced efficiently.
Where this happens: internal/orchestrator/service/usagepublisher/publisher.go (enqueue) and internal/orchestrator/queue/orchestrator_workers.go (worker handler)
Because River jobs are persisted in Postgres before the worker runs them, usage events survive orchestrator restarts. At-least-once delivery semantics apply — ClickHouse's event_id (UUID) can deduplicate if needed.
Step 3: ClickHouse aggregates automatically
ClickHouse stores raw events and auto-aggregates daily rollups:
llm_usage-- Every individual LLM call, partitioned by month, kept for 13 monthsllm_usage_daily-- Auto-populated by a materialized view that sums tokens and costs per day/user/workspace/model/agentllm_usage_daily_mv-- The materialized view that triggers on insert tollm_usageand populatesllm_usage_daily
Chapter 3: The Database Tables
Postgres tables (real-time)
model_pricing
Stores per-token costs for every LLM model. One row per (provider, model, region, effective_at). Old rows are preserved for historical cost computation -- new rows are inserted when prices change.
| Column | Type | Example | Description |
|---|---|---|---|
id | UUID | a1b2c3... | Primary key |
provider | TEXT | "openai" | LLM provider |
model | TEXT | "gpt-4o" | Model identifier |
region | TEXT | "global" | Pricing region |
input_cost_per_token | NUMERIC(18,12) | 0.000005000000 | Cost per input token in USD |
output_cost_per_token | NUMERIC(18,12) | 0.000015000000 | Cost per output token in USD |
cached_cost_per_token | NUMERIC(18,12) | 0.000002500000 | Cost per cached token in USD |
source | TEXT | "manual" | Where the price came from |
effective_at | TIMESTAMPTZ | 2026-04-01T00:00:00Z | When this price takes effect |
The pricing_refresh River job runs daily inside the orchestrator process and updates this Postgres table with current per-token prices from LiteLLM. See the River reference for job details.
usage_plans
Defines the available plans and their token limits. Seeded from migrations/orchestrator/seed/usage_plans.json on startup.
| Column | Type | Description |
|---|---|---|
plan_id | TEXT (PK) | Plan identifier ("free", "pro") |
name | TEXT | Display name ("Freemium", "Pro") |
monthly_token_limit | BIGINT | Max tokens per month |
daily_request_limit | INT (nullable) | Max requests per day (currently unused) |
max_tokens_per_request | INT (nullable) | Max tokens per single request (currently unused) |
Current plans:
| Plan | Monthly Token Limit |
|---|---|
Freemium (free) | 500,000 tokens |
Pro (pro) | 10,000,000 tokens |
usage_quotas
Maps each user to a plan. A user can have multiple quota rows (e.g., one expired, one active). The active quota is the one where expires_at IS NULL or expires_at > NOW().
| Column | Type | Description |
|---|---|---|
user_id | UUID | References users(id) |
plan_id | TEXT | References usage_plans(plan_id) |
effective_at | TIMESTAMPTZ | When this assignment started |
expires_at | TIMESTAMPTZ | NULL = active, non-NULL = expired |
usage_counters
Running monthly totals per user. One row per (user_id, period). The period column is a string like "2026-04".
| Column | Type | Description |
|---|---|---|
user_id | UUID | The user |
period | TEXT | Billing period (YYYY-MM) |
tokens_used | BIGINT | Total tokens consumed this month |
prompt_tokens_used | BIGINT | Input tokens |
completion_tokens_used | BIGINT | Output tokens |
cost_usd | NUMERIC(12,6) | Accumulated cost in USD |
request_count | INT | Number of LLM calls |
last_updated_at | TIMESTAMPTZ | When counters were last incremented |
agent_usage_counters
Lifetime totals per agent. One row per agent_id.
| Column | Type | Description |
|---|---|---|
agent_id | UUID | The agent |
workspace_id | UUID | The workspace the agent belongs to |
tokens_used | BIGINT | Total tokens ever consumed by this agent |
prompt_tokens_used | BIGINT | Lifetime input tokens |
completion_tokens_used | BIGINT | Lifetime output tokens |
cost_usd | NUMERIC(12,6) | Accumulated cost in USD |
request_count | INT | Number of LLM calls |
last_updated_at | TIMESTAMPTZ | When counters were last incremented |
ClickHouse tables (analytics)
See the ClickHouse reference page for full column definitions and example queries.
Chapter 4: Quota Enforcement
How it works
When the orchestrator needs to check whether a user can make another LLM call, it calls CheckQuota():
SELECT tokens_used, monthly_token_limit
FROM usage_quotas
JOIN usage_plans ON plan_id
LEFT JOIN usage_counters ON user_id AND period
WHERE user_id = ? AND active
This returns (tokensUsed, monthlyLimit). The caller compares:
- If
monthlyLimit == 0-- no quota row exists for this user (unassigned state, not "free plan with zero usage"), allow the request - If
tokensUsed < monthlyLimit-- under limit, allow - If
tokensUsed >= monthlyLimit-- over limit, deny
Quota checks happen at the orchestrator level, not in the agent runtime. The runtime does not know about billing -- it just reports token counts.
Example: A user on the free plan
- User signs up -- gets assigned
plan_id: "free"inusage_quotas - Free plan has
monthly_token_limit: 500,000 - User sends messages all month, accumulating 487,000 tokens
- Next message uses 15,000 tokens -- counter reaches 502,000
- The counter still increments (we record actual usage), but quota check returns over-limit
- On the next request, the orchestrator sees
502,000 >= 500,000and can block
Chapter 5: API Endpoints
GET /v1/users/usage/summary
Returns the authenticated user's usage for the current billing period.
Response:
{
"current_period": "2026-04",
"tokens_used": 487000,
"prompt_tokens_used": 312000,
"completion_tokens_used": 175000,
"request_count": 156,
"token_limit": 500000,
"plan_id": "free"
}
Cost data (cost_usd) is tracked internally but not exposed to the frontend. The API response deliberately omits cost fields.
GET /v1/workspaces/:id/usage
Returns token usage for a specific workspace for the current billing period. Verifies the workspace belongs to the authenticated user.
Response:
{
"period": "2026-04",
"tokens_used": 487000,
"prompt_tokens_used": 312000,
"completion_tokens_used": 175000,
"request_count": 156,
"token_limit": 500000
}
Chapter 6: Key Source Files
| File | What it does |
|---|---|
proto/agentruntime/v1/runtime.proto | Defines the UsageEvent message |
internal/agentruntime/server/converse.go | Runtime emits UsageEvent from LLM response |
internal/orchestrator/service/chatservice/stream.go | Orchestrator processes usage events |
internal/orchestrator/service/usagepublisher/publisher.go | Enqueues River usage_write jobs |
internal/orchestrator/queue/orchestrator_workers.go | UsageWriter.Work — River worker that inserts into ClickHouse |
internal/pkg/pricing/cache.go | In-memory pricing cache with cost computation |
internal/orchestrator/repo/usagerepo/usagerepo.go | Postgres queries for usage counters and quotas |
internal/orchestrator/server/handler/usage.go | HTTP handlers for usage API endpoints |
internal/orchestrator/server/dto/usage.go | Request/response types for usage endpoints |
migrations/orchestrator/000006_token_usage.up.sql | Postgres table definitions |
migrations/clickhouse/001_llm_usage.sql | ClickHouse table + materialized view definitions |
migrations/orchestrator/seed/usage_plans.json | Plan seed data (free, pro) |
Chapter 7: Real-World Scenarios
"I want to see how many tokens a user consumed this month"
Query the usage_counters table in Postgres:
SELECT tokens_used, prompt_tokens_used, completion_tokens_used, cost_usd, request_count
FROM orchestrator.usage_counters
WHERE user_id = '<user-uuid>' AND period = '2026-04';
"I want to see per-model usage breakdown for a workspace"
Query ClickHouse (port-forward first -- see ClickHouse reference):
SELECT
model,
sum(total_tokens) AS total_tokens,
sum(cost_usd) AS total_cost,
count() AS requests
FROM llm_usage
WHERE workspace_id = '<workspace-uuid>'
AND event_time >= '2026-04-01'
GROUP BY model
ORDER BY total_cost DESC;
"I want to check which plan a user is on"
SELECT uq.plan_id, up.name, up.monthly_token_limit
FROM orchestrator.usage_quotas uq
JOIN orchestrator.usage_plans up ON up.plan_id = uq.plan_id
WHERE uq.user_id = '<user-uuid>'
AND (uq.expires_at IS NULL OR uq.expires_at > NOW())
ORDER BY uq.effective_at DESC
LIMIT 1;
"I want to see daily usage trends for a workspace"
Query the ClickHouse daily rollup:
SELECT
day,
sum(total_tokens) AS tokens,
sum(total_cost_usd) AS cost_usd,
sum(request_count) AS requests
FROM llm_usage_daily
WHERE workspace_id = '<workspace-uuid>'
AND day >= today() - 30
GROUP BY day
ORDER BY day;
What's next: ClickHouse Reference for connection details and example queries