Перейти к основному содержимому

Billing, Usage & Cost Tracking

A step-by-step guide to understanding how Crawbl tracks token usage, calculates costs, enforces quotas, and stores analytics data. Written for engineers who have never seen the billing pipeline before.


Chapter 1: The Big Picture

What this system does

Every time a user sends a message to an AI agent, the agent calls an LLM (like GPT-4 or Claude). That LLM call consumes tokens -- small units of text that the model reads and writes. Tokens cost money, so Crawbl tracks every single one.

The billing system answers three questions:

  1. How many tokens did this user consume? (usage tracking)
  2. How much did it cost? (cost calculation)
  3. Has this user exceeded their plan limit? (quota enforcement)
к сведению

Think of it like a phone plan. You have a monthly allowance (tokens instead of minutes). The platform tracks every call (LLM request), calculates the cost, and warns you when you are approaching your limit.


The complete data flow

Here is the path every usage event takes, from the moment an agent calls an LLM to the moment it lands in analytics:

Usage Data Flow
Click diagram to zoom
подсказка

There are two storage paths running in parallel:

  • Postgres -- real-time counters for quota enforcement and API responses (fast writes, small data)
  • ClickHouse -- detailed analytics for dashboards and historical queries (batch writes, big data)

Chapter 2: Step-by-Step Walkthrough

Step 1: The agent runtime emits a UsageEvent

When an agent in a workspace pod calls an LLM, the Google GenAI SDK returns a UsageMetadata object containing token counts. The agent runtime translates this into a gRPC UsageEvent and sends it back to the orchestrator over the bidirectional stream.

Where this happens: crawbl-backend/internal/agentruntime/server/converse.go

The UsageEvent proto message contains:

FieldTypeExampleDescription
agent_idstring"wally"Which agent made the LLM call
modelstring"gpt-4o"Which LLM model was used
prompt_tokensint321250Tokens in the input (user message + context)
completion_tokensint32340Tokens in the output (agent's response)
total_tokensint321590Sum of prompt + completion
cached_tokensint32800Tokens served from cache (cheaper)
thoughts_tokensint320Tokens used for extended thinking
tool_use_prompt_tokensint32200Tokens consumed by tool call prompts
call_sequenceint320Which LLM call in this turn (0-indexed)
к сведению

A single user message can trigger multiple UsageEvents. If the agent uses tools, each tool-use round-trip is a separate LLM call, each emitting its own UsageEvent. The call_sequence field tracks the order.


Step 2: The orchestrator processes the event

The orchestrator's chat service receives the UsageEvent in stream.go and does the following steps in sequence:

Where this happens: crawbl-backend/internal/orchestrator/service/chatservice/stream.go (the handleUsage method)

2a. Real-time update to mobile

The orchestrator immediately emits a Socket.IO event so the mobile app can show live token usage as the agent responds:

Socket.IO event: "usage_update"
Payload: { agent_id, conversation_id, message_id, model, prompt_tokens, completion_tokens, total_tokens, call_sequence }
к сведению

The Socket.IO payload includes message_id (tied to the agent's placeholder message). The River job enqueued for ClickHouse does not include message_id -- it uses conversation_id and session_id for correlation instead.

2b. Cost calculation

The orchestrator looks up the model's per-token pricing from an in-memory cache and computes the USD cost:

cost = (prompt_tokens * input_cost_per_token)
+ (completion_tokens * output_cost_per_token)
+ (cached_tokens * cached_cost_per_token)
примечание

prompt_tokens is the full input token count reported by the LLM provider. cached_tokens is the sub-portion of those input tokens that were served from the provider's prompt cache (eligible for a lower rate). These are separate counts -- cached_tokens is not added on top of prompt_tokens, it overlaps with them but is billed at the cheaper cached_cost_per_token rate.

Where the cache lives: crawbl-backend/internal/pkg/pricing/cache.go

The cache loads pricing from the Postgres model_pricing table on startup and refreshes periodically. It can infer the provider from model name prefixes (e.g., gpt-4o is OpenAI, claude-3 is Anthropic).

2c. Increment user's monthly counter

The orchestrator atomically increments the user's usage_counters row for the current billing period (format: YYYY-MM). Uses INSERT ON CONFLICT so the first request of the month auto-creates the row.

2d. Increment agent's lifetime counter

Same pattern, but for the agent_usage_counters table -- lifetime totals per agent, not monthly.

2e. Enqueue a River job for ClickHouse write

The orchestrator calls usagepublisher.Publish(), which enqueues a River job on the usage_write queue. The in-process usagewriter.Worker picks it up and inserts a single row into ClickHouse's llm_usage table. ClickHouse's async_insert setting handles server-side batching so individual row inserts do not block and are coalesced efficiently.

Where this happens: internal/orchestrator/service/usagepublisher/publisher.go (enqueue) and internal/orchestrator/queue/orchestrator_workers.go (worker handler)

к сведению

Because River jobs are persisted in Postgres before the worker runs them, usage events survive orchestrator restarts. At-least-once delivery semantics apply — ClickHouse's event_id (UUID) can deduplicate if needed.


Step 3: ClickHouse aggregates automatically

ClickHouse stores raw events and auto-aggregates daily rollups:

  • llm_usage -- Every individual LLM call, partitioned by month, kept for 13 months
  • llm_usage_daily -- Auto-populated by a materialized view that sums tokens and costs per day/user/workspace/model/agent
  • llm_usage_daily_mv -- The materialized view that triggers on insert to llm_usage and populates llm_usage_daily

Chapter 3: The Database Tables

Postgres tables (real-time)

model_pricing

Stores per-token costs for every LLM model. One row per (provider, model, region, effective_at). Old rows are preserved for historical cost computation -- new rows are inserted when prices change.

ColumnTypeExampleDescription
idUUIDa1b2c3...Primary key
providerTEXT"openai"LLM provider
modelTEXT"gpt-4o"Model identifier
regionTEXT"global"Pricing region
input_cost_per_tokenNUMERIC(18,12)0.000005000000Cost per input token in USD
output_cost_per_tokenNUMERIC(18,12)0.000015000000Cost per output token in USD
cached_cost_per_tokenNUMERIC(18,12)0.000002500000Cost per cached token in USD
sourceTEXT"manual"Where the price came from
effective_atTIMESTAMPTZ2026-04-01T00:00:00ZWhen this price takes effect
к сведению

The pricing_refresh River job runs daily inside the orchestrator process and updates this Postgres table with current per-token prices from LiteLLM. See the River reference for job details.


usage_plans

Defines the available plans and their token limits. Seeded from migrations/orchestrator/seed/usage_plans.json on startup.

ColumnTypeDescription
plan_idTEXT (PK)Plan identifier ("free", "pro")
nameTEXTDisplay name ("Freemium", "Pro")
monthly_token_limitBIGINTMax tokens per month
daily_request_limitINT (nullable)Max requests per day (currently unused)
max_tokens_per_requestINT (nullable)Max tokens per single request (currently unused)

Current plans:

PlanMonthly Token Limit
Freemium (free)500,000 tokens
Pro (pro)10,000,000 tokens

usage_quotas

Maps each user to a plan. A user can have multiple quota rows (e.g., one expired, one active). The active quota is the one where expires_at IS NULL or expires_at > NOW().

ColumnTypeDescription
user_idUUIDReferences users(id)
plan_idTEXTReferences usage_plans(plan_id)
effective_atTIMESTAMPTZWhen this assignment started
expires_atTIMESTAMPTZNULL = active, non-NULL = expired

usage_counters

Running monthly totals per user. One row per (user_id, period). The period column is a string like "2026-04".

ColumnTypeDescription
user_idUUIDThe user
periodTEXTBilling period (YYYY-MM)
tokens_usedBIGINTTotal tokens consumed this month
prompt_tokens_usedBIGINTInput tokens
completion_tokens_usedBIGINTOutput tokens
cost_usdNUMERIC(12,6)Accumulated cost in USD
request_countINTNumber of LLM calls
last_updated_atTIMESTAMPTZWhen counters were last incremented

agent_usage_counters

Lifetime totals per agent. One row per agent_id.

ColumnTypeDescription
agent_idUUIDThe agent
workspace_idUUIDThe workspace the agent belongs to
tokens_usedBIGINTTotal tokens ever consumed by this agent
prompt_tokens_usedBIGINTLifetime input tokens
completion_tokens_usedBIGINTLifetime output tokens
cost_usdNUMERIC(12,6)Accumulated cost in USD
request_countINTNumber of LLM calls
last_updated_atTIMESTAMPTZWhen counters were last incremented

ClickHouse tables (analytics)

See the ClickHouse reference page for full column definitions and example queries.


Chapter 4: Quota Enforcement

How it works

When the orchestrator needs to check whether a user can make another LLM call, it calls CheckQuota():

SELECT tokens_used, monthly_token_limit
FROM usage_quotas
JOIN usage_plans ON plan_id
LEFT JOIN usage_counters ON user_id AND period
WHERE user_id = ? AND active

This returns (tokensUsed, monthlyLimit). The caller compares:

  • If monthlyLimit == 0 -- no quota row exists for this user (unassigned state, not "free plan with zero usage"), allow the request
  • If tokensUsed < monthlyLimit -- under limit, allow
  • If tokensUsed >= monthlyLimit -- over limit, deny
к сведению

Quota checks happen at the orchestrator level, not in the agent runtime. The runtime does not know about billing -- it just reports token counts.


Example: A user on the free plan

  1. User signs up -- gets assigned plan_id: "free" in usage_quotas
  2. Free plan has monthly_token_limit: 500,000
  3. User sends messages all month, accumulating 487,000 tokens
  4. Next message uses 15,000 tokens -- counter reaches 502,000
  5. The counter still increments (we record actual usage), but quota check returns over-limit
  6. On the next request, the orchestrator sees 502,000 >= 500,000 and can block

Chapter 5: API Endpoints

GET /v1/users/usage/summary

Returns the authenticated user's usage for the current billing period.

Response:

{
"current_period": "2026-04",
"tokens_used": 487000,
"prompt_tokens_used": 312000,
"completion_tokens_used": 175000,
"request_count": 156,
"token_limit": 500000,
"plan_id": "free"
}
warning

Cost data (cost_usd) is tracked internally but not exposed to the frontend. The API response deliberately omits cost fields.

GET /v1/workspaces/:id/usage

Returns token usage for a specific workspace for the current billing period. Verifies the workspace belongs to the authenticated user.

Response:

{
"period": "2026-04",
"tokens_used": 487000,
"prompt_tokens_used": 312000,
"completion_tokens_used": 175000,
"request_count": 156,
"token_limit": 500000
}

Chapter 6: Key Source Files

FileWhat it does
proto/agentruntime/v1/runtime.protoDefines the UsageEvent message
internal/agentruntime/server/converse.goRuntime emits UsageEvent from LLM response
internal/orchestrator/service/chatservice/stream.goOrchestrator processes usage events
internal/orchestrator/service/usagepublisher/publisher.goEnqueues River usage_write jobs
internal/orchestrator/queue/orchestrator_workers.goUsageWriter.Work — River worker that inserts into ClickHouse
internal/pkg/pricing/cache.goIn-memory pricing cache with cost computation
internal/orchestrator/repo/usagerepo/usagerepo.goPostgres queries for usage counters and quotas
internal/orchestrator/server/handler/usage.goHTTP handlers for usage API endpoints
internal/orchestrator/server/dto/usage.goRequest/response types for usage endpoints
migrations/orchestrator/000006_token_usage.up.sqlPostgres table definitions
migrations/clickhouse/001_llm_usage.sqlClickHouse table + materialized view definitions
migrations/orchestrator/seed/usage_plans.jsonPlan seed data (free, pro)

Chapter 7: Real-World Scenarios

"I want to see how many tokens a user consumed this month"

Query the usage_counters table in Postgres:

SELECT tokens_used, prompt_tokens_used, completion_tokens_used, cost_usd, request_count
FROM orchestrator.usage_counters
WHERE user_id = '<user-uuid>' AND period = '2026-04';

"I want to see per-model usage breakdown for a workspace"

Query ClickHouse (port-forward first -- see ClickHouse reference):

SELECT
model,
sum(total_tokens) AS total_tokens,
sum(cost_usd) AS total_cost,
count() AS requests
FROM llm_usage
WHERE workspace_id = '<workspace-uuid>'
AND event_time >= '2026-04-01'
GROUP BY model
ORDER BY total_cost DESC;

"I want to check which plan a user is on"

SELECT uq.plan_id, up.name, up.monthly_token_limit
FROM orchestrator.usage_quotas uq
JOIN orchestrator.usage_plans up ON up.plan_id = uq.plan_id
WHERE uq.user_id = '<user-uuid>'
AND (uq.expires_at IS NULL OR uq.expires_at > NOW())
ORDER BY uq.effective_at DESC
LIMIT 1;

Query the ClickHouse daily rollup:

SELECT
day,
sum(total_tokens) AS tokens,
sum(total_cost_usd) AS cost_usd,
sum(request_count) AS requests
FROM llm_usage_daily
WHERE workspace_id = '<workspace-uuid>'
AND day >= today() - 30
GROUP BY day
ORDER BY day;

What's next: ClickHouse Reference for connection details and example queries