
The Crawbl Logs Guide

A step-by-step reference for searching and understanding logs in the Crawbl dev cluster. No prior LogsQL experience required.


Chapter 1: Getting Started

What is VictoriaLogs and why we use it

VictoriaLogs is a log database that collects output from every container in the Crawbl cluster automatically.

  • No more SSH-ing into machines or running kubectl logs against individual pods
  • Open a browser, type a query, and search across every service at once

Why VictoriaLogs?
  • Lightweight -- runs as a single pod
  • Efficient storage -- compresses logs well
  • Purpose-built query language -- designed specifically for log data
  • Replaces heavier alternatives like Elasticsearch or Loki for our dev cluster

How logs flow from your app to VictoriaLogs

Every log line takes this path:

Your app writes to stdout/stderr
|
v
containerd (the container runtime) writes it to a file on disk
|
v
Fluent Bit (runs on every node) reads the file, attaches Kubernetes
metadata (namespace, pod name, container name), and ships it over HTTP
|
v
VictoriaLogs stores the enriched record and makes it searchable
tip

This happens automatically for every container. You do not need to configure anything in your application -- just write to stdout.
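To see the attached metadata in practice, run a query like the following (the fields and limit pipes are standard LogsQL; the field names are the ones Fluent Bit ships in this cluster):

* | fields _time, kubernetes.namespace_name, kubernetes.pod_name, kubernetes.container_name, _msg | limit 5

What this does: Shows five log lines stripped down to their timestamp, Kubernetes metadata, and message, so you can confirm the enrichment pipeline is working end to end.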

How to open the UI

Open your browser and go to:

https://dev.logs.crawbl.com/select/vmui

No credentials are required in the dev environment.

You will see three areas:

Area | Location | Purpose
--- | --- | ---
Query bar | Top center | Type queries here. Press Enter to run.
Time range picker | Top right | Controls the search window. Defaults to last 1 hour.
Results pane | Below | Shows matching log lines. Click any line to expand metadata.

Your first query

Type * into the query bar and press Enter.

  • This matches every log line in the selected time range
  • You will see logs from the orchestrator, Redis, ArgoCD, Fluent Bit, and everything else mixed together
tip

If the results are empty, check the time range picker. Extend it to "Last 24 hours" to confirm logs exist.


Chapter 2: Understanding the Log Structure

Every log line has metadata

Fluent Bit attaches metadata fields from the Kubernetes API to every log line it ships. These fields tell you exactly where the log came from without needing to read the message itself.

The fields you will use

kubernetes.namespace_name -- which namespace the pod lives in

Namespace | What lives here
--- | ---
backend | Orchestrator, webhook, reaper, PostgreSQL, Redis, pgweb, docs, website
userswarms | ZeroClaw AI agent pods (one per user workspace)
monitoring | Fluent Bit, VictoriaMetrics, VictoriaLogs
argocd | ArgoCD server, repo server, application controller, Redis
cert-manager | Certificate management controllers
envoy-gateway-system | Envoy Gateway and proxy pods
external-dns | DNS record sync controller
external-secrets | Secrets sync from AWS Secrets Manager
userswarm-controller | Metacontroller (creates agent pods)

kubernetes.pod_name -- the full pod name including the random suffix

Pod name | What it is
--- | ---
orchestrator-795499fd8b-sgctg | The Crawbl API server
userswarm-webhook-7db4d6cdcd-wq6rx | The webhook that creates agent pods
e2e-reaper-29587560-k65dv | CronJob that cleans up test resources
backend-postgresql-0 | The PostgreSQL database
backend-redis-master-0 | The Redis instance
zeroclaw-workspace-81a5f386-c6a6-4c0a-b6a-3353eb37c1-0 | A ZeroClaw agent pod
victoria-logs-0 | VictoriaLogs itself

kubernetes.container_name -- the container name within the pod

This is often the most useful field for filtering.

Common values: orchestrator, webhook, reaper, zeroclaw, redis, postgresql, fluent-bit, server (ArgoCD), repo-server (ArgoCD), application-controller (ArgoCD), vmsingle (VictoriaMetrics), vlogs (VictoriaLogs), docs, website.

stream -- stdout vs stderr

  • stdout -- normal output
  • stderr -- error output
warning

Go panics, stack traces, and fatal errors always go to stderr. Filter on stream="stderr" to catch them fast.
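For example, to see only error-stream output across a whole namespace (a sketch -- combine stream with any other stream fields as needed):

_stream:{kubernetes.namespace_name="backend", stream="stderr"}

What this does: Returns only stderr lines from every pod in the backend namespace, which is where panics and stack traces land.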

How to use _stream filters

The _stream:{...} syntax tells VictoriaLogs which logs to look at. Fields inside the braces are ANDed together -- all conditions must match.

_stream only supports exact match

Inside _stream:{...}, field values must use exact match with =. Regex operators like =~ are not valid inside _stream:{}. Instead of trying to regex-match kubernetes.pod_name, filter on kubernetes.container_name, which is stable across pod restarts. If you genuinely need pattern matching, apply it outside the stream selector with field:~"pattern".

Filter by namespace:

_stream:{kubernetes.namespace_name="backend"}

What this does: Returns all logs from every pod in the backend namespace.

Filter by namespace + container:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="orchestrator"}

What this does: Narrows to only the orchestrator container in the backend namespace.

Available stream fields

The four stream fields are: kubernetes.namespace_name, kubernetes.pod_name, kubernetes.container_name, and stream. You can combine any of them inside _stream:{...}.
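For example, all four can be combined to pin down one container's error stream (illustrative -- swap in the pod and container you care about):

_stream:{kubernetes.namespace_name="backend", kubernetes.pod_name="backend-postgresql-0", kubernetes.container_name="postgresql", stream="stderr"}

What this does: Matches only stderr output from the postgresql container of the backend-postgresql-0 pod.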


Chapter 3: Logs for Every Crawbl Service

This chapter covers every service running in the cluster. For each one, you get the exact query to see its logs and a query for when things go wrong.


Orchestrator

The main backend API -- handles authentication, user management, swarm requests, and all mobile app traffic.

See all logs:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="orchestrator"}

Filter errors only:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="orchestrator"} (level:ERROR OR level:WARN)
What to look for
  • JSON structured logs via Go's slog -- Fluent Bit extracts each field to top level, so level, msg, method, path, request_id are all directly searchable
  • _msg shows the human-readable message value (e.g. request started), not the raw JSON string
  • Normal operation shows INFO-level request logs
  • Filter by any extracted field, e.g.: method:POST level:ERROR for failed POST requests
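Because request_id is extracted as its own field, one log line is enough to pull the full trace of a request (the id value here is illustrative):

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="orchestrator"} request_id:abc123

What this does: Shows every log line the orchestrator emitted for that single request, across all log levels.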

UserSwarm Webhook

Receives requests from Metacontroller and creates ZeroClaw agent pods in the userswarms namespace.

See all logs:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="webhook"}

Filter errors only:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="webhook"} (error OR panic OR "exit code")
What to look for
  • Pod creation events and validation results
  • Resource allocation decisions
  • Panic or exit code messages indicate crash failures

E2E Reaper

A CronJob that periodically cleans up resources created by end-to-end tests.

See all logs:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="reaper"}

Filter errors only:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="reaper"} (error OR failed)
What to look for
  • Counts of deleted resources
  • Successful cleanup confirmations
  • Failed deletion attempts
info

The reaper runs as a CronJob, so its pod name changes on each run. Filter by kubernetes.container_name="reaper", which stays the same across all runs.


ZeroClaw Runtime

AI agent runtimes -- each user workspace gets its own pod in the userswarms namespace.

See all logs:

_stream:{kubernetes.namespace_name="userswarms"}

Filter errors only:

_stream:{kubernetes.namespace_name="userswarms"} (error OR "exit code" OR OOMKilled OR CrashLoopBackOff)
What to look for
  • Agent startup and initialization messages
  • LLM API call results and errors
  • OOMKilled or CrashLoopBackOff indicate resource issues
Filtering a specific workspace

Use the full pod name for a specific workspace:

_stream:{kubernetes.namespace_name="userswarms", kubernetes.pod_name="zeroclaw-workspace-81a5f386-c6a6-4c0a-b6a-3353eb37c1-0"}

PostgreSQL

The primary data store for the platform.

See all logs:

_stream:{kubernetes.namespace_name="backend", kubernetes.pod_name="backend-postgresql-0"}

Filter errors only:

_stream:{kubernetes.pod_name="backend-postgresql-0"} (ERROR OR FATAL OR "deadlock detected" OR "too many connections")
What to look for
  • Connection counts and slow query warnings
  • Checkpoint activity
  • Startup and recovery messages
warning

PostgreSQL uses uppercase ERROR and FATAL in its log output -- these are not the same as lowercase error. Use the exact casing shown above.


Redis

Handles caching and pub/sub messaging for the platform.

See all logs:

_stream:{kubernetes.namespace_name="backend", kubernetes.pod_name="backend-redis-master-0"}

Filter errors only:

_stream:{kubernetes.pod_name="backend-redis-master-0"} (error OR "OOM" OR "maxmemory" OR "connection refused")
What to look for
  • Connection events and client counts
  • Memory warnings (maxmemory, OOM)
  • Persistence status (RDB/AOF save results)
note

Redis produces very few logs during normal operation. If this query returns empty results, that is expected -- Redis only logs significant events like startup, shutdown, or memory warnings. Use the Metrics Guide to monitor Redis health via redis_up and redis_memory_used_bytes instead.


ArgoCD

Syncs the cluster state to match what is committed in the crawbl-argocd-apps Git repo.

See all logs:

_stream:{kubernetes.namespace_name="argocd"}

See only the server (handles syncs):

_stream:{kubernetes.namespace_name="argocd", kubernetes.container_name="server"}

See the application controller (detects drift):

_stream:{kubernetes.namespace_name="argocd", kubernetes.container_name="application-controller"}

Filter errors only:

_stream:{kubernetes.namespace_name="argocd"} (error OR failed OR "sync failed" OR "ComparisonError")
What to look for
  • Sync status changes and health check results
  • ComparisonError means manifest generation failed
  • sync failed usually points to invalid YAML or missing resources

Envoy Gateway

The public entry point for all traffic. Handles TLS termination and routes requests to backend services.

See all logs:

_stream:{kubernetes.namespace_name="envoy-gateway-system"}

Filter errors only:

_stream:{kubernetes.namespace_name="envoy-gateway-system"} (error OR "503" OR "upstream connect" OR "no healthy upstream")
What to look for
  • 503 errors mean the upstream service is down
  • no healthy upstream means Envoy cannot reach the backend pods
  • upstream connect failures indicate networking issues

Cert-Manager

Automatically provisions and renews TLS certificates from Let's Encrypt using DNS-01 challenges via Cloudflare.

See all logs:

_stream:{kubernetes.namespace_name="cert-manager"}

Filter errors only:

_stream:{kubernetes.namespace_name="cert-manager"} (error OR "challenge failed" OR "not ready" OR "acme")
What to look for
  • Certificate issuance and renewal events
  • DNS-01 challenge progress
  • ACME protocol errors or rate limits

External DNS

Automatically creates and updates Cloudflare DNS records to point at the cluster's load balancer.

See all logs:

_stream:{kubernetes.namespace_name="external-dns"}

Filter errors only:

_stream:{kubernetes.namespace_name="external-dns"} (error OR "failed" OR "403" OR "rate limit")
What to look for
  • DNS record create/update events
  • Cloudflare API errors (403, rate limits)
  • Sync interval logs

External Secrets

Reads secrets from AWS Secrets Manager and creates matching Kubernetes Secret objects.

See all logs:

_stream:{kubernetes.namespace_name="external-secrets"}

Filter errors only:

_stream:{kubernetes.namespace_name="external-secrets"} (error OR "SecretSyncError" OR "AccessDeniedException" OR "not found")
What to look for
  • Secret sync success/failure events
  • AccessDeniedException means IAM permissions issue
  • SecretSyncError means the secret exists but could not be written to Kubernetes

Fluent Bit

Collects logs from every node and ships them to VictoriaLogs.

See all logs:

_stream:{kubernetes.namespace_name="monitoring", kubernetes.container_name="fluent-bit"}

Filter errors only:

_stream:{kubernetes.container_name="fluent-bit"} (error OR "retry" OR "chunk" OR "backpressure")
What to look for
  • Retry counts and chunk errors indicate delivery problems
  • Backpressure warnings mean VictoriaLogs cannot ingest fast enough
  • If logs are missing from other services, check Fluent Bit first
caution

If Fluent Bit is unhealthy, no logs are being collected from any service. This is the first thing to check when logs seem to be missing.


VictoriaMetrics

Stores cluster and application metrics. Exposes a Prometheus-compatible API.

See all logs:

_stream:{kubernetes.namespace_name="monitoring", kubernetes.container_name="vmsingle"}

Filter errors only:

_stream:{kubernetes.namespace_name="monitoring", kubernetes.container_name="vmsingle"} (error OR "out of memory" OR "disk")
What to look for
  • Ingestion rate and storage usage
  • Out-of-memory or disk-full warnings
  • Scrape target errors

VictoriaLogs

The log storage system you are querying right now. Its own logs help diagnose ingestion or storage problems.

See all logs:

_stream:{kubernetes.namespace_name="monitoring", kubernetes.container_name="vlogs"}

Filter errors only:

_stream:{kubernetes.namespace_name="monitoring", kubernetes.container_name="vlogs"} (error OR "disk" OR "ingestion")
What to look for
  • Ingestion errors or slow flushes
  • Disk space warnings
  • Query timeout messages

Docs Site

The Docusaurus documentation site served at dev.docs.crawbl.com.

See all logs:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="docs"}

Filter errors only:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="docs"} (error OR "502" OR "upstream")
What to look for
  • Nginx access and error logs
  • 502 means the upstream Docusaurus process crashed
  • Static asset 404s

Website

The public-facing crawbl.com marketing site.

See all logs:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="website"}

Filter errors only:

_stream:{kubernetes.namespace_name="backend", kubernetes.container_name="website"} (error OR "502" OR "upstream")
What to look for
  • Nginx access and error logs
  • 502 means the upstream process crashed
  • Static asset 404s

Chapter 4: Common Troubleshooting Scenarios

Each scenario walks you through the exact queries to run, in order.


"The API is returning 500 errors"

Step 1 -- Check the orchestrator for errors:

_stream:{kubernetes.container_name="orchestrator"} level:ERROR

What this does: Shows all ERROR-level log lines from the orchestrator.

Step 2 -- If the error mentions the database, check PostgreSQL:

_stream:{kubernetes.pod_name="backend-postgresql-0"} (ERROR OR FATAL)

What this does: Shows PostgreSQL errors and fatal messages.

Step 3 -- If the error mentions Redis, check Redis:

_stream:{kubernetes.pod_name="backend-redis-master-0"} error

What this does: Shows all Redis error logs.

Step 4 -- Check if the problem is at the gateway level (request never reaching the orchestrator):

_stream:{kubernetes.namespace_name="envoy-gateway-system"} ("503" OR "no healthy upstream")

What this does: Shows gateway-level failures where requests could not be routed.

tip

Start at the orchestrator and work outward. Most 500s originate in the API code itself, not infrastructure.
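If the orchestrator shows many errors, a quick aggregation narrows down which endpoint is failing (a sketch using the stats and sort pipes; path is one of the extracted slog fields):

_stream:{kubernetes.container_name="orchestrator"} level:ERROR | stats by (path) count() as errors | sort by (errors) desc

What this does: Counts ERROR lines per request path and lists the worst offenders first.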


"A user's AI agent isn't starting"

Step 1 -- Check the webhook for pod creation errors:

_stream:{kubernetes.container_name="webhook"} error

What this does: Shows errors during agent pod creation.

Step 2 -- Check if the agent pod exists and is logging:

_stream:{kubernetes.namespace_name="userswarms"}

What this does: Shows all logs from agent pods.

Step 3 -- Look for crash loops or OOM kills in agent pods:

_stream:{kubernetes.namespace_name="userswarms"} ("exit code" OR OOMKilled OR error)

What this does: Surfaces agent pods that are crashing or running out of memory.

Step 4 -- Check the metacontroller for scheduling issues:

_stream:{kubernetes.namespace_name="userswarm-controller"}

What this does: Shows metacontroller logs to diagnose why a pod was not scheduled.

warning

If Step 2 returns nothing, the pod was never created. Focus on the webhook (Step 1) and metacontroller (Step 4).


"ArgoCD sync is failing"

Step 1 -- Check for sync errors across all ArgoCD components:

_stream:{kubernetes.namespace_name="argocd"} ("sync failed" OR error)

What this does: Shows all sync failures and errors across ArgoCD.

Step 2 -- Narrow to the repo server (where manifest generation happens):

_stream:{kubernetes.namespace_name="argocd", kubernetes.container_name="repo-server"} error

What this does: Shows errors during Helm/Kustomize rendering.

Step 3 -- Check if a specific app is mentioned:

_stream:{kubernetes.namespace_name="argocd"} "orchestrator" error

What this does: Filters ArgoCD errors related to the orchestrator app.

note

Replace "orchestrator" with the name of whatever application is failing.


"TLS certificate isn't renewing"

Step 1 -- Check cert-manager for challenge failures:

_stream:{kubernetes.namespace_name="cert-manager"} (error OR "challenge" OR "not ready")

What this does: Shows certificate issuance errors and challenge status.

Step 2 -- Check if the Cloudflare API token is valid:

_stream:{kubernetes.namespace_name="cert-manager"} ("403" OR "unauthorized" OR "cloudflare")

What this does: Surfaces authentication failures with the Cloudflare API.

Step 3 -- Verify the external-secrets operator synced the Cloudflare token:

_stream:{kubernetes.namespace_name="external-secrets"} ("cloudflare" OR error)

What this does: Checks if the secret containing the Cloudflare token was delivered to the cluster.

caution

If the Cloudflare token expired or was rotated, all certificate renewals will fail. Update it in AWS Secrets Manager and restart external-secrets.


"DNS records aren't updating"

Step 1 -- Check external-dns for errors:

_stream:{kubernetes.namespace_name="external-dns"} (error OR "failed")

What this does: Shows all external-dns errors.

Step 2 -- Look for Cloudflare API rate limits or auth issues:

_stream:{kubernetes.namespace_name="external-dns"} ("rate limit" OR "403" OR "unauthorized")

What this does: Surfaces API authentication or throttling problems.


"The database is slow"

Step 1 -- Check PostgreSQL for slow query warnings:

_stream:{kubernetes.pod_name="backend-postgresql-0"} ("duration" OR "slow" OR "lock")

What this does: Surfaces slow queries, lock waits, and duration warnings.

Step 2 -- Check if connections are being exhausted:

_stream:{kubernetes.pod_name="backend-postgresql-0"} ("too many connections" OR "remaining connection")

What this does: Shows connection pool exhaustion warnings.

Step 3 -- Cross-reference with orchestrator logs to see which requests are slow:

_stream:{kubernetes.container_name="orchestrator"} (level:WARN OR level:ERROR) AND (database OR postgres OR sql)

What this does: Correlates application-level warnings with database issues.

tip

Check connection counts first. Most "slow database" issues are actually connection pool exhaustion.
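To see whether slow queries cluster around a particular time (for example, right after a deploy), bucket the matches by time -- a sketch using a stats time bucket:

_stream:{kubernetes.pod_name="backend-postgresql-0"} "duration" | stats by (_time:10m) count() as slow_queries

What this does: Counts duration-warning lines in ten-minute buckets, so a sudden cluster of slow queries stands out.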


"Redis is not responding"

Step 1 -- Check Redis logs directly:

_stream:{kubernetes.pod_name="backend-redis-master-0"} (error OR "OOM")

What this does: Shows Redis errors and out-of-memory events.

Step 2 -- Check the orchestrator for Redis connection errors:

_stream:{kubernetes.container_name="orchestrator"} ("redis" OR "connection refused")

What this does: Shows application-side Redis connection failures.


"A new deployment broke something -- what changed?"

info

Set your time range to the 10 minutes around the deployment before running these queries.

Step 1 -- Check for errors across the backend namespace:

_stream:{kubernetes.namespace_name="backend"} (error OR panic OR fatal)

What this does: Broad sweep for any errors in the backend after deploy.

Step 2 -- Watch the orchestrator's startup sequence (set time range to just after the deploy):

_stream:{kubernetes.container_name="orchestrator"} | sort by (_time) asc

What this does: Shows the orchestrator boot sequence in chronological order.

Step 3 -- Check if ArgoCD had issues during the sync:

_stream:{kubernetes.namespace_name="argocd"} "sync" (error OR failed)

What this does: Shows ArgoCD sync failures that may have caused a bad rollout.


Chapter 5: Advanced Queries

Combining conditions (AND, OR, NOT)

Operator | Syntax | Example
--- | --- | ---
AND | Space-separated words | error database -- lines with both words
OR | OR between words | error OR panic -- lines with either word
NOT | NOT before a word | error NOT "404" -- errors excluding 404s

AND example:

_stream:{kubernetes.container_name="orchestrator"} error database

What this does: Matches lines containing both "error" AND "database".

OR example:

_stream:{kubernetes.container_name="orchestrator"} (error OR panic)

What this does: Matches lines containing either "error" or "panic".

NOT example:

_stream:{kubernetes.container_name="orchestrator"} error NOT "404"

What this does: Shows errors but excludes 404-related lines.

caution

AND, OR, and NOT must be uppercase. Lowercase and, or, not will be treated as literal words to search for. Note also that AND binds tighter than OR, so _stream:{...} error OR panic parses as (_stream:{...} error) OR panic -- wrap OR groups in parentheses to keep them scoped to the stream filter.


Regex matching

Use container_name for stable filtering instead of regex on pod names. Container names do not change across restarts or CronJob runs:

_stream:{kubernetes.namespace_name="userswarms", kubernetes.container_name="zeroclaw"}

What this does: Matches all ZeroClaw workspace pods regardless of pod name suffix.

Pattern matching on field values -- use field:~"pattern" outside the stream selector:

_stream:{kubernetes.namespace_name="userswarms"} kubernetes.pod_name:~"zeroclaw-workspace-81a5.*"

What this does: First selects all logs from the userswarms namespace, then filters to pods matching the pattern.

Pattern matching in the log message -- use a _msg:~"pattern" regex match in a filter pipe:

_stream:{kubernetes.container_name="orchestrator"} | filter _msg:~"user_id=[0-9]+"

What this does: Finds log lines containing a numeric user_id field.
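If you need the matched value itself rather than the whole line, the extract pipe can pull it into its own field (a sketch -- the user_id pattern and the uid field name are illustrative):

_stream:{kubernetes.container_name="orchestrator"} "user_id=" | extract "user_id=<uid>" | stats by (uid) count() as lines

What this does: Pulls the user_id value out of each matching line into a uid field, then counts lines per user.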


Counting and statistics

Count errors per container:

_stream:{kubernetes.namespace_name="backend"} error | stats by (kubernetes.container_name) count() as errors

What this does: Groups error logs by container name and counts them.

Count errors over time (spot spikes):

_stream:{kubernetes.container_name="orchestrator"} level:ERROR | stats count() as error_count

What this does: Shows the total error count, useful for detecting spikes in a time range.
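To actually see the spike rather than a single total, group the counts into time buckets (a sketch -- LogsQL accepts a time bucket inside stats by):

_stream:{kubernetes.container_name="orchestrator"} level:ERROR | stats by (_time:5m) count() as error_count

What this does: Counts errors in five-minute buckets, so a sudden jump stands out against the baseline.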


Sorting results

Most recent first:

_stream:{kubernetes.container_name="orchestrator"} error | sort by (_time) desc

What this does: Shows the newest errors at the top.

Oldest first (follow a startup sequence):

_stream:{kubernetes.container_name="orchestrator"} | sort by (_time) asc | limit 100

What this does: Shows the first 100 log lines in chronological order.


Time-based filtering

Relative time (last N minutes/hours/days):

_stream:{kubernetes.namespace_name="backend"} error _time:5m

What this does: Shows errors from the last 5 minutes only.

Shorthand | Meaning
--- | ---
_time:5m | Last 5 minutes
_time:1h | Last 1 hour
_time:24h | Last 24 hours
_time:7d | Last 7 days

Exact time range:

_stream:{kubernetes.namespace_name="backend"} error _time:[2026-04-04T14:00:00Z, 2026-04-04T14:30:00Z]

What this does: Shows errors within a precise 30-minute window.

tip

Use relative time (_time:5m) for quick checks. Use exact ranges when investigating a known incident window.


JSON field filtering (for structured logs)

The orchestrator and webhook emit JSON logs via Go's slog. Fluent Bit's parser filter automatically extracts every top-level JSON key into a separate field before the record reaches VictoriaLogs. This means you do not need to match raw JSON substrings -- fields are already indexed and directly searchable.

The orchestrator emits JSON like:

{"time":"2026-04-04T12:00:00Z","level":"INFO","msg":"request received","method":"GET","path":"/v1/health","request_id":"abc123"}

After Fluent Bit parses it, VictoriaLogs receives individual fields: level, _msg, method, path, request_id, etc.

Old (wrong) -- matching a raw JSON substring:

_stream:{kubernetes.container_name="orchestrator"} "level":"ERROR"

New (correct) -- querying the extracted field directly:

_stream:{kubernetes.container_name="orchestrator"} level:ERROR

Filter by method and level:

_stream:{kubernetes.container_name="orchestrator"} method:POST level:ERROR

What this does: Finds ERROR-level logs for POST requests.

Filter by path:

_stream:{kubernetes.container_name="orchestrator"} path:/v1/auth level:ERROR

What this does: Finds ERROR-level logs for the /v1/auth endpoint.

_msg vs message

The _msg field contains the human-readable msg value from slog (e.g. request started), not the raw JSON string. Use _msg when you want to search or display the log message text.

JSON field extraction is automatic

Any service that writes JSON to stdout gets this treatment for free. Fluent Bit's parser filter detects JSON output and promotes every top-level key to its own searchable field -- no per-service configuration required.


Selecting specific fields

_stream:{kubernetes.container_name="orchestrator"} error | fields _time, _msg

What this does: Strips away Kubernetes metadata and shows only timestamp and message.


Limiting results

_stream:{kubernetes.namespace_name="backend"} error | limit 20

What this does: Returns only the first 20 matches.

tip

Start with a small limit when exploring. You can always increase it once you know the query returns what you want.


Combining pipes

Pipes chain left to right with |:

_stream:{kubernetes.namespace_name="backend"} error
| fields _time, kubernetes.container_name, _msg
| sort by (_time) desc
| limit 50

What this does: Gets error logs from backend, keeps only three fields, sorts newest first, and returns the top 50.


Chapter 6: Quick Reference Card

I want to... | Query
--- | ---
See everything | *
See all orchestrator logs | _stream:{kubernetes.container_name="orchestrator"}
See orchestrator errors | _stream:{kubernetes.container_name="orchestrator"} level:ERROR
See all backend namespace logs | _stream:{kubernetes.namespace_name="backend"}
See errors across all namespaces | error OR panic OR fatal
See webhook logs | _stream:{kubernetes.container_name="webhook"}
See all agent runtime logs | _stream:{kubernetes.namespace_name="userswarms"}
See a specific agent pod | _stream:{kubernetes.namespace_name="userswarms", kubernetes.pod_name="zeroclaw-workspace-81a5f386-c6a6-4c0a-b6a-3353eb37c1-0"}
See PostgreSQL errors | _stream:{kubernetes.pod_name="backend-postgresql-0"} (ERROR OR FATAL)
See Redis logs | _stream:{kubernetes.pod_name="backend-redis-master-0"}
See ArgoCD sync errors | _stream:{kubernetes.namespace_name="argocd"} ("sync failed" OR error)
See cert-manager issues | _stream:{kubernetes.namespace_name="cert-manager"} error
See Envoy Gateway logs | _stream:{kubernetes.namespace_name="envoy-gateway-system"}
See external-dns logs | _stream:{kubernetes.namespace_name="external-dns"}
See Fluent Bit logs | _stream:{kubernetes.container_name="fluent-bit"}
See only stderr output | _stream:{kubernetes.container_name="orchestrator", stream="stderr"}
Count errors per container | _stream:{kubernetes.namespace_name="backend"} error \| stats by (kubernetes.container_name) count() as errors
See last 5 minutes only | _stream:{kubernetes.container_name="orchestrator"} _time:5m
Find a specific error message | _stream:{kubernetes.namespace_name="backend"} "connection refused"
See the docs site logs | _stream:{kubernetes.namespace_name="backend", kubernetes.container_name="docs"}

Retention

warning

Logs are retained for 14 days. Records older than 14 days are automatically deleted. If you need to investigate something older, check whether any exports or captures exist before the window closes.

🔗 Terms On This Page

If a term below is unfamiliar, open its glossary entry. For the full list, go to Internal Glossary.

  • DOKS: DigitalOcean Kubernetes, the managed Kubernetes service used for the Crawbl cluster.
  • ArgoCD: The GitOps deployment system that keeps the cluster aligned with what is committed in Git.