SearXNG

SearXNG is the meta-search engine that powers the web_search_tool in the Crawbl agent runtime. It aggregates results from multiple search engines into a single, deduplicated JSON response — so agents get broad internet coverage without needing API keys for Google, Bing, or any individual provider.

Why SearXNG?

Concern	How SearXNG solves it
No per-provider API keys	SearXNG scrapes public search UIs — no Google Custom Search billing, no Bing API subscriptions
Multiple engines in one call	A single query fans out to Google, Bing, DuckDuckGo, Brave, Qwant, Wikipedia, and Wikidata simultaneously
Deduplication	Results from multiple engines are merged and deduplicated before being returned
Privacy	Runs in-cluster, so agents never call a third-party search API directly; upstream engines receive queries from the SearXNG instance rather than from individual users
Simple JSON API	One HTTP endpoint, one query parameter, one JSON response — trivial to integrate

info

SearXNG is an open-source project maintained at github.com/searxng/searxng. Crawbl runs a self-hosted instance inside the cluster, configured as an ArgoCD-managed component in crawbl-argocd-apps/components/searxng/.

Connection Details

Property	Value
Service name	`searxng`
Namespace	`backend`
Port	8080 (HTTP)
In-cluster endpoint	`http://searxng.backend.svc.cluster.local:8080`
External URL	None (internal only)
ArgoCD app	`crawbl-argocd-apps/root/searxng.yaml`
Helm chart	`crawbl-argocd-apps/components/searxng/chart/`
Values	`crawbl-argocd-apps/components/searxng/envs/dev.yaml`

Search Engines

SearXNG is configured to aggregate results from these search engines:

Engine	Type	What it provides
Google	General web search	Broad web coverage
Bing	General web search	Microsoft's index, good for recent content
DuckDuckGo	General web search	Privacy-focused results
Brave	General web search	Independent index
Qwant	General web search	European search engine
Wikipedia	Encyclopedia	Factual, structured content
Wikidata	Knowledge base	Structured entity data

tip

The engine mix can be changed in the SearXNG settings without touching any agent runtime code. The runtime only sees the merged result set — it does not know or care which engines are enabled.

API Usage

Search endpoint

GET /search?q={query}&format=json&safesearch=0&language=en

Query parameters

Parameter	Required	Default	Description
`q`	Yes	—	Search query (free text)
`format`	Yes	—	Must be `json` for the API response
`safesearch`	No	`0`	`0` = off, `1` = moderate, `2` = strict
`language`	No	`en`	Language code for results
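
As a minimal sketch of how these parameters combine into a request URL, the snippet below builds the query string with Python's standard library. The helper name `build_search_url` is hypothetical and only illustrates the parameter table above; it is not the runtime's actual code.

```python
from urllib.parse import urlencode

# In-cluster endpoint from the Connection Details table.
DEFAULT_ENDPOINT = "http://searxng.backend.svc.cluster.local:8080"


def build_search_url(query: str, endpoint: str = DEFAULT_ENDPOINT,
                     safesearch: int = 0, language: str = "en") -> str:
    """Return the full /search URL for a query (hypothetical helper)."""
    if safesearch not in (0, 1, 2):
        raise ValueError("safesearch must be 0 (off), 1 (moderate), or 2 (strict)")
    params = {
        "q": query,          # required: free-text query
        "format": "json",    # required: ask for the JSON response
        "safesearch": safesearch,
        "language": language,
    }
    # urlencode percent-escapes reserved characters and turns spaces into '+'.
    return f"{endpoint.rstrip('/')}/search?{urlencode(params)}"


print(build_search_url("kubernetes 1.31"))
```

Encoding the query through `urlencode` rather than string concatenation keeps characters such as `&` or `#` in a query from being read as URL syntax.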

Request headers

Accept: application/json
User-Agent: crawbl-agent-runtime (+https://crawbl.com)

warning

The Accept: application/json header is required. Without it, some SearXNG configurations fall back to HTML even with ?format=json.
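
A short sketch of attaching both headers to a request, using only the Python standard library. The function name `build_request` is an illustrative assumption; the request is constructed but not sent, so no running SearXNG instance is needed to try it.

```python
from urllib.request import Request

# Header values copied from the "Request headers" listing above.
SEARCH_HEADERS = {
    "Accept": "application/json",
    "User-Agent": "crawbl-agent-runtime (+https://crawbl.com)",
}


def build_request(url: str) -> Request:
    """Return a GET request carrying the documented headers (hypothetical helper)."""
    return Request(url, headers=SEARCH_HEADERS, method="GET")


req = build_request("http://localhost:8080/search?q=test&format=json")
# urllib stores header names capitalized as "User-agent", so look them up that way.
print(req.get_header("Accept"))
print(req.get_header("User-agent"))
```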

Response format

{
  "results": [
    {
      "title": "Example Page Title",
      "url": "https://example.com/page",
      "content": "A brief excerpt from the page matching the query...",
      "engine": "google"
    },
    {
      "title": "Another Result",
      "url": "https://another.com",
      "content": "More relevant content...",
      "engine": "bing"
    }
  ]
}

The full SearXNG response includes additional fields (suggestions, infoboxes, unresponsive_engines, answers), but the agent runtime only consumes the results array.
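
To illustrate consuming only the results array, here is a hedged sketch that parses a response body and ignores every other top-level field. The `SearchResult` type and `parse_results` function are hypothetical names; mapping `content` to `snippet` follows the flow described under "What happens on a search".

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str  # taken from the "content" field of the response
    engine: str


def parse_results(body: str) -> List[SearchResult]:
    """Extract the results array; other top-level fields are ignored."""
    payload = json.loads(body)
    return [
        SearchResult(
            title=item.get("title", ""),
            url=item.get("url", ""),
            snippet=item.get("content", ""),
            engine=item.get("engine", ""),
        )
        for item in payload.get("results", [])
    ]


sample = json.dumps({
    "results": [{"title": "Example Page Title", "url": "https://example.com/page",
                 "content": "A brief excerpt...", "engine": "google"}],
    "suggestions": ["ignored by this sketch"],
})
print(parse_results(sample))
```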

Example: searching from the command line

# Port-forward SearXNG
kubectl port-forward svc/searxng 8080:8080 -n backend &

# Run a search
curl -s 'http://localhost:8080/search?q=kubernetes+1.31&format=json&safesearch=0&language=en' \
  -H 'Accept: application/json' | jq '.results[:3]'

How the Agent Runtime Connects

The agent runtime configures the SearXNG endpoint at startup:

Setting	Value
Default	`http://searxng.backend.svc.cluster.local:8080` (set in `internal/agentruntime/config/defaults.go`)
Override	`CRAWBL_SEARXNG_ENDPOINT` environment variable or `--searxng-endpoint` CLI flag

The web_search_tool calls GET {endpoint}/search?q=...&format=json with a 10-second timeout and a 4 MiB response body limit.
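
The settings and limits above could be modeled roughly as follows. This is a sketch under two labeled assumptions: that the CLI flag takes precedence over the environment variable (the precedence is not stated here), and that an oversized body is rejected rather than truncated. The function names are hypothetical.

```python
import io
import os
from typing import BinaryIO, Mapping, Optional

DEFAULT_ENDPOINT = "http://searxng.backend.svc.cluster.local:8080"
REQUEST_TIMEOUT_SECONDS = 10       # documented request timeout
MAX_BODY_BYTES = 4 * 1024 * 1024   # documented 4 MiB response body limit


def resolve_endpoint(cli_flag: Optional[str] = None,
                     environ: Mapping[str, str] = os.environ) -> str:
    # ASSUMPTION: flag beats env var beats default; the real order may differ.
    return cli_flag or environ.get("CRAWBL_SEARXNG_ENDPOINT") or DEFAULT_ENDPOINT


def read_limited(stream: BinaryIO, limit: int = MAX_BODY_BYTES) -> bytes:
    # Read one byte past the limit so an oversized body can be detected.
    data = stream.read(limit + 1)
    if len(data) > limit:
        # ASSUMPTION: oversized bodies are an error; the runtime may truncate instead.
        raise ValueError(f"response body exceeds {limit} bytes")
    return data


print(resolve_endpoint(None, {}))
print(read_limited(io.BytesIO(b'{"results": []}')))
```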

What happens on a search

Agent LLM emits tool call: web_search_tool(query="...", max_results=5)
    │
    ▼
Agent Runtime (tools/local/web_search.go)
    ├── Validate query (non-empty)
    ├── Cap max_results (default 5, ceiling 15)
    ├── Build URL: {endpoint}/search?q=...&format=json&safesearch=0&language=en
    ├── HTTP GET with 10s timeout
    ├── Parse JSON response
    ├── Extract top N results (title, url, snippet, engine)
    └── Return to LLM as tool result
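
The steps in the diagram can be sketched end to end as below. This is an illustrative Python model, not the code in tools/local/web_search.go: the `fetch` callable stands in for the HTTP GET so the example runs without a network, and treating a non-positive `max_results` as the default is an assumption.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode

DEFAULT_MAX_RESULTS = 5
MAX_RESULTS_CEILING = 15


def web_search(query: str, fetch: Callable[[str], str], endpoint: str,
               max_results: int = DEFAULT_MAX_RESULTS) -> List[Dict[str, str]]:
    # Validate the query.
    if not query.strip():
        raise ValueError("query must be non-empty")
    # Cap max_results. ASSUMPTION: non-positive values fall back to the default.
    if max_results <= 0:
        max_results = DEFAULT_MAX_RESULTS
    limit = min(max_results, MAX_RESULTS_CEILING)
    # Build the URL with the fixed safesearch and language values.
    params = {"q": query, "format": "json", "safesearch": 0, "language": "en"}
    url = f"{endpoint.rstrip('/')}/search?{urlencode(params)}"
    # Fetch and parse the JSON response.
    payload = json.loads(fetch(url))
    # Keep the top N results with the four fields returned to the LLM.
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""),
         "snippet": r.get("content", ""), "engine": r.get("engine", "")}
        for r in payload.get("results", [])[:limit]
    ]


def fake_fetch(url: str) -> str:
    # Stand-in for the HTTP GET: returns 20 canned results.
    results = [{"title": f"Result {i}", "url": f"https://example.com/{i}",
                "content": "...", "engine": "google"} for i in range(20)]
    return json.dumps({"results": results})


hits = web_search("kubernetes 1.31", fake_fetch, "http://localhost:8080", max_results=50)
print(len(hits))  # 15, because the request for 50 is capped at the ceiling
```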

Debugging

Check if SearXNG is running

kubectl get pods -n backend -l app.kubernetes.io/name=searxng

View SearXNG logs

kubectl logs -n backend -l app.kubernetes.io/name=searxng --tail=50

Test connectivity from inside the cluster

# --restart=Never creates a one-shot pod that is removed when curl exits
kubectl run -it --rm debug --restart=Never --image=curlimages/curl -- \
  curl -s 'http://searxng.backend.svc.cluster.local:8080/search?q=test&format=json' \
  -H 'Accept: application/json' | head -c 500

Common issues

Symptom	Likely cause	Fix
`web_search_tool: searxng endpoint is not configured`	`CRAWBL_SEARXNG_ENDPOINT` is empty	Set `CRAWBL_SEARXNG_ENDPOINT` (or `--searxng-endpoint`) in the runtime config
`web_search_tool: searxng returned status 429`	Rate limiting: SearXNG is being queried too frequently	Reduce the query rate and check upstream engine rate limits
`web_search_tool: GET ... context deadline exceeded`	10-second timeout hit: SearXNG may be overloaded or an upstream engine is slow	Check the SearXNG pod status and logs, then retry
Empty results	Upstream engines returned nothing	Try a different query; check `unresponsive_engines` in raw SearXNG response
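
For the empty-results case, a small sketch like the following can summarize `unresponsive_engines` from a raw response saved to a string. It assumes each entry is an `[engine, reason]` pair, which may vary between SearXNG versions, so other shapes are reported with an "unknown" reason. The function name is hypothetical.

```python
import json
from typing import Dict


def summarize_unresponsive(body: str) -> Dict[str, str]:
    """Map each unresponsive engine to the reason SearXNG reported for it."""
    payload = json.loads(body)
    summary: Dict[str, str] = {}
    for entry in payload.get("unresponsive_engines", []):
        # ASSUMPTION: entries look like ["google", "timeout"].
        if isinstance(entry, (list, tuple)) and len(entry) >= 2:
            summary[str(entry[0])] = str(entry[1])
        else:
            summary[str(entry)] = "unknown"
    return summary


raw = json.dumps({"results": [], "unresponsive_engines": [["google", "timeout"], "qwant"]})
print(summarize_unresponsive(raw))
```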

Resource Usage

SearXNG is lightweight for a dev cluster:

Resource	Value
CPU request	50m
Memory request	128Mi
Storage	None (stateless)
Replicas	1

SearXNG stores no data — it is a stateless proxy that fans queries out to upstream engines and merges the results.

What's next: See the Agent Runtime Tools guide for how web_search_tool fits into the full tool system, or the Dev Services & Access page for an overview of all cluster services.
