SearXNG
SearXNG is the meta-search engine that powers the web_search_tool in the Crawbl agent runtime. It aggregates results from multiple search engines into a single, deduplicated JSON response — so agents get broad internet coverage without needing API keys for Google, Bing, or any individual provider.
Why SearXNG?
| Concern | How SearXNG solves it |
|---|---|
| No per-provider API keys | SearXNG scrapes public search UIs — no Google Custom Search billing, no Bing API subscriptions |
| Multiple engines in one call | A single query fans out to Google, Bing, DuckDuckGo, Brave, Qwant, Wikipedia, and Wikidata simultaneously |
| Deduplication | Results from multiple engines are merged and deduplicated before returning |
| Privacy | Runs in-cluster: agents call only the internal SearXNG service, and no query is sent to a third-party search API. SearXNG still forwards the query text to the upstream engines' public search pages |
| Simple JSON API | One HTTP endpoint, one query parameter, one JSON response — trivial to integrate |
SearXNG is an open-source project maintained at github.com/searxng/searxng. Crawbl runs a self-hosted instance inside the cluster, configured as an ArgoCD-managed component in crawbl-argocd-apps/components/searxng/.
Connection Details
| Property | Value |
|---|---|
| Service name | searxng |
| Namespace | backend |
| Port | 8080 (HTTP) |
| In-cluster endpoint | http://searxng.backend.svc.cluster.local:8080 |
| External URL | None (internal only) |
| ArgoCD app | crawbl-argocd-apps/root/searxng.yaml |
| Helm chart | crawbl-argocd-apps/components/searxng/chart/ |
| Values | crawbl-argocd-apps/components/searxng/envs/dev.yaml |
Search Engines
SearXNG is configured to aggregate results from these search engines:
| Engine | Type | What it provides |
|---|---|---|
| Google | General web search | Broad web coverage |
| Bing | General web search | Microsoft's index, good for recent content |
| DuckDuckGo | General web search | Privacy-focused results |
| Brave | General web search | Independent index |
| Qwant | General web search | European search engine |
| Wikipedia | Encyclopedia | Factual, structured content |
| Wikidata | Knowledge base | Structured entity data |
The engine mix can be changed in the SearXNG settings without touching any agent runtime code. The runtime only sees the merged result set — it does not know or care which engines are enabled.
API Usage
Search endpoint
GET /search?q={query}&format=json&safesearch=0&language=en
Query parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| q | Yes | none | Search query (free text) |
| format | Yes | none | Must be json for the API response |
| safesearch | No | 0 | 0 = off, 1 = moderate, 2 = strict |
| language | No | en | Language code for results |
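The parameters above can be assembled into a request URL as follows. This is a minimal sketch in Python, not the runtime's actual code: `build_search_url` is a hypothetical helper name, and the validation rules are assumptions based on the table.

```python
from urllib.parse import urlencode

# Default in-cluster endpoint from the Connection Details table.
DEFAULT_ENDPOINT = "http://searxng.backend.svc.cluster.local:8080"


def build_search_url(query, endpoint=DEFAULT_ENDPOINT, safesearch=0, language="en"):
    """Return the full /search URL for a query (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query must be non-empty")
    if safesearch not in (0, 1, 2):
        raise ValueError("safesearch must be 0, 1, or 2")
    params = {
        "q": query,
        "format": "json",
        "safesearch": safesearch,
        "language": language,
    }
    # urlencode percent-encodes each value and turns spaces into '+'.
    return f"{endpoint.rstrip('/')}/search?{urlencode(params)}"


print(build_search_url("kubernetes 1.31"))
```

Encoding the query with `urlencode` rather than string concatenation keeps characters such as `&` or `#` in a query from breaking the URL.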
Request headers
Accept: application/json
User-Agent: crawbl-agent-runtime (+https://crawbl.com)
The Accept: application/json header is required. Without it, some SearXNG configurations fall back to HTML even with ?format=json.
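As an illustration, the two headers can be attached to a request with Python's standard library. The sketch below only builds the request object and does not send it; `build_request` is a hypothetical helper name.

```python
import urllib.request

# User-Agent value copied from the request headers listed above.
USER_AGENT = "crawbl-agent-runtime (+https://crawbl.com)"


def build_request(url):
    """Build (but do not send) a GET request carrying the documented headers."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "User-Agent": USER_AGENT},
        method="GET",
    )


req = build_request("http://localhost:8080/search?q=test&format=json")
# urllib stores header names capitalized, so User-Agent is read back as "User-agent".
print(req.get_method(), req.get_header("Accept"), req.get_header("User-agent"))
```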
Response format
{
  "results": [
    {
      "title": "Example Page Title",
      "url": "https://example.com/page",
      "content": "A brief excerpt from the page matching the query...",
      "engine": "google"
    },
    {
      "title": "Another Result",
      "url": "https://another.com",
      "content": "More relevant content...",
      "engine": "bing"
    }
  ]
}
The full SearXNG response includes additional fields (suggestions, infoboxes, unresponsive_engines, answers), but the agent runtime only consumes the results array.
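A client that only cares about the results array might parse the response like this. The sketch is in Python for illustration; `SearchResult` and `parse_results` are hypothetical names, and the sample body is made up, including its extra `score` field.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    """The four per-result fields shown in the response format above."""
    title: str
    url: str
    content: str
    engine: str


def parse_results(body):
    """Parse a SearXNG JSON body and keep only the results array."""
    payload = json.loads(body)
    parsed = []
    for item in payload.get("results", []):
        parsed.append(
            SearchResult(
                title=item.get("title", ""),
                url=item.get("url", ""),
                content=item.get("content", ""),
                engine=item.get("engine", ""),
            )
        )
    return parsed


# Made-up sample body; fields other than the four above are ignored.
sample = """
{
  "results": [
    {"title": "Example Page Title", "url": "https://example.com/page",
     "content": "A brief excerpt...", "engine": "google", "score": 1.0}
  ],
  "suggestions": [],
  "unresponsive_engines": []
}
"""
for result in parse_results(sample):
    print(result.engine, result.url)
```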
Example: searching from the command line
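# Port-forward SearXNG in the background
kubectl port-forward svc/searxng 8080:8080 -n backend &
sleep 2  # give the port-forward a moment to start listening
# Run a search and show the first three results
curl -s 'http://localhost:8080/search?q=kubernetes+1.31&format=json&safesearch=0&language=en' \
  -H 'Accept: application/json' | jq '.results[:3]'
# Stop the port-forward when done
kill %1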
How the Agent Runtime Connects
The agent runtime configures the SearXNG endpoint at startup:
| Setting | Value |
|---|---|
| Default | http://searxng.backend.svc.cluster.local:8080 (set in internal/agentruntime/config/defaults.go) |
| Override | CRAWBL_SEARXNG_ENDPOINT environment variable or --searxng-endpoint CLI flag |
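The default and override settings could be resolved along these lines. This is a sketch in Python rather than the runtime's own code, and the precedence (CLI flag first, then environment variable, then default) is an assumption, since the table does not say which override wins. `resolve_endpoint` is a hypothetical name.

```python
import argparse
import os

DEFAULT_ENDPOINT = "http://searxng.backend.svc.cluster.local:8080"
ENV_VAR = "CRAWBL_SEARXNG_ENDPOINT"


def resolve_endpoint(argv, environ=os.environ):
    """Pick the SearXNG endpoint from CLI arguments, environment, or default."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--searxng-endpoint", default=None)
    # parse_known_args ignores any flags this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    # Assumed precedence: flag, then environment variable, then default.
    return args.searxng_endpoint or environ.get(ENV_VAR) or DEFAULT_ENDPOINT


print(resolve_endpoint([], {}))
```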
The web_search_tool calls GET {endpoint}/search?q=...&format=json with a 10-second timeout and a 4 MiB response body limit.
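The timeout and body limit can be pictured with the sketch below. It assumes an oversized body is rejected; the runtime may instead truncate it, which this page does not specify. `read_limited`, `fetch`, and `BodyTooLargeError` are hypothetical names, and `urlopen`'s timeout applies per socket operation, so it only approximates an overall 10-second deadline.

```python
import io
import urllib.request

TIMEOUT_SECONDS = 10
MAX_BODY_BYTES = 4 * 1024 * 1024  # 4 MiB


class BodyTooLargeError(Exception):
    """Raised when a response body exceeds the configured limit."""


def read_limited(stream, limit=MAX_BODY_BYTES, chunk_size=64 * 1024):
    """Read a stream to the end, failing once more than `limit` bytes arrive."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise BodyTooLargeError(f"response body exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def fetch(url):
    """GET a URL with the timeout and body limit (needs a reachable endpoint)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return read_limited(response)


# Offline demonstration with an in-memory stream instead of a real response.
print(len(read_limited(io.BytesIO(b"x" * 1024))))
```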
What happens on a search
Agent LLM emits tool call: web_search_tool(query="...", max_results=5)
│
▼
Agent Runtime (tools/local/web_search.go)
├── Validate query (non-empty)
├── Cap max_results (default 5, ceiling 15)
├── Build URL: {endpoint}/search?q=...&format=json&safesearch=0&language=en
├── HTTP GET with 10s timeout
├── Parse JSON response
├── Extract top N results (title, url, snippet, engine)
└── Return to LLM as tool result
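The steps in the flow above can be sketched end to end. This Python version is an illustration, not a port of tools/local/web_search.go. The function names are hypothetical, the HTTP call is injected as `fetch` so the example runs offline, and falling back to the default for a missing or non-positive max_results is an assumption.

```python
import json
from urllib.parse import urlencode

DEFAULT_MAX_RESULTS = 5
MAX_RESULTS_CEILING = 15


def cap_max_results(requested):
    """Apply the default of 5 and the ceiling of 15."""
    if requested is None or requested < 1:
        return DEFAULT_MAX_RESULTS  # assumption: invalid values use the default
    return min(requested, MAX_RESULTS_CEILING)


def web_search(query, max_results, endpoint, fetch):
    """Run the documented steps; `fetch` is any callable mapping a URL to bytes."""
    if not query or not query.strip():
        raise ValueError("query must be non-empty")
    limit = cap_max_results(max_results)
    params = {"q": query, "format": "json", "safesearch": 0, "language": "en"}
    url = f"{endpoint.rstrip('/')}/search?{urlencode(params)}"
    payload = json.loads(fetch(url))
    return [
        {
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("content", ""),  # SearXNG names this field "content"
            "engine": item.get("engine", ""),
        }
        for item in payload.get("results", [])[:limit]
    ]


def fake_fetch(url):
    """Offline stand-in for the HTTP GET, returning 20 canned results."""
    results = [
        {"title": f"Result {i}", "url": f"https://example.com/{i}",
         "content": f"Snippet {i}", "engine": "google"}
        for i in range(20)
    ]
    return json.dumps({"results": results}).encode()


# Requesting 50 results is capped at the ceiling of 15.
hits = web_search("kubernetes 1.31", 50, "http://localhost:8080", fake_fetch)
print(len(hits), hits[0]["snippet"])
```

Passing the fetcher in as an argument keeps the validation, capping, and extraction logic testable without a running SearXNG instance.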
Debugging
Check if SearXNG is running
kubectl get pods -n backend -l app.kubernetes.io/name=searxng
View SearXNG logs
kubectl logs -n backend -l app.kubernetes.io/name=searxng --tail=50
Test connectivity from inside the cluster
kubectl run -it --rm debug --image=curlimages/curl -- \
curl -s 'http://searxng.backend.svc.cluster.local:8080/search?q=test&format=json' \
-H 'Accept: application/json' | head -c 500
Common issues
| Symptom | Likely cause | Fix |
|---|---|---|
| web_search_tool: searxng endpoint is not configured | CRAWBL_SEARXNG_ENDPOINT is empty | Set the env var in the runtime config |
| web_search_tool: searxng returned status 429 | Rate limiting | SearXNG is being queried too frequently; check upstream engine rate limits |
| web_search_tool: GET ... context deadline exceeded | 10-second timeout hit | SearXNG may be overloaded or an upstream engine is slow |
| Empty results | Upstream engines returned nothing | Try a different query; check unresponsive_engines in raw SearXNG response |
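For the empty-results case, a small helper can summarize unresponsive_engines from a raw response. The sketch assumes each entry is an [engine, reason] pair, which is how SearXNG usually reports them, and also tolerates plain engine names; verify the shape against your instance. The function names and the sample payload are hypothetical.

```python
def unresponsive_engines(payload):
    """Return (engine, reason) pairs from a parsed SearXNG response."""
    pairs = []
    for entry in payload.get("unresponsive_engines", []):
        if isinstance(entry, (list, tuple)):
            name = str(entry[0]) if entry else ""
            reason = str(entry[1]) if len(entry) > 1 else ""
        else:
            # Tolerate a bare engine name without a reason.
            name, reason = str(entry), ""
        if name:
            pairs.append((name, reason))
    return pairs


def explain_empty(payload):
    """Describe why a response has no results, as far as the payload shows."""
    if payload.get("results"):
        return "results present"
    failed = unresponsive_engines(payload)
    if failed:
        details = ", ".join(
            f"{name} ({reason})" if reason else name for name, reason in failed
        )
        return f"no results; unresponsive engines: {details}"
    return "no results; all engines responded, try a different query"


# Made-up payload for illustration.
raw = {
    "results": [],
    "unresponsive_engines": [["google", "timeout"], ["brave", "too many requests"]],
}
print(explain_empty(raw))
```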
Resource Usage
SearXNG is lightweight for a dev cluster:
| Resource | Value |
|---|---|
| CPU request | 50m |
| Memory request | 128Mi |
| Storage | None (stateless) |
| Replicas | 1 |
SearXNG stores no data — it is a stateless proxy that fans queries out to upstream engines and merges the results.
What's next: See the Agent Runtime Tools guide for how web_search_tool fits into the full tool system, or the Dev Services & Access page for an overview of all cluster services.