SearXNG

SearXNG is the meta-search engine that powers the web_search_tool in the Crawbl agent runtime. It aggregates results from multiple search engines into a single, deduplicated JSON response — so agents get broad internet coverage without needing API keys for Google, Bing, or any individual provider.


Why SearXNG?

| Concern | How SearXNG solves it |
| --- | --- |
| No per-provider API keys | SearXNG scrapes public search UIs, so there is no Google Custom Search billing and no Bing API subscription |
| Multiple engines in one call | A single query fans out to Google, Bing, DuckDuckGo, Brave, Qwant, Wikipedia, and Wikidata simultaneously |
| Deduplication | Results from multiple engines are merged and deduplicated before returning |
| Privacy | Runs in-cluster: agents call only the internal SearXNG service and never talk to a third-party search API directly; upstream engines see requests from the SearXNG instance, not from individual users |
| Simple JSON API | One HTTP endpoint, one required query parameter, one JSON response, so it is trivial to integrate |
info

SearXNG is an open-source project maintained at github.com/searxng/searxng. Crawbl runs a self-hosted instance inside the cluster, configured as an ArgoCD-managed component in crawbl-argocd-apps/components/searxng/.


Connection Details

| Property | Value |
| --- | --- |
| Service name | searxng |
| Namespace | backend |
| Port | 8080 (HTTP) |
| In-cluster endpoint | http://searxng.backend.svc.cluster.local:8080 |
| External URL | None (internal only) |
| ArgoCD app | crawbl-argocd-apps/root/searxng.yaml |
| Helm chart | crawbl-argocd-apps/components/searxng/chart/ |
| Values | crawbl-argocd-apps/components/searxng/envs/dev.yaml |

Search Engines

SearXNG is configured to aggregate results from these search engines:

| Engine | Type | What it provides |
| --- | --- | --- |
| Google | General web search | Broad web coverage |
| Bing | General web search | Microsoft's index, good for recent content |
| DuckDuckGo | General web search | Privacy-focused results |
| Brave | General web search | Independent index |
| Qwant | General web search | European search engine |
| Wikipedia | Encyclopedia | Factual, structured content |
| Wikidata | Knowledge base | Structured entity data |
tip

The engine mix can be changed in the SearXNG settings without touching any agent runtime code. The runtime only sees the merged result set — it does not know or care which engines are enabled.


API Usage

Search endpoint

GET /search?q={query}&format=json&safesearch=0&language=en

Query parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| q | Yes | | Search query (free text) |
| format | Yes | | Must be json for the API response |
| safesearch | No | 0 | 0 = off, 1 = moderate, 2 = strict |
| language | No | en | Language code for results |
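The parameter table above can be turned into a request URL with the standard library alone. The following is a minimal sketch, not the runtime's actual code: `buildSearchURL` is a hypothetical helper name, and the fixed `safesearch` and `language` values simply mirror the example endpoint shown earlier.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildSearchURL is a hypothetical helper (not taken from the runtime source).
// It appends /search and the documented query parameters to the base endpoint.
func buildSearchURL(endpoint, query string) string {
	params := url.Values{}
	params.Set("q", query)        // required: free-text query
	params.Set("format", "json")  // required for a JSON response
	params.Set("safesearch", "0") // optional: 0 = off
	params.Set("language", "en")  // optional: result language
	// Trim a trailing slash so the result never contains "//search".
	return strings.TrimRight(endpoint, "/") + "/search?" + params.Encode()
}

func main() {
	fmt.Println(buildSearchURL("http://searxng.backend.svc.cluster.local:8080", "kubernetes 1.31"))
}
```

`url.Values.Encode` escapes the query and sorts parameters by key, so the printed URL lists `format` first and `safesearch` last. Parameter order does not matter for the request.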

Request headers

Accept: application/json
User-Agent: crawbl-agent-runtime (+https://crawbl.com)
warning

The Accept: application/json header is required. Without it, some SearXNG configurations fall back to HTML even with ?format=json.

Response format

{
  "results": [
    {
      "title": "Example Page Title",
      "url": "https://example.com/page",
      "content": "A brief excerpt from the page matching the query...",
      "engine": "google"
    },
    {
      "title": "Another Result",
      "url": "https://another.com",
      "content": "More relevant content...",
      "engine": "bing"
    }
  ]
}

The full SearXNG response includes additional fields (suggestions, infoboxes, unresponsive_engines, answers), but the agent runtime only consumes the results array.
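As an illustration of consuming only the results array, the sketch below decodes the response into a struct that declares just that field. The type and function names (`searchResult`, `searchResponse`, `parseResults`) are assumptions made for this example, not the runtime's own identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResult mirrors the four per-result fields shown in the response example.
type searchResult struct {
	Title   string `json:"title"`
	URL     string `json:"url"`
	Content string `json:"content"`
	Engine  string `json:"engine"`
}

// searchResponse declares only "results"; encoding/json ignores the other
// top-level fields (suggestions, infoboxes, and so on) during decoding.
type searchResponse struct {
	Results []searchResult `json:"results"`
}

// parseResults is a hypothetical helper that extracts the results array.
func parseResults(body []byte) ([]searchResult, error) {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode searxng response: %w", err)
	}
	return resp.Results, nil
}

func main() {
	body := []byte(`{"results":[{"title":"Example Page Title","url":"https://example.com/page","content":"A brief excerpt...","engine":"google"}],"suggestions":[]}`)
	results, err := parseResults(body)
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Printf("%s (%s) via %s\n", r.Title, r.URL, r.Engine)
	}
}
```

Leaving the extra fields undeclared keeps the decoder tolerant of additions to the SearXNG response.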

Example: searching from the command line

# Port-forward SearXNG
kubectl port-forward svc/searxng 8080:8080 -n backend &

# Run a search
curl -s 'http://localhost:8080/search?q=kubernetes+1.31&format=json&safesearch=0&language=en' \
  -H 'Accept: application/json' | jq '.results[:3]'

How the Agent Runtime Connects

The agent runtime configures the SearXNG endpoint at startup:

| Setting | Value |
| --- | --- |
| Default | http://searxng.backend.svc.cluster.local:8080 (set in internal/agentruntime/config/defaults.go) |
| Override | CRAWBL_SEARXNG_ENDPOINT environment variable or --searxng-endpoint CLI flag |
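One way to combine a default with the two overrides is sketched below. The precedence (flag, then environment variable, then default) and the fallback on empty values are assumptions; this page does not state how the runtime orders them, and the real runtime may treat an explicitly empty value as "not configured" instead. `resolveEndpoint` is a hypothetical name.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// defaultSearxngEndpoint matches the documented in-cluster default.
const defaultSearxngEndpoint = "http://searxng.backend.svc.cluster.local:8080"

// resolveEndpoint picks the first non-empty value.
// Assumed precedence: CLI flag > environment variable > built-in default.
func resolveEndpoint(flagValue, envValue string) string {
	if flagValue != "" {
		return flagValue
	}
	if envValue != "" {
		return envValue
	}
	return defaultSearxngEndpoint
}

func main() {
	flagValue := flag.String("searxng-endpoint", "", "SearXNG base URL")
	flag.Parse()
	fmt.Println(resolveEndpoint(*flagValue, os.Getenv("CRAWBL_SEARXNG_ENDPOINT")))
}
```

Keeping the resolution logic in a pure function makes the precedence easy to test without touching process state.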

The web_search_tool calls GET {endpoint}/search?q=...&format=json with a 10-second timeout and a 4 MiB response body limit.
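A request with these two limits might look roughly like the following sketch. It is not the code in the runtime: `fetchSearch` and the constant names are hypothetical, and rejecting an oversized body (rather than truncating it) is an assumption about how the limit is enforced.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

const (
	searchTimeout = 10 * time.Second // documented request timeout
	maxBodyBytes  = 4 << 20          // documented 4 MiB body limit
)

// fetchSearch is a hypothetical helper that performs the GET with both limits.
func fetchSearch(ctx context.Context, client *http.Client, searchURL string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, searchTimeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, searchURL, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("User-Agent", "crawbl-agent-runtime (+https://crawbl.com)")

	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("GET %s: %w", searchURL, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("searxng returned status %d", resp.StatusCode)
	}

	// Read one byte past the limit so an oversized body can be detected.
	body, err := io.ReadAll(io.LimitReader(resp.Body, maxBodyBytes+1))
	if err != nil {
		return nil, fmt.Errorf("read body: %w", err)
	}
	if len(body) > maxBodyBytes {
		return nil, fmt.Errorf("response exceeds %d bytes", maxBodyBytes)
	}
	return body, nil
}

func main() {
	// Assumes a port-forward to localhost:8080 is active.
	body, err := fetchSearch(context.Background(), http.DefaultClient,
		"http://localhost:8080/search?q=test&format=json")
	if err != nil {
		fmt.Println("search failed:", err)
		return
	}
	fmt.Printf("received %d bytes\n", len(body))
}
```

Using a context deadline rather than `http.Client.Timeout` lets the caller's own cancellation propagate to the request as well.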

Agent LLM emits tool call: web_search_tool(query="...", max_results=5)
        │
        ▼
Agent Runtime (tools/local/web_search.go)
  ├── Validate query (non-empty)
  ├── Cap max_results (default 5, ceiling 15)
  ├── Build URL: {endpoint}/search?q=...&format=json&safesearch=0&language=en
  ├── HTTP GET with 10s timeout
  ├── Parse JSON response
  ├── Extract top N results (title, url, snippet, engine)
  └── Return to LLM as tool result
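The validation, capping, and top-N steps in the flow above can be sketched as three small functions. All names here are hypothetical. Treating a whitespace-only query as empty, and treating a non-positive max_results as "use the default", are assumptions rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	defaultMaxResults = 5  // documented default
	maxResultsCeiling = 15 // documented ceiling
)

// result holds the four fields returned to the LLM.
type result struct {
	Title, URL, Snippet, Engine string
}

// validateQuery rejects empty queries (whitespace-only counts as empty here).
func validateQuery(q string) (string, error) {
	q = strings.TrimSpace(q)
	if q == "" {
		return "", errors.New("query must not be empty") // illustrative message
	}
	return q, nil
}

// capMaxResults applies the default and the ceiling.
func capMaxResults(n int) int {
	if n <= 0 {
		return defaultMaxResults
	}
	if n > maxResultsCeiling {
		return maxResultsCeiling
	}
	return n
}

// topN returns at most n results without going out of range.
func topN(results []result, n int) []result {
	if n < 0 {
		n = 0
	}
	if n > len(results) {
		n = len(results)
	}
	return results[:n]
}

func main() {
	q, err := validateQuery("  kubernetes 1.31  ")
	if err != nil {
		panic(err)
	}
	all := []result{{Title: "a"}, {Title: "b"}, {Title: "c"}}
	fmt.Println(q, capMaxResults(50), len(topN(all, capMaxResults(2))))
}
```

Clamping before slicing means the tool can never return more than 15 results, whatever the LLM requests.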

Debugging

Check if SearXNG is running

kubectl get pods -n backend -l app.kubernetes.io/name=searxng

View SearXNG logs

kubectl logs -n backend -l app.kubernetes.io/name=searxng --tail=50

Test connectivity from inside the cluster

kubectl run -it --rm debug --image=curlimages/curl -- \
  curl -s 'http://searxng.backend.svc.cluster.local:8080/search?q=test&format=json' \
  -H 'Accept: application/json' | head -c 500

Common issues

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| web_search_tool: searxng endpoint is not configured | CRAWBL_SEARXNG_ENDPOINT is empty | Set the env var in the runtime config |
| web_search_tool: searxng returned status 429 | Rate limiting | SearXNG is being queried too frequently; check upstream engine rate limits |
| web_search_tool: GET ... context deadline exceeded | 10-second timeout hit | SearXNG may be overloaded or an upstream engine is slow |
| Empty results | Upstream engines returned nothing | Try a different query; check unresponsive_engines in the raw SearXNG response |

Resource Usage

SearXNG is lightweight for a dev cluster:

| Resource | Value |
| --- | --- |
| CPU request | 50m |
| Memory request | 128Mi |
| Storage | None (stateless) |
| Replicas | 1 |

SearXNG stores no data — it is a stateless proxy that fans queries out to upstream engines and merges the results.


What's next: See the Agent Runtime Tools guide for how web_search_tool fits into the full tool system, or the Dev Services & Access page for an overview of all cluster services.