Real-Time Grounding & Web Search
Empower your systems with live, global web intelligence and per-request grounding on any model.
Dynamic Web Intelligence: Real-Time Grounding
osmAPI enables any model to break past its training data cutoff by integrating web search capabilities. Simply include { "type": "web_search" } in your tools — osmAPI automatically selects the best search engine for your chosen model.
How It Works: Dual-Engine Architecture
osmAPI uses a dual-engine system to ensure web search works across all models:
| Engine | Used For | How It Works | Pricing |
|---|---|---|---|
| Native | OpenAI, Anthropic, Google, xAI | Search tool passed directly to the provider's built-in search | Provider-specific rates |
| Context Injection | All other models (Qwen, Mistral, Neysa, etc.) | osmAPI searches the web via Serper (Google Search), then injects results into the prompt as context | $0.001 per search |
You don't need to know which engine is used — the same API call works for every model. osmAPI detects whether the provider supports native web search and automatically falls back to context injection when needed.
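The routing described above can be sketched as a simple lookup (the provider list comes from the table; the exact detection mechanism is osmAPI-internal, and this helper is purely illustrative):

```python
# Sketch of the dual-engine routing described above. osmAPI performs
# this selection server-side; this is illustrative only.
NATIVE_SEARCH_PROVIDERS = {"openai", "anthropic", "google", "xai"}

def select_search_engine(model: str) -> str:
    """Return 'native' for providers with built-in web search,
    'context_injection' for everything else."""
    # Models without a provider prefix (e.g. "gpt-4o") are assumed
    # to belong to OpenAI here -- an assumption for this sketch.
    provider = model.split("/", 1)[0] if "/" in model else "openai"
    return "native" if provider in NATIVE_SEARCH_PROVIDERS else "context_injection"

print(select_search_engine("openai/gpt-5.2"))       # native
print(select_search_engine("alibaba/qwen3-coder"))  # context_injection
```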
Native Search (Built-in Providers)
For models with built-in web search support (OpenAI, Anthropic, Google, xAI), the web_search tool is passed directly to the provider. The model autonomously decides when to search, crafts optimized queries, and can perform multiple searches per request.
Context Injection (All Other Models)
For models without native web search, osmAPI:
- Extracts the search query from the user's last message
- Executes a Google Search via the Serper API
- Injects the top results as a system message before the conversation
- Sends the enriched prompt to the model as a normal chat request
The model receives the search results as context and uses them to generate a grounded response — no tool-calling capability required.
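The injection step can be sketched as follows (the system-message format and helper name are assumptions, not osmAPI internals):

```python
# Illustrative sketch of the context-injection flow: search results are
# prepended to the conversation as a system message.
def inject_search_results(messages: list[dict], results: list[dict]) -> list[dict]:
    """Prepend web search results as a system message before the conversation."""
    snippet_lines = [
        f"- {r['title']}: {r['snippet']} ({r['url']})" for r in results
    ]
    context = "Web search results:\n" + "\n".join(snippet_lines)
    return [{"role": "system", "content": context}] + messages

messages = [{"role": "user", "content": "Latest TypeScript 6.0 features?"}]
results = [{"title": "TS 6.0 release notes",
            "snippet": "Overview of new features",
            "url": "https://example.com/ts-6"}]
enriched = inject_search_results(messages, results)
print(enriched[0]["role"])  # system
```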
Implementation Mechanics
Activating the Search Engine
The simplest way to enable web search is with the web_search boolean shorthand:
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "What happened in the news today?" }],
    "web_search": true
  }'
```

Alternatively, include the web_search tool in your tools array for more configuration options. The same request format works for all models.
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [{ "role": "user", "content": "What are the core updates from the AI summit today?" }],
    "tools": [{ "type": "web_search" }]
  }'
```

This works identically with any model:
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alibaba/qwen3-coder",
    "messages": [{ "role": "user", "content": "What are the latest TypeScript 6.0 features?" }],
    "tools": [{ "type": "web_search" }]
  }'
```

Sophisticated Search Configuration
osmAPI provides granular control over search behavior through optional parameters. These options are available for native search providers (OpenAI, Anthropic, Google, xAI).
Geo-Spatial Grounding (Location)
Calibrate search results based on the requester's physical context. This is essential for local services, weather, or regional news.
```json
{
  "type": "web_search",
  "user_location": {
    "city": "London",
    "region": "Greater London",
    "country": "UK",
    "timezone": "Europe/London"
  }
}
```

Discovery Depth (Context Size)
Manage the volume of data retrieved from the web (optimized for the GPT-5 series).
- low: Prioritizes speed and baseline facts.
- medium: The standard balance between depth and latency.
- high: Maximum information retrieval for comprehensive research.
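Assuming the request field follows OpenAI's search_context_size naming (an assumption for this sketch; check the provider documentation for the exact key), the tool entry would look like:

```json
{
  "type": "web_search",
  "search_context_size": "high"
}
```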
Search Velocity (Max Uses)
Constrain the model to a specific number of unique searches per interaction to manage costs and response time.
```json
{
  "type": "web_search",
  "max_uses": 2
}
```

Advanced search configuration (location, context size, max uses) only applies to native search providers. For context injection models, osmAPI performs a single search per request using the user's message as the query.
Telemetry & Fiscal Visibility
Web search costs are tracked separately in the usage object, regardless of which engine was used:
```json
{
  "usage": {
    "prompt_tokens": 125,
    "completion_tokens": 250,
    "cost_usd_total": 0.045,
    "cost_usd_web_search": 0.015
  }
}
```

The cost_usd_web_search field captures the direct investment in web queries.
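Since search cost is a component of the total, the token-only portion can be recovered by subtraction (field names taken from the usage object above):

```python
# Sketch: separating web search spend from token spend in the
# usage object returned by the API.
usage = {
    "prompt_tokens": 125,
    "completion_tokens": 250,
    "cost_usd_total": 0.045,
    "cost_usd_web_search": 0.015,
}

# Web search cost is included in cost_usd_total, so token cost is the remainder.
token_cost = usage["cost_usd_total"] - usage.get("cost_usd_web_search", 0.0)
print(f"tokens: ${token_cost:.3f}, search: ${usage['cost_usd_web_search']:.3f}")
```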
Pricing Breakdown
| Engine | Cost |
|---|---|
| OpenAI native search | Provider-specific (varies by search_context_size) |
| Anthropic native search | Provider-specific (per search query) |
| Google grounding | Provider-specific (per grounded request) |
| Context injection (all other models) | $0.001 per search |
Web search costs are included in cost_usd_total and are automatically
deducted from your credits or coupon balance — no separate billing required.
Informational Integrity (Citations)
For native search providers, every response is accompanied by source citations in the annotations field:
```json
{
  "annotations": [
    {
      "type": "url_citation",
      "url": "https://reuters.com/modern-ai-research",
      "title": "State of AI 2026",
      "start_index": 120,
      "end_index": 155
    }
  ]
}
```

For context injection models, the search results are included in the model's context. The model may reference sources in its response text, but structured annotations are not available.
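One common consumer-side use of these annotations is rendering a source list under the response. A minimal sketch, assuming the annotation shape shown above (the footnote format is this example's choice, not part of the API):

```python
# Sketch: turning url_citation annotations into numbered source footnotes.
def render_citations(text: str, annotations: list[dict]) -> str:
    """Append numbered source links for each url_citation annotation."""
    citations = [a for a in annotations if a.get("type") == "url_citation"]
    footnotes = [
        f"[{i}] {a['title']}: {a['url']}" for i, a in enumerate(citations, 1)
    ]
    if not footnotes:
        return text
    return text + "\n\nSources:\n" + "\n".join(footnotes)

out = render_citations(
    "AI research accelerated in 2026.",
    [{"type": "url_citation",
      "url": "https://reuters.com/modern-ai-research",
      "title": "State of AI 2026",
      "start_index": 120, "end_index": 155}],
)
print(out)
```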
Strategic Integration Best Practices
Multi-Tool Orchestration
You can combine web search with private function tools to create powerful hybrid agents.
```json
{
  "tools": [
    { "type": "web_search" },
    {
      "type": "function",
      "function": {
        "name": "update_internal_registry",
        "description": "Log verified web data into private database"
      }
    }
  ]
}
```

Low-Latency Streaming
For real-time user experiences, enable stream: true. The search results and citations are integrated into the concluding chunks of the stream.
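Because citations arrive in the concluding chunks, a streaming consumer should accumulate annotations separately from text. A minimal sketch (the chunk shape is an assumption modeled on OpenAI-style streaming deltas, not a documented osmAPI format):

```python
# Sketch: collecting text and trailing citation annotations from a stream.
def collect_stream(chunks):
    """Accumulate delta text and any annotations delivered in later chunks."""
    text_parts, annotations = [], []
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if "content" in delta:
            text_parts.append(delta["content"])
        annotations.extend(chunk.get("annotations", []))
    return "".join(text_parts), annotations

# Simulated stream: content deltas first, citations in the final chunk.
chunks = [
    {"delta": {"content": "TypeScript 6.0 ships "}},
    {"delta": {"content": "new decorators."}},
    {"annotations": [{"type": "url_citation",
                      "url": "https://example.com/ts-6",
                      "title": "TS blog"}]},
]
text, anns = collect_stream(chunks)
print(text)  # TypeScript 6.0 ships new decorators.
```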
Model Selection Guide
| Use Case | Recommended |
|---|---|
| Best search quality with citations | Native providers (OpenAI, Anthropic, Google) |
| Cost-effective web grounding | Context injection models ($0.001/search) |
| Multi-search per request | Native providers (support multiple queries) |
| Models without tool calling | Context injection (works with any model) |
Practical Deployment Scenarios
- Live Market Forensics: Monitor stock movements or crypto trends with sub-minute accuracy.
- News Aggregation: Build assistants that provide summarized briefings on evolving humanitarian or political events.
- Scientific Validation: Fact-check claims against the most recent peer-reviewed publications.
- Localized Concierges: Provide restaurant, travel, and event recommendations grounded in current availability.
Native vs Context Injection Comparison
| Feature | Native Search | Context Injection |
|---|---|---|
| Supported models | OpenAI, Anthropic, Google, xAI | All other models |
| Search quality | High (model crafts optimized queries) | Good (uses user's message as query) |
| Multiple searches | Yes (model decides) | Single search per request |
| Citations | Structured annotations | In-text references |
| Cost | Provider-specific | $0.001 per search |
| Latency | Provider handles search | ~200-500ms additional |
| Tool calling required | Yes | No |
Billing & Credits
Web search costs are part of the total request cost and follow the same billing flow as token costs:
- Credits mode: Web search cost is deducted from your credit balance
- Coupon/promo credits: Web search cost is covered by promo pools (deducted first, before general credits)
- API keys mode: Web search cost is tracked in API key usage
- Hybrid mode: Falls back to credits if no provider key is available
The cost_usd_web_search field in the API response always reflects the exact web search cost for the request.