
Real-Time Grounding & Web Search

Empower your systems with live, global web intelligence and per-request grounding on any model.

Dynamic Web Intelligence: Real-Time Grounding

osmAPI enables any model to break past its training data cutoff by integrating web search capabilities. Simply include { "type": "web_search" } in your tools — osmAPI automatically selects the best search engine for your chosen model.

How It Works: Dual-Engine Architecture

osmAPI uses a dual-engine system to ensure web search works across all models:

| Engine | Used For | How It Works | Pricing |
| --- | --- | --- | --- |
| Native | OpenAI, Anthropic, Google, xAI | Search tool passed directly to the provider's built-in search | Provider-specific rates |
| Context Injection | All other models (Qwen, Mistral, Neysa, etc.) | osmAPI searches the web via Serper (Google Search), then injects results into the prompt as context | $0.001 per search |

You don't need to know which engine is used — the same API call works for every model. osmAPI detects whether the provider supports native web search and automatically falls back to context injection when needed.

Native Search (Built-in Providers)

For models with built-in web search support (OpenAI, Anthropic, Google, xAI), the web_search tool is passed directly to the provider. The model autonomously decides when to search, crafts optimized queries, and can perform multiple searches per request.

Context Injection (All Other Models)

For models without native web search, osmAPI:

  1. Extracts the search query from the user's last message
  2. Executes a Google Search via the Serper API
  3. Injects the top results as a system message before the conversation
  4. Sends the enriched prompt to the model as a normal chat request

The model receives the search results as context and uses them to generate a grounded response — no tool-calling capability required.
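The four steps above can be sketched in Python. This is an illustrative reconstruction, not osmAPI's actual implementation; the helper name and result fields are hypothetical, and the real search runs through the Serper API:

```python
# Sketch of the context-injection flow (hypothetical helper; the real
# search is executed server-side via the Serper API).

def inject_search_context(messages, search_results):
    """Prepend web search results as a system message (steps 2-3 above)."""
    context = "Web search results:\n" + "\n".join(
        f"- {r['title']}: {r['snippet']} ({r['url']})" for r in search_results
    )
    # Step 3: inject the results as a system message before the conversation
    return [{"role": "system", "content": context}] + messages

messages = [
    {"role": "user", "content": "What are the latest TypeScript 6.0 features?"}
]
results = [
    {"title": "TypeScript 6.0 Release Notes",
     "snippet": "An overview of new language features.",
     "url": "https://example.com/ts6"}
]

# Step 4: the enriched prompt is sent to the model as a normal chat request.
enriched = inject_search_context(messages, results)
```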


Implementation Mechanics

Activating the Search Engine

The simplest way to enable web search is with the web_search boolean shorthand:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "What happened in the news today?" }],
    "web_search": true
  }'

Alternatively, include the web_search tool in your tools array for more configuration options. The same request format works for all models.

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [{ "role": "user", "content": "What are the core updates from the AI summit today?" }],
    "tools": [{ "type": "web_search" }]
  }'

This works identically with any model:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alibaba/qwen3-coder",
    "messages": [{ "role": "user", "content": "What are the latest TypeScript 6.0 features?" }],
    "tools": [{ "type": "web_search" }]
  }'

Sophisticated Search Configuration

osmAPI provides granular control over search behavior through optional parameters. These options are available for native search providers (OpenAI, Anthropic, Google, xAI).

Geo-Spatial Grounding (Location)

Calibrate search results based on the requester's physical context. This is essential for local services, weather, or regional news.

{
	"type": "web_search",
	"user_location": {
		"city": "London",
		"region": "Greater London",
		"country": "UK",
		"timezone": "Europe/London"
	}
}

Discovery Depth (Context Size)

Manage the volume of data retrieved from the web with the search_context_size parameter (optimized for the GPT-5 series).

  • low: Prioritizes speed and baseline facts.
  • medium: The standard balance between depth and latency.
  • high: Maximum information retrieval for comprehensive research.
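Assuming the same tool-object format as the other options, and the search_context_size field referenced in the pricing table, the configuration looks like:

```json
{
	"type": "web_search",
	"search_context_size": "high"
}
```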

Search Velocity (Max Uses)

Constrain the model to a specific number of unique searches per interaction to manage costs and response time.

{
	"type": "web_search",
	"max_uses": 2
}

Advanced search configuration (location, context size, max uses) only applies to native search providers. For context injection models, osmAPI performs a single search per request using the user's message as the query.


Telemetry & Fiscal Visibility

Web search costs are tracked separately in the usage object, regardless of which engine was used:

{
	"usage": {
		"prompt_tokens": 125,
		"completion_tokens": 250,
		"cost_usd_total": 0.045,
		"cost_usd_web_search": 0.015
	}
}

The cost_usd_web_search field isolates the amount spent on web queries for the request.
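Because search cost is reported separately, the token-only portion can be derived by subtraction. A small Python sketch using the usage object from the example above:

```python
import json

# Usage object from the example response above
response = json.loads("""{
    "usage": {
        "prompt_tokens": 125,
        "completion_tokens": 250,
        "cost_usd_total": 0.045,
        "cost_usd_web_search": 0.015
    }
}""")

usage = response["usage"]
# Token spend is the total minus the web search portion
token_cost = usage["cost_usd_total"] - usage["cost_usd_web_search"]
print(f"tokens: ${token_cost:.3f}, web search: ${usage['cost_usd_web_search']:.3f}")
# → tokens: $0.030, web search: $0.015
```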

Pricing Breakdown

| Engine | Cost |
| --- | --- |
| OpenAI native search | Provider-specific (varies by search_context_size) |
| Anthropic native search | Provider-specific (per search query) |
| Google grounding | Provider-specific (per grounded request) |
| Context injection (all other models) | $0.001 per search |

Web search costs are included in cost_usd_total and are automatically deducted from your credits or coupon balance — no separate billing required.


Informational Integrity (Citations)

For native search providers, every response is accompanied by source citations in the annotations field:

{
	"annotations": [
		{
			"type": "url_citation",
			"url": "https://reuters.com/modern-ai-research",
			"title": "State of AI 2026",
			"start_index": 120,
			"end_index": 155
		}
	]
}
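Citations can be extracted directly from this array. A short Python sketch based on the annotation structure shown above:

```python
# Annotations array in the shape shown above
annotations = [
    {
        "type": "url_citation",
        "url": "https://reuters.com/modern-ai-research",
        "title": "State of AI 2026",
        "start_index": 120,
        "end_index": 155,
    }
]

# Collect (title, url) pairs for every URL citation
sources = [(a["title"], a["url"]) for a in annotations if a["type"] == "url_citation"]
for title, url in sources:
    print(f"[{title}]({url})")
```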

For context injection models, the search results are included in the model's context. The model may reference sources in its response text, but structured annotations are not available.


Strategic Integration Best Practices

Multi-Tool Orchestration

You can combine web search with private function tools to create powerful hybrid agents.

{
	"tools": [
		{ "type": "web_search" },
		{
			"type": "function",
			"function": {
				"name": "update_internal_registry",
				"description": "Log verified web data into private database"
			}
		}
	]
}
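In this setup, web_search executes on osmAPI's side, while function tools come back as tool calls for your code to run. A Python sketch of the dispatch loop, assuming the OpenAI-style tool_calls response shape (the handler and message contents here are hypothetical):

```python
import json

def dispatch_tool_calls(message, handlers):
    """Run any private function tools the model requested and build
    the tool-result messages to send back."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        handler = handlers[fn["name"]]
        output = handler(**json.loads(fn["arguments"]))
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results

# Hypothetical handler for the function tool declared above
handlers = {
    "update_internal_registry": lambda entry: {"status": "logged", "entry": entry}
}
# Hypothetical assistant message requesting a function call
message = {"tool_calls": [{"id": "call_1", "function": {
    "name": "update_internal_registry",
    "arguments": "{\"entry\": \"AI summit summary\"}"}}]}

follow_ups = dispatch_tool_calls(message, handlers)
```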

Low-Latency Streaming

For real-time user experiences, enable stream: true. Search results and citations are delivered in the final chunks of the stream.

Model Selection Guide

| Use Case | Recommended |
| --- | --- |
| Best search quality with citations | Native providers (OpenAI, Anthropic, Google) |
| Cost-effective web grounding | Context injection models ($0.001/search) |
| Multi-search per request | Native providers (support multiple queries) |
| Models without tool calling | Context injection (works with any model) |

Practical Deployment Scenarios

  • Live Market Forensics: Monitor stock movements or crypto trends with sub-minute accuracy.
  • News Aggregation: Build assistants that provide summarized briefings on evolving humanitarian or political events.
  • Scientific Validation: Fact-check claims against the most recent peer-reviewed publications.
  • Localized Concierges: Provide restaurant, travel, and event recommendations grounded in current availability.

Native vs Context Injection Comparison

| Feature | Native Search | Context Injection |
| --- | --- | --- |
| Supported models | OpenAI, Anthropic, Google, xAI | All other models |
| Search quality | High (model crafts optimized queries) | Good (uses user's message as query) |
| Multiple searches | Yes (model decides) | Single search per request |
| Citations | Structured annotations | In-text references |
| Cost | Provider-specific | $0.001 per search |
| Latency | Provider handles search | ~200-500 ms additional |
| Tool calling required | Yes | No |

Billing & Credits

Web search costs are part of the total request cost and follow the same billing flow as token costs:

  • Credits mode: Web search cost is deducted from your credit balance
  • Coupon/promo credits: Web search cost is covered by promo pools (deducted first, before general credits)
  • API keys mode: Web search cost is tracked in API key usage
  • Hybrid mode: Falls back to credits if no provider key is available

The cost_usd_web_search field in the API response always reflects the exact web search cost for the request.
