Real-Time Grounding & Web Search
Empower your systems with live, global web intelligence and per-request grounding on any model.
Dynamic Web Intelligence: Real-Time Grounding
osmAPI enables any model to break past its training data cutoff by integrating web search capabilities. Simply include { "type": "web_search" } in your tools — osmAPI automatically selects the best search engine for your chosen model.
How It Works: Dual-Engine Architecture
osmAPI uses a dual-engine system to ensure web search works across all models:
| Engine | Used For | How It Works | Pricing |
|---|---|---|---|
| Native | OpenAI, Anthropic, Google, xAI | Search tool passed directly to the provider's built-in search | Provider-specific rates |
| Context Injection | All other models (Qwen, Mistral, Neysa, etc.) | osmAPI searches the web via Serper (Google Search), then injects results into the prompt as context | $0.001 per search |
You don't need to know which engine is used — the same API call works for every model. osmAPI detects whether the provider supports native web search and automatically falls back to context injection when needed.
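The routing described above can be sketched as a simple lookup (the provider list comes from the table; the exact detection mechanism is osmAPI-internal, and this helper is purely illustrative):

```python
# Sketch of the dual-engine routing described above. osmAPI performs
# this selection server-side; this is illustrative only.
NATIVE_SEARCH_PROVIDERS = {"openai", "anthropic", "google", "xai"}

def select_search_engine(model: str) -> str:
    """Return 'native' for providers with built-in web search,
    'context_injection' for everything else."""
    # Models without a provider prefix (e.g. "gpt-4o") are assumed
    # to belong to OpenAI here -- an assumption for this sketch.
    provider = model.split("/", 1)[0] if "/" in model else "openai"
    return "native" if provider in NATIVE_SEARCH_PROVIDERS else "context_injection"

print(select_search_engine("openai/gpt-5.2"))       # native
print(select_search_engine("alibaba/qwen3-coder"))  # context_injection
```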
Native Search (Built-in Providers)
For models with built-in web search support (OpenAI, Anthropic, Google, xAI), the web_search tool is passed directly to the provider. The model autonomously decides when to search, crafts optimized queries, and can perform multiple searches per request.
Context Injection (All Other Models)
For models without native web search, osmAPI:
- Extracts the search query from the user's last message
- Executes a Google Search via the Serper API
- Injects the top results as a system message before the conversation
- Sends the enriched prompt to the model as a normal chat request
The model receives the search results as context and uses them to generate a grounded response — no tool-calling capability required.
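The injection step can be sketched as follows (the system-message format and helper name are assumptions, not osmAPI internals):

```python
# Illustrative sketch of the context-injection flow: search results are
# prepended to the conversation as a system message.
def inject_search_results(messages: list[dict], results: list[dict]) -> list[dict]:
    """Prepend web search results as a system message before the conversation."""
    snippet_lines = [
        f"- {r['title']}: {r['snippet']} ({r['url']})" for r in results
    ]
    context = "Web search results:\n" + "\n".join(snippet_lines)
    return [{"role": "system", "content": context}] + messages

messages = [{"role": "user", "content": "Latest TypeScript 6.0 features?"}]
results = [{"title": "TS 6.0 release notes",
            "snippet": "Overview of new features",
            "url": "https://example.com/ts-6"}]
enriched = inject_search_results(messages, results)
print(enriched[0]["role"])  # system
```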
Implementation Mechanics
Activating the Search Engine
The simplest way to enable web search is with the web_search boolean shorthand:
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "What happened in the news today?" }],
    "web_search": true
  }'
```

Alternatively, include the web_search tool in your tools array for more configuration options. The same request format works for all models.
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [{ "role": "user", "content": "What are the core updates from the AI summit today?" }],
    "tools": [{ "type": "web_search" }]
  }'
```

This works identically with any model:
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alibaba/qwen3-coder",
    "messages": [{ "role": "user", "content": "What are the latest TypeScript 6.0 features?" }],
    "tools": [{ "type": "web_search" }]
  }'
```

Sophisticated Search Configuration
osmAPI provides granular control over search behavior through optional parameters. These options are available for native search providers (OpenAI, Anthropic, Google, xAI).
Geo-Spatial Grounding (Location)
Calibrate search results based on the requester's physical context. This is essential for local services, weather, or regional news.
```json
{
  "type": "web_search",
  "user_location": {
    "city": "London",
    "region": "Greater London",
    "country": "UK",
    "timezone": "Europe/London"
  }
}
```

Discovery Depth (Context Size)
Manage the volume of data retrieved from the web (optimized for the GPT-5 series).
- low: Prioritizes speed and baseline facts.
- medium: The standard balance between depth and latency.
- high: Maximum information retrieval for comprehensive research.
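Assuming the request field follows OpenAI's search_context_size naming (an assumption for this sketch; check the provider documentation for the exact key), the tool entry would look like:

```json
{
  "type": "web_search",
  "search_context_size": "high"
}
```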
Search Velocity (Max Uses)
Constrain the model to a specific number of unique searches per interaction to manage costs and response time.
```json
{
  "type": "web_search",
  "max_uses": 2
}
```

Advanced search configuration (location, context size, max uses) only applies to native search providers. For context injection models, osmAPI performs a single search per request using the user's message as the query.
Telemetry & Fiscal Visibility
Web search costs are tracked separately in the usage object, regardless of which engine was used:
```json
{
  "usage": {
    "prompt_tokens": 125,
    "completion_tokens": 250,
    "cost_usd_total": 0.045,
    "cost_usd_web_search": 0.015
  }
}
```

The cost_usd_web_search field captures the direct investment in web queries.
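Since search cost is a component of the total, the token-only portion can be recovered by subtraction (field names taken from the usage object above):

```python
# Sketch: separating web search spend from token spend in the
# usage object returned by the API.
usage = {
    "prompt_tokens": 125,
    "completion_tokens": 250,
    "cost_usd_total": 0.045,
    "cost_usd_web_search": 0.015,
}

# Web search cost is included in cost_usd_total, so token cost is the remainder.
token_cost = usage["cost_usd_total"] - usage.get("cost_usd_web_search", 0.0)
print(f"tokens: ${token_cost:.3f}, search: ${usage['cost_usd_web_search']:.3f}")
```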
Pricing Breakdown
| Engine | Cost |
|---|---|
| OpenAI native search | Provider-specific (varies by search_context_size) |
| Anthropic native search | Provider-specific (per search query) |
| Google grounding | Provider-specific (per grounded request) |
| Context injection (all other models) | $0.001 per search |
Web search costs are included in cost_usd_total and are automatically
deducted from your credits or coupon balance — no separate billing required.
Informational Integrity (Citations)
For native search providers, every response is accompanied by source citations in the annotations field:
```json
{
  "annotations": [
    {
      "type": "url_citation",
      "url": "https://reuters.com/modern-ai-research",
      "title": "State of AI 2026",
      "start_index": 120,
      "end_index": 155
    }
  ]
}
```

For context injection models, the search results are included in the model's context. The model may reference sources in its response text, but structured annotations are not available.
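One common consumer-side use of these annotations is rendering a source list under the response. A minimal sketch, assuming the annotation shape shown above (the footnote format is this example's choice, not part of the API):

```python
# Sketch: turning url_citation annotations into numbered source footnotes.
def render_citations(text: str, annotations: list[dict]) -> str:
    """Append numbered source links for each url_citation annotation."""
    citations = [a for a in annotations if a.get("type") == "url_citation"]
    footnotes = [
        f"[{i}] {a['title']}: {a['url']}" for i, a in enumerate(citations, 1)
    ]
    if not footnotes:
        return text
    return text + "\n\nSources:\n" + "\n".join(footnotes)

out = render_citations(
    "AI research accelerated in 2026.",
    [{"type": "url_citation",
      "url": "https://reuters.com/modern-ai-research",
      "title": "State of AI 2026",
      "start_index": 120, "end_index": 155}],
)
print(out)
```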
Strategic Integration Best Practices
Multi-Tool Orchestration
You can combine web search with private function tools to create powerful hybrid agents.
```json
{
  "tools": [
    { "type": "web_search" },
    {
      "type": "function",
      "function": {
        "name": "update_internal_registry",
        "description": "Log verified web data into private database"
      }
    }
  ]
}
```

Low-Latency Streaming
For real-time user experiences, enable stream: true. The search results and citations are integrated into the concluding chunks of the stream.
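Because citations arrive in the concluding chunks, a streaming consumer should accumulate annotations separately from text. A minimal sketch (the chunk shape is an assumption modeled on OpenAI-style streaming deltas, not a documented osmAPI format):

```python
# Sketch: collecting text and trailing citation annotations from a stream.
def collect_stream(chunks):
    """Accumulate delta text and any annotations delivered in later chunks."""
    text_parts, annotations = [], []
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if "content" in delta:
            text_parts.append(delta["content"])
        annotations.extend(chunk.get("annotations", []))
    return "".join(text_parts), annotations

# Simulated stream: content deltas first, citations in the final chunk.
chunks = [
    {"delta": {"content": "TypeScript 6.0 ships "}},
    {"delta": {"content": "new decorators."}},
    {"annotations": [{"type": "url_citation",
                      "url": "https://example.com/ts-6",
                      "title": "TS blog"}]},
]
text, anns = collect_stream(chunks)
print(text)  # TypeScript 6.0 ships new decorators.
```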
Model Selection Guide
| Use Case | Recommended |
|---|---|
| Best search quality with citations | Native providers (OpenAI, Anthropic, Google) |
| Cost-effective web grounding | Context injection models ($0.001/search) |
| Multi-search per request | Native providers (support multiple queries) |
| Models without tool calling | Context injection (works with any model) |
Practical Deployment Scenarios
- Live Market Forensics: Monitor stock movements or crypto trends with sub-minute accuracy.
- News Aggregation: Build assistants that provide summarized briefings on evolving humanitarian or political events.
- Scientific Validation: Fact-check claims against the most recent peer-reviewed publications.
- Localized Concierges: Provide restaurant, travel, and event recommendations grounded in current availability.
Native vs Context Injection Comparison
| Feature | Native Search | Context Injection |
|---|---|---|
| Supported models | OpenAI, Anthropic, Google, xAI | All other models |
| Search quality | High (model crafts optimized queries) | Good (uses user's message as query) |
| Multiple searches | Yes (model decides) | Single search per request |
| Citations | Structured annotations | In-text references |
| Cost | Provider-specific | $0.001 per search |
| Latency | Provider handles search | ~200-500ms additional |
| Tool calling required | Yes | No |
Billing & Credits
Web search costs are part of the total request cost and follow the same billing flow as token costs:
- Credits mode: Web search cost is deducted from your credit balance
- Coupon/promo credits: Web search cost is covered by promo pools (deducted first, before general credits)
- API keys mode: Web search cost is tracked in API key usage
- Hybrid mode: Falls back to credits if no provider key is available
The cost_usd_web_search field in the API response always reflects the exact web search cost for the request.