# Chat Completions

Text generation with 130+ models across OpenAI, Anthropic, Google, and more.
The core API. Send messages, get AI responses. Compatible with the OpenAI SDK — just change the base URL.
## Quick Start
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is osmAPI?"}
    ]
  }'
```

## SDK Examples
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-osm-api-key",
    base_url="https://api.osmapi.com/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-osm-api-key",
  baseURL: "https://api.osmapi.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

## Streaming
Set `"stream": true` in the request body to enable real-time token streaming.
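When streaming is enabled, the response body is a Server-Sent Events stream: each event is a `data:` line carrying a JSON chunk whose text lives in `choices[0].delta`, and the stream ends with `data: [DONE]` (the OpenAI-compatible format). A minimal consumer sketch over pre-captured lines (the event payloads here are illustrative, not captured from a real response):

```python
import json

def accumulate_sse(lines):
    """Accumulate streamed text from OpenAI-style SSE lines.

    Each event is a line of the form 'data: {json}'; the stream
    ends with 'data: [DONE]'. Chunk shape follows the
    OpenAI-compatible streaming format.
    """
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content") or "")
    return "".join(text)

# Illustrative events, not a real capture:
events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(accumulate_sse(events))  # -> Hello
```

With the Python SDK you rarely parse SSE by hand: pass `stream=True` to `client.chat.completions.create(...)` and iterate the returned chunks instead.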
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
```

## Supported Models
130+ models across 25+ providers. Use any model by name or with a provider prefix:
```
# Auto-routed (osmAPI picks the best provider)
"model": "gpt-4o"

# Provider-specific
"model": "openai/gpt-4o"
"model": "anthropic/claude-sonnet-4-6"
"model": "google-ai-studio/gemini-2.5-flash"
"model": "groq/llama-3.3-70b-instruct"
```

Browse all models at app.osmapi.com/models.
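The two naming forms differ only by an optional `provider/` prefix. A small helper that distinguishes them client-side (the helper itself is illustrative; osmAPI does this routing server-side):

```python
def parse_model(model: str):
    """Split a model string into (provider, model_id).

    Returns provider=None for bare IDs, which osmAPI auto-routes.
    Illustrative helper, not part of the API.
    """
    if "/" in model:
        provider, _, model_id = model.partition("/")
        return provider, model_id
    return None, model

print(parse_model("gpt-4o"))                       # -> (None, 'gpt-4o')
print(parse_model("anthropic/claude-sonnet-4-6"))  # -> ('anthropic', 'claude-sonnet-4-6')
```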
## Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID, or `provider/model` format |
| `messages` | array | Conversation messages (role + content) |
| `stream` | boolean | Enable SSE streaming |
| `temperature` | number | 0 to 2; controls randomness |
| `max_tokens` | number | Maximum tokens to generate |
| `top_p` | number | Nucleus sampling |
| `tools` | array | Function-calling tools |
| `tool_choice` | string/object | Control tool usage |
| `response_format` | object | Force JSON output |
| `frequency_penalty` | number | Penalizes repeated tokens; range -2.0 to 2.0 |
| `presence_penalty` | number | Penalizes tokens already present; range -2.0 to 2.0 |
| `reasoning_effort` | string | For reasoning models (`minimal`/`low`/`medium`/`high`) |
| `web_search` | boolean | Enable web search for the request |
| `plugins` | array | Enable plugins, e.g. `["response-healing"]` |
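These parameters sit alongside `model` and `messages` in the JSON request body. For example, a low-randomness, length-capped request that forces JSON output (the values are illustrative, not recommendations):

```python
import json

# Illustrative request body combining several parameters from the table.
body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "List three colors as JSON."}],
    "temperature": 0.2,                           # low randomness
    "max_tokens": 200,                            # cap completion length
    "response_format": {"type": "json_object"},   # force JSON output
}
print(json.dumps(body, indent=2))
```

The same keys are accepted as keyword arguments by the OpenAI SDKs, e.g. `client.chat.completions.create(**body)` in Python.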
## Features

- **Vision**: send images for analysis
- **Tool Calling**: function calling and web search
- **Reasoning**: advanced reasoning models
- **Streaming**: SSE streaming with response caching
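Tool calling uses the OpenAI function-calling schema: each tool is declared with a name, description, and a JSON Schema for its arguments, then passed via the `tools` parameter, with `tool_choice` controlling usage. A sketch of a single tool definition (the `get_weather` function is hypothetical):

```python
# A hypothetical tool declared in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Attached to a request; tool_choice can force or forbid tool use.
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": tools,
    "tool_choice": "auto",  # or {"type": "function", "function": {"name": "get_weather"}}
}
print(request["tools"][0]["function"]["name"])  # -> get_weather
```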
## Cost Tracking
Every response includes cost in USD:
```json
{
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57,
    "cost_usd_total": 0.000285,
    "cost_usd_input": 0.0000225,
    "cost_usd_output": 0.0002625,
    "cost_usd_cached_input": 0.0,
    "cost_usd_request": 0.000285
  }
}
```

See the Cost Breakdown guide for details.
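In the example response above, the per-direction costs sum to `cost_usd_total` (the full semantics of each field are covered in the Cost Breakdown guide). A quick check in Python:

```python
# Usage block copied from the example response above.
usage = {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57,
    "cost_usd_total": 0.000285,
    "cost_usd_input": 0.0000225,
    "cost_usd_output": 0.0002625,
    "cost_usd_cached_input": 0.0,
}

# Input + output + cached-input costs add up to the total here.
parts = (usage["cost_usd_input"]
         + usage["cost_usd_output"]
         + usage["cost_usd_cached_input"])
assert abs(parts - usage["cost_usd_total"]) < 1e-12

# Token counts add up the same way.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]

print(f"total cost: ${usage['cost_usd_total']:.6f}")  # -> total cost: $0.000285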