
Chat Completions

Text generation with 130+ models across OpenAI, Anthropic, Google, and more.

The core API. Send messages, get AI responses. Compatible with the OpenAI SDK — just change the base URL.

Quick Start

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is osmAPI?"}
    ]
  }'

SDK Examples

Python:

from openai import OpenAI

client = OpenAI(
    api_key="your-osm-api-key",
    base_url="https://api.osmapi.com/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

JavaScript:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-osm-api-key",
  baseURL: "https://api.osmapi.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

Streaming

Enable real-time token streaming:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
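With `"stream": true`, the response arrives as Server-Sent Events in the OpenAI-compatible format: one `data:` line per token delta, terminated by `data: [DONE]`. A minimal sketch of extracting the text from those lines in Python (the `parse_sse_line` helper is hypothetical, not part of any SDK):

```python
import json

def parse_sse_line(line: str):
    """Return the token text carried by one SSE 'data:' line, or None."""
    if not line.startswith("data: "):
        return None  # comments, keep-alives, blank lines
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    event = json.loads(payload)
    # Streaming chunks carry partial text under choices[0].delta.content.
    return event["choices"][0]["delta"].get("content")

chunk = 'data: {"choices":[{"delta":{"content":"Autumn"}}]}'
print(parse_sse_line(chunk))  # -> Autumn
```

The official SDKs do this parsing for you: passing `stream=True` to `client.chat.completions.create` returns an iterator of chunk objects instead of a single response.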

Supported Models

130+ models across 25+ providers. Use any model by name or with a provider prefix:

# Auto-routed (osmAPI picks the best provider)
"model": "gpt-4o"

# Provider-specific
"model": "openai/gpt-4o"
"model": "anthropic/claude-sonnet-4-6"
"model": "google-ai-studio/gemini-2.5-flash"
"model": "groq/llama-3.3-70b-instruct"

Browse all models at app.osmapi.com/models.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model ID or `provider/model` format |
| `messages` | array | Conversation messages (role + content) |
| `stream` | boolean | Enable SSE streaming |
| `temperature` | number | 0-2, controls randomness |
| `max_tokens` | number | Maximum tokens to generate |
| `top_p` | number | Nucleus sampling |
| `tools` | array | Function-calling tools |
| `tool_choice` | string/object | Control tool usage |
| `response_format` | object | Force JSON output |
| `frequency_penalty` | number | Penalizes repeated tokens. Range: -2.0 to 2.0 |
| `presence_penalty` | number | Penalizes tokens based on presence. Range: -2.0 to 2.0 |
| `reasoning_effort` | string | For reasoning models (minimal/low/medium/high) |
| `web_search` | boolean | Enable web search for the request |
| `plugins` | array | Enable plugins like `["response-healing"]` |
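These parameters combine freely in a single request body. A sketch of one (the values are illustrative) that pairs a low temperature with forced JSON output:

```python
import json

# Illustrative request body combining several parameters from the table above.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Give me three colors as JSON"}],
    "temperature": 0.2,          # low randomness for predictable structure
    "max_tokens": 200,           # cap the completion length
    "response_format": {"type": "json_object"},  # force JSON output
}
print(json.dumps(payload, indent=2))
```

Send this as the `-d` body of the curl request from the Quick Start, or pass the same keys as keyword arguments to the SDK's `client.chat.completions.create`.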

Features

Cost Tracking

Every response includes cost in USD:

{
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57,
    "cost_usd_total": 0.000285,
    "cost_usd_input": 0.0000225,
    "cost_usd_output": 0.0002625,
    "cost_usd_cached_input": 0.0,
    "cost_usd_request": 0.000285
  }
}

See the Cost Breakdown guide for details.
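In the sample response above, the input, output, and cached-input costs sum to `cost_usd_total`. A sketch of reading those fields from the parsed JSON:

```python
# Usage block from the sample response above.
usage = {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57,
    "cost_usd_total": 0.000285,
    "cost_usd_input": 0.0000225,
    "cost_usd_output": 0.0002625,
    "cost_usd_cached_input": 0.0,
}

# Input + output (plus cached input, zero here) make up the total.
parts = usage["cost_usd_input"] + usage["cost_usd_output"] + usage["cost_usd_cached_input"]
print(f"${parts:.6f}")  # -> $0.000285
```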
