# Chat Completions

Text generation with 130+ models across OpenAI, Anthropic, Google, and more.
The core API. Send messages, get AI responses. Compatible with the OpenAI SDK — just change the base URL.
## Quick Start
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is osmAPI?"}
    ]
  }'
```

## SDK Examples
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-osm-api-key",
    base_url="https://api.osmapi.com/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-osm-api-key",
  baseURL: "https://api.osmapi.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

## Streaming
Set `"stream": true` in the request body to enable real-time token streaming.
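When streaming is enabled, the response body is a Server-Sent Events stream: each event is a `data:` line carrying a JSON chunk whose text lives in `choices[0].delta`, and the stream ends with `data: [DONE]` (the OpenAI-compatible format). A minimal consumer sketch over pre-captured lines (the event payloads here are illustrative, not captured from a real response):

```python
import json

def accumulate_sse(lines):
    """Accumulate streamed text from OpenAI-style SSE lines.

    Each event is a line of the form 'data: {json}'; the stream
    ends with 'data: [DONE]'. Chunk shape follows the
    OpenAI-compatible streaming format.
    """
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content") or "")
    return "".join(text)

# Illustrative events, not a real capture:
events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(accumulate_sse(events))  # -> Hello
```

With the Python SDK you rarely parse SSE by hand: pass `stream=True` to `client.chat.completions.create(...)` and iterate the returned chunks instead.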
```bash
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
```

## Supported Models
130+ models across 25+ providers. Use any model by name or with a provider prefix:
```
# Auto-routed (osmAPI picks the best provider)
"model": "gpt-4o"

# Provider-specific
"model": "openai/gpt-4o"
"model": "anthropic/claude-sonnet-4-6"
"model": "google-ai-studio/gemini-2.5-flash"
"model": "groq/llama-3.3-70b-instruct"
```

Browse all models at app.osmapi.com/models.
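The two naming forms differ only by an optional `provider/` prefix. A small helper that distinguishes them client-side (the helper itself is illustrative; osmAPI does this routing server-side):

```python
def parse_model(model: str):
    """Split a model string into (provider, model_id).

    Returns provider=None for bare IDs, which osmAPI auto-routes.
    Illustrative helper, not part of the API.
    """
    if "/" in model:
        provider, _, model_id = model.partition("/")
        return provider, model_id
    return None, model

print(parse_model("gpt-4o"))                       # -> (None, 'gpt-4o')
print(parse_model("anthropic/claude-sonnet-4-6"))  # -> ('anthropic', 'claude-sonnet-4-6')
```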
## Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID, or `provider/model` format |
| `messages` | array | Conversation messages (role + content) |
| `stream` | boolean | Enable SSE streaming |
| `temperature` | number | 0 to 2; controls randomness |
| `max_tokens` | number | Maximum tokens to generate |
| `top_p` | number | Nucleus sampling |
| `tools` | array | Function-calling tools |
| `tool_choice` | string/object | Control tool usage |
| `response_format` | object | Force JSON output |
| `frequency_penalty` | number | Penalizes repeated tokens; range -2.0 to 2.0 |
| `presence_penalty` | number | Penalizes tokens already present; range -2.0 to 2.0 |
| `reasoning_effort` | string | For reasoning models (`minimal`/`low`/`medium`/`high`) |
| `web_search` | boolean | Enable web search for the request |
| `plugins` | array | Enable plugins, e.g. `["response-healing"]` |
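These parameters sit alongside `model` and `messages` in the JSON request body. For example, a low-randomness, length-capped request that forces JSON output (the values are illustrative, not recommendations):

```python
import json

# Illustrative request body combining several parameters from the table.
body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "List three colors as JSON."}],
    "temperature": 0.2,                           # low randomness
    "max_tokens": 200,                            # cap completion length
    "response_format": {"type": "json_object"},   # force JSON output
}
print(json.dumps(body, indent=2))
```

The same keys are accepted as keyword arguments by the OpenAI SDKs, e.g. `client.chat.completions.create(**body)` in Python.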
## Features

- **Vision**: send images for analysis
- **Tool Calling**: function calling and web search
- **Reasoning**: advanced reasoning models
- **Streaming**: SSE streaming with response caching
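Tool calling uses the OpenAI function-calling schema: each tool is declared with a name, description, and a JSON Schema for its arguments, then passed via the `tools` parameter, with `tool_choice` controlling usage. A sketch of a single tool definition (the `get_weather` function is hypothetical):

```python
# A hypothetical tool declared in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Attached to a request; tool_choice can force or forbid tool use.
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": tools,
    "tool_choice": "auto",  # or {"type": "function", "function": {"name": "get_weather"}}
}
print(request["tools"][0]["function"]["name"])  # -> get_weather
```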
## Cost Tracking
Every response includes cost in USD:
```json
{
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57,
    "cost_usd_total": 0.000285,
    "cost_usd_input": 0.0000225,
    "cost_usd_output": 0.0002625,
    "cost_usd_cached_input": 0.0,
    "cost_usd_request": 0.000285
  }
}
```

See the Cost Breakdown guide for details.
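In the example response above, the per-direction costs sum to `cost_usd_total` (the full semantics of each field are covered in the Cost Breakdown guide). A quick check in Python:

```python
# Usage block copied from the example response above.
usage = {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57,
    "cost_usd_total": 0.000285,
    "cost_usd_input": 0.0000225,
    "cost_usd_output": 0.0002625,
    "cost_usd_cached_input": 0.0,
}

# Input + output + cached-input costs add up to the total here.
parts = (usage["cost_usd_input"]
         + usage["cost_usd_output"]
         + usage["cost_usd_cached_input"])
assert abs(parts - usage["cost_usd_total"]) < 1e-12

# Token counts add up the same way.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]

print(f"total cost: ${usage['cost_usd_total']:.6f}")  # -> total cost: $0.000285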