
Smart Routing

osmAPI routes your AI requests to the best available model and provider based on performance, reliability, and cost. You can let osmAPI choose automatically or specify exactly which model and provider to use.

Auto Mode

Use "model": "auto" to let osmAPI pick the best model for your request:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarize this dataset..."}]
  }'

How Auto Mode Works

  • Cost-Efficient by Default: Uses fast, affordable models for routine tasks.
  • Auto-Upgrades: Switches to larger models when your input exceeds context limits.
  • Reasoning Detection: Identifies complex requests and routes them to reasoning-capable models.
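
The three behaviors above can be sketched as a simple selection function. This is a hypothetical illustration, not osmAPI's actual implementation: the model tiers, token threshold, and characters-per-token estimate are all assumptions for demonstration.

```python
# Illustrative sketch of auto mode's selection heuristics.
# Model names and thresholds are assumptions, not osmAPI internals.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def pick_auto_model(prompt: str, needs_reasoning: bool = False) -> str:
    tokens = estimate_tokens(prompt)
    if needs_reasoning:
        return "reasoning-model"        # complex, multi-step requests
    if tokens > 128_000:
        return "large-context-model"    # auto-upgrade past context limits
    return "fast-affordable-model"      # cost-efficient default

print(pick_auto_model("Summarize this dataset..."))
```

In this sketch, routine prompts stay on the cheap default, oversized inputs trigger the context-limit upgrade, and flagged reasoning tasks go straight to a reasoning-capable tier.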

Auto Mode Options

Free Models Only

Restrict auto mode to free-tier models — great for development and testing:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Simple validation check."}],
    "free_models_only": true
  }'

Adding a small credit balance (e.g., ₹500) significantly increases your rate limits for free models.

Reasoning Mode

Force selection of reasoning-capable models for complex tasks:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Solve this multi-step problem."}],
    "reasoning_effort": "high"
  }'

Fast Mode (No Reasoning)

Skip reasoning models when speed matters most:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Quick chat response."}],
    "no_reasoning": true
  }'

Specific Models

By Model Name

Use any model by its ID. See the full list at the Model Catalog or via the /v1/models endpoint.

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

When you specify a model name without a provider, osmAPI picks the best provider automatically using a scoring system based on:

  • Availability (50%): Providers with higher uptime score better.
  • Speed (20%): Faster token generation rates are preferred.
  • Cost (20%): Lower cost-per-token providers rank higher.
  • Latency (10%): Lower time-to-first-token is favored for streaming.

Providers below 95% availability are heavily penalized. osmAPI also sends ~1% of traffic to less-used providers to keep performance data fresh.
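
The weighted scoring above can be sketched in code. The weights match the documented percentages, but the metric normalization bounds, the penalty factor, and the example provider numbers are assumptions for illustration only:

```python
# Illustrative sketch of the documented provider scoring (50/20/20/10 weights).
# Normalization bounds and the sub-95%-availability penalty factor are assumed.

def score_provider(availability: float, tokens_per_sec: float,
                   cost_per_token: float, ttft_ms: float) -> float:
    # Map each metric onto a 0..1 "higher is better" scale (assumed bounds).
    speed = min(tokens_per_sec / 200.0, 1.0)        # 200 tok/s treated as max
    cost = 1.0 - min(cost_per_token / 0.0001, 1.0)  # $0.0001/tok treated as worst
    latency = 1.0 - min(ttft_ms / 2000.0, 1.0)      # 2 s TTFT treated as worst
    score = 0.5 * availability + 0.2 * speed + 0.2 * cost + 0.1 * latency
    if availability < 0.95:
        score *= 0.5  # heavy penalty below 95% availability (factor assumed)
    return score

reliable = score_provider(0.999, 150, 0.00003, 300)
flaky = score_provider(0.90, 180, 0.00002, 250)
print(reliable > flaky)  # True: reliability outweighs the flaky provider's
                         # speed and cost advantage
```

Note how the availability penalty dominates: the flaky provider scores better on raw speed and cost but still loses once the sub-95% penalty is applied.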

By Provider + Model

Pin requests to a specific provider:

# Use OpenAI directly
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Automatic Failover

If your chosen provider drops below 90% availability, osmAPI automatically fails over to the next best provider for that model. This keeps your app running even during provider outages.

Failover only works if another provider hosts the same model. If no alternative exists, the request goes to the original provider.
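
The failover rule can be sketched as follows. The function, provider names, and availability figures are hypothetical; only the 90% threshold and the "same model must exist elsewhere" condition come from the behavior described above:

```python
# Hedged sketch of the failover rule: if the chosen provider's availability
# drops below 90%, route to the best alternative provider that hosts the same
# model; if no alternative exists, keep the original provider.

def route_with_failover(chosen: str, model: str,
                        availability: dict[str, float],
                        hosts: dict[str, list[str]]) -> str:
    if availability.get(chosen, 0.0) >= 0.90:
        return chosen  # provider is healthy; no failover needed
    alternatives = [p for p, models in hosts.items()
                    if model in models and p != chosen]
    if not alternatives:
        return chosen  # no other provider hosts this model
    return max(alternatives, key=lambda p: availability.get(p, 0.0))

hosts = {"openai": ["gpt-4o"], "azure": ["gpt-4o"], "other": ["llama-3"]}
availability = {"openai": 0.85, "azure": 0.99, "other": 0.97}
print(route_with_failover("openai", "gpt-4o", availability, hosts))  # azure
```

Here the request pinned to the degraded provider fails over to the alternative hosting the same model, while a model with a single host would stay put regardless of availability.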

Disable Failover

Use the X-No-Fallback header to force requests to a specific provider with no failover:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-No-Fallback: true" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "No fallback allowed."}]
  }'

Disabling failover increases the risk of failures if the provider is having issues. Only use this when provider-specific behavior is required.


Best Practices

  • Development: Use specific models for consistent, reproducible results.
  • Production: Use auto mode to benefit from automatic failover and optimization.
  • Monitoring: Check your dashboard regularly to see which providers perform best for your use cases.
  • Reliability: Add API keys for multiple providers to maximize routing options.
