
Cognitive Architecture & Reasoning

Harness advanced deductive logic and step-by-step thinking for complex problem solving.

Deep Deductive Reasoning

osmAPI provides native support for reasoning models that expose their internal, step-by-step thinking. This "white-box" reasoning is valuable for intricate logical deduction, mathematical proofs, and architectural troubleshooting, where the logic used to derive a result is as critical as the result itself.

The Reasoning Effort Matrix

You control the depth of a model's cognitive investment through the reasoning_effort parameter. Selecting the appropriate level balances response speed with deductive depth:

  • Minimal (Fast): Optimized for quick logic checks with negligible latency.
  • Low (Focused): Lightweight reasoning for straightforward procedural tasks.
  • Medium (Standard): The recommended setting for general-purpose problem solving.
  • High (Intensive): Deep-dives into multi-step problems requiring thorough validation.
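
As an illustration of the four levels above, here is a minimal sketch of a client-side helper that validates the level before a request is sent. The function name build_reasoning_request is hypothetical; only the reasoning_effort parameter and its four values come from this guide. The helper only assembles the JSON body and does not call the API.

```python
import json

# The four levels listed above; anything else is rejected client-side.
VALID_EFFORTS = ("minimal", "low", "medium", "high")


def build_reasoning_request(model: str, prompt: str, effort: str = "medium") -> dict:
    """Assemble a chat-completions body with a validated reasoning_effort.

    Hypothetical helper: it builds the payload only and performs no I/O.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f"reasoning_effort must be one of {VALID_EFFORTS}, got {effort!r}"
        )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }


if __name__ == "__main__":
    body = build_reasoning_request("gpt-5", "Is 97 prime?", effort="minimal")
    print(json.dumps(body, indent=2))
```

Rejecting an unknown level locally avoids a round trip for a request that is malformed from the start.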

Implementation Mechanics

Orchestrating a Reasoning Request

Incorporate the reasoning_effort attribute into your standard completion request to activate the cognitive layer.

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {
        "role": "user",
        "content": "Calculate the surface area of a toroid with major radius R=5 and minor radius r=2."
      }
    ],
    "reasoning_effort": "medium"
  }'

Response Schema: Introspection

Reasoning-enabled responses are enriched with a reasoning field within the message payload, providing a window into the model's analytical path.

{
	"id": "tx_osm_reasoning_123",
	"model": "gpt-5",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "The surface area is 40π² (approximately 394.78).",
				"reasoning": "Formula: SA = (2πr)(2πR) = 4π²rR. Given R=5, r=2: SA = 4π²(2)(5) = 40π². Numerical estimation: 40 * 9.8696 = 394.78."
			}
		}
	],
	"usage": {
		"prompt_tokens": 25,
		"completion_tokens": 50,
		"reasoning_tokens": 40,
		"total_tokens": 115
	}
}
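
To show how a client might read this payload, the sketch below separates the final answer from the reasoning trace. The helper name split_message is hypothetical, and the sample dictionary is an abbreviated copy of the response above. The reasoning field is treated as optional, on the assumption that responses from non-reasoning requests omit it.

```python
from typing import Optional, Tuple

# Abbreviated copy of the sample response shown above.
SAMPLE_RESPONSE = {
    "id": "tx_osm_reasoning_123",
    "model": "gpt-5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The surface area is 40π² (approximately 394.78).",
                "reasoning": "Formula: SA = (2πr)(2πR) = 4π²rR.",
            },
        }
    ],
}


def split_message(response: dict) -> Tuple[str, Optional[str]]:
    """Return (content, reasoning) from the first choice.

    reasoning is None when the field is absent (assumed to be the case
    for requests that did not enable reasoning).
    """
    message = response["choices"][0]["message"]
    return message["content"], message.get("reasoning")


if __name__ == "__main__":
    content, reasoning = split_message(SAMPLE_RESPONSE)
    print("Answer:   ", content)
    print("Reasoning:", reasoning)
```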

Streaming Cognitive Outputs

For interactive applications, reasoning data can be streamed in real time. This lets users watch the model's analytical process as it unfolds, improving perceived performance and transparency.

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "reasoning_effort": "high",
    "stream": true,
    "messages": [{ "role": "user", "content": "Analyze the game theory implications of..." }]
  }'

Reasoning delta chunks arrive first, in order, followed by the content chunks.
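
This guide does not show the chunk format, so the following is only a sketch under stated assumptions: each event is a server-sent "data: <json>" line, the JSON carries choices[0].delta with optional reasoning and content keys, and the stream ends with "data: [DONE]". Check the actual wire format before relying on it. The function name collect_stream is hypothetical.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate reasoning and content deltas from streamed lines.

    ASSUMPTION: events look like 'data: {"choices": [{"delta": {...}}]}'
    and the stream terminates with 'data: [DONE]'. This shape is not
    specified in the guide.
    """
    reasoning_parts = []
    content_parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("reasoning"):
            reasoning_parts.append(delta["reasoning"])
        if delta.get("content"):
            content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)


if __name__ == "__main__":
    simulated = [
        'data: {"choices": [{"delta": {"reasoning": "Step 1. "}}]}',
        "",
        'data: {"choices": [{"delta": {"reasoning": "Step 2."}}]}',
        'data: {"choices": [{"delta": {"content": "Answer"}}]}',
        "data: [DONE]",
    ]
    print(collect_stream(simulated))
```

Keeping the two buffers separate lets a UI render the reasoning in a collapsible panel while the answer streams into the main view.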


Observability & Token Metrics

osmAPI provides granular telemetry for every deductive transaction. The usage object explicitly isolates the financial and computational weight of the thinking process:

  • reasoning_tokens: Cumulative investment in the thinking phase.
  • completion_tokens: Finalized linguistic output.
  • prompt_tokens: Initial contextual injection.

Strategic Monitoring: Track reasoning_tokens via your administration console to identify opportunities for model optimization or budgetary alignment.
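
One simple way to monitor this is to compute the share of tokens spent on reasoning for each response. The sketch below uses only the usage fields documented above; the function name reasoning_share is hypothetical. In the sample response shown earlier, total_tokens (115) equals prompt_tokens + completion_tokens + reasoning_tokens (25 + 50 + 40), and the calculation assumes that accounting.

```python
def reasoning_share(usage: dict) -> float:
    """Fraction of total_tokens spent on reasoning.

    Returns 0.0 when total_tokens is missing or zero, so callers do not
    need to guard against division by zero.
    """
    total = usage.get("total_tokens", 0)
    if not total:
        return 0.0
    return usage.get("reasoning_tokens", 0) / total


if __name__ == "__main__":
    sample_usage = {
        "prompt_tokens": 25,
        "completion_tokens": 50,
        "reasoning_tokens": 40,
        "total_tokens": 115,
    }
    print(f"{reasoning_share(sample_usage):.1%} of tokens went to reasoning")
```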


Autonomous Reasoning Selection

When using osmAPI's Autonomous Routing (e.g., with a generic model identifier), the engine applies defaults to maintain efficiency:

  1. Dynamic Calibration: For standard requests, effort defaults to minimal or low to conserve resources.
  2. Affinity Matching: If you explicitly set reasoning effort to high, the gateway routes the request only to reasoning-capable models.

Strategic Deployment Best Practices

  • Context-Aware Scaling: Reserve high effort for mission-critical logic; utilize minimal for baseline validation to manage costs.
  • Asynchronous UX: Leverage streaming to keep users engaged during long-running reasoning tasks.
  • Audit Trails: Regularly review reasoningContent in your dashboard logs to calibrate prompt engineering and verify logic accuracy.

Anthropic API Compatibility

Reasoning is also supported through the Anthropic-compatible endpoint at /v1/messages. Use the thinking parameter instead of reasoning_effort:

curl -X POST "https://api.osmapi.com/v1/messages" \
  -H "x-api-key: $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-397b-a17b",
    "max_tokens": 8000,
    "thinking": {"type": "enabled", "budget_tokens": 5000},
    "messages": [{"role": "user", "content": "Solve step by step: 15! / 13!"}]
  }'

The thinking parameter maps to reasoning_effort internally:

  • budget_tokens ≤ 1024 → low
  • 1024 < budget_tokens ≤ 8192 → medium
  • budget_tokens > 8192 → high

Thinking content is returned as a thinking content block in the response, matching the native Anthropic format. This works with Claude Code, Cline, and other Anthropic SDK clients.
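
The budget_tokens thresholds listed above can be sketched as a small function. This mirrors the documented mapping for illustration; it is not the gateway's actual implementation, and the name effort_from_budget is hypothetical.

```python
def effort_from_budget(budget_tokens: int) -> str:
    """Map an Anthropic-style thinking budget to a reasoning_effort level.

    Thresholds follow the mapping listed above:
      <= 1024 -> low, <= 8192 -> medium, otherwise high.
    """
    if budget_tokens <= 1024:
        return "low"
    if budget_tokens <= 8192:
        return "medium"
    return "high"


if __name__ == "__main__":
    for budget in (1024, 5000, 8192, 8193):
        print(budget, "->", effort_from_budget(budget))
```

Under this mapping, the budget_tokens value of 5000 in the request above corresponds to medium.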


Exception Handling & Unsupported Models

Applying reasoning parameters to a model that does not support reasoning triggers a governance error, which prevents inaccurate billing and logic failures.

{
	"error": {
		"message": "Model gpt-4o-mini is not designated for cognitive reasoning. Please adjust the effort parameter or target a reasoning-capable endpoint.",
		"type": "governance_exception",
		"code": "model_reasoning_unsupported"
	}
}

Consult our Model Perceptivity Index for a real-time list of compatible infrastructures.
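
A client can detect this error by its code and decide how to recover. The sketch below shows one possible strategy: drop the reasoning parameters and resend the request. That retry policy is an assumption for illustration, not documented osmAPI behavior, and both helper names are hypothetical. Only the error code model_reasoning_unsupported and the parameter names come from this guide.

```python
# Request keys that enable reasoning, as named in this guide.
REASONING_KEYS = ("reasoning_effort", "effort", "thinking")


def is_reasoning_unsupported(body: dict) -> bool:
    """True if a response body carries the model_reasoning_unsupported code."""
    error = body.get("error") or {}
    return error.get("code") == "model_reasoning_unsupported"


def without_reasoning(payload: dict) -> dict:
    """Return a copy of the request payload with reasoning keys removed."""
    return {k: v for k, v in payload.items() if k not in REASONING_KEYS}


if __name__ == "__main__":
    request = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "2 + 2?"}],
        "reasoning_effort": "high",
    }
    error_body = {
        "error": {
            "message": "Model gpt-4o-mini is not designated for cognitive reasoning.",
            "type": "governance_exception",
            "code": "model_reasoning_unsupported",
        }
    }
    if is_reasoning_unsupported(error_body):
        # Fallback payload that a caller could resend without reasoning.
        print(without_reasoning(request))
```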


Anthropic Claude Opus 4.5: The effort Parameter

Claude Opus 4.5 (claude-opus-4-5-20251101) uses the effort parameter instead of reasoning_effort. The effort parameter accepts low, medium, or high values:

curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "messages": [{"role": "user", "content": "Explain the implications of Gödel'\''s incompleteness theorems."}],
    "effort": "high"
  }'

The effort parameter is specific to Claude Opus 4.5. For other reasoning models, use reasoning_effort with values minimal, low, medium, or high.
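
Because the parameter name depends on the model, a client may want to select it in one place. The following is a minimal sketch based only on the rules stated above. The helper name effort_field and the EFFORT_MODELS set are hypothetical.

```python
# Models documented above as using "effort" instead of "reasoning_effort".
EFFORT_MODELS = {"claude-opus-4-5-20251101"}

EFFORT_LEVELS = ("low", "medium", "high")
REASONING_EFFORT_LEVELS = ("minimal", "low", "medium", "high")


def effort_field(model: str, level: str) -> dict:
    """Return the request fragment that sets reasoning depth for a model.

    Claude Opus 4.5 takes {"effort": ...} with low/medium/high; other
    reasoning models take {"reasoning_effort": ...}, which also allows
    "minimal".
    """
    if model in EFFORT_MODELS:
        if level not in EFFORT_LEVELS:
            raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {level!r}")
        return {"effort": level}
    if level not in REASONING_EFFORT_LEVELS:
        raise ValueError(
            f"reasoning_effort must be one of {REASONING_EFFORT_LEVELS}, got {level!r}"
        )
    return {"reasoning_effort": level}


if __name__ == "__main__":
    payload = {
        "model": "claude-opus-4-5-20251101",
        "messages": [{"role": "user", "content": "Summarize the halting problem."}],
    }
    payload.update(effort_field(payload["model"], "high"))
    print(payload)
```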
