Cognitive Architecture & Reasoning
Harness advanced deductive logic and step-by-step thinking for highly complex problem solving.
Deep Deductive Reasoning
osmAPI provides native support for cognitive-tier models capable of exposing their internal, step-by-step thinking processes. This "white-box" reasoning architecture is indispensable for intricate logical deduction, mathematical proofs, and architectural troubleshooting where the final result is as critical as the logic used to derive it.
The Reasoning Effort Matrix
You control the depth of a model's cognitive investment through the reasoning_effort parameter. Selecting the appropriate level balances response speed with deductive depth:
- Minimal (Fast): Optimized for quick logic checks with negligible latency.
- Low (Focused): Lightweight reasoning for straightforward procedural tasks.
- Medium (Standard): The recommended setting for general-purpose problem solving.
- High (Intensive): Deep-dives into multi-step problems requiring thorough validation.
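Client code can check the effort level before a request is sent. The following minimal Python sketch uses only the standard library. It builds the same request as the curl example under Implementation Mechanics and rejects values outside the four levels above. The helper name build_reasoning_request is hypothetical. The lowercase strings minimal, low, medium, and high are the reasoning_effort values this guide lists.

```python
import json
import os
import urllib.request
from typing import Optional

# The four reasoning_effort values listed in the Reasoning Effort Matrix.
REASONING_EFFORTS = ("minimal", "low", "medium", "high")

# Endpoint copied from the curl examples in this guide.
CHAT_COMPLETIONS_URL = "https://api.osmapi.com/v1/chat/completions"


def build_reasoning_request(
    model: str,
    prompt: str,
    effort: str = "medium",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with reasoning enabled.

    Hypothetical helper for illustration; not part of any osmAPI SDK.
    """
    if effort not in REASONING_EFFORTS:
        raise ValueError(
            f"reasoning_effort must be one of {REASONING_EFFORTS}, got {effort!r}"
        )
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }
    # Fall back to the same environment variable the curl examples use.
    key = api_key if api_key is not None else os.environ.get("OSM_API_KEY", "")
    return urllib.request.Request(
        CHAT_COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body it returns.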
Implementation Mechanics
Orchestrating a Reasoning Request
Add the reasoning_effort parameter to a standard chat completion request to activate the cognitive layer.
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {
        "role": "user",
        "content": "Calculate the surface area of a torus with major radius R=5 and minor radius r=2."
      }
    ],
    "reasoning_effort": "medium"
  }'
Response Schema: Introspection
Reasoning-enabled responses are enriched with a reasoning field within the message payload, providing a window into the model's analytical path.
{
  "id": "tx_osm_reasoning_123",
  "model": "gpt-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The surface area is 40π² (approximately 394.78).",
        "reasoning": "Formula: SA = (2πr)(2πR) = 4π²rR. Given R=5, r=2: SA = 4π²(2)(5) = 40π². Numerical estimation: 40 * 9.8696 = 394.78."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "reasoning_tokens": 40,
    "total_tokens": 115
  }
}
Streaming Cognitive Outputs
For interactive applications, reasoning data can be streamed in real time. Users can watch the model's analytical process as it unfolds, which improves perceived performance and transparency.
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "reasoning_effort": "high",
    "stream": true,
    "messages": [{ "role": "user", "content": "Analyze the game theory implications of..." }]
  }'
The reasoning delta chunks arrive sequentially prior to the finalized content chunks.
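This guide does not show the raw chunk layout. The minimal Python sketch below therefore rests on two assumptions: an OpenAI-style server-sent event stream ("data: {...}" lines ending with "data: [DONE]"), and reasoning deltas carried under a "reasoning" key that mirrors the non-streaming message field. It gathers the reasoning and content deltas into two separate strings. The function name collect_stream is hypothetical.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Split a streamed chat completion into (reasoning, content) text.

    Assumes OpenAI-style SSE framing and a "reasoning" key on each delta;
    both are assumptions, not documented osmAPI behavior.
    """
    reasoning_parts = []
    content_parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Reasoning deltas are expected before content deltas.
            if delta.get("reasoning"):
                reasoning_parts.append(delta["reasoning"])
            if delta.get("content"):
                content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)
```

For a live stream, the lines can come from iterating over the response that urllib.request.urlopen returns for a request whose payload sets "stream": true. Decode each line as UTF-8 before passing it in.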
Observability & Token Metrics
osmAPI provides granular telemetry for every reasoning request. The usage object reports the thinking phase separately from the final output, so you can see its computational and financial cost:
- reasoning_tokens: Tokens spent in the thinking phase.
- completion_tokens: Tokens in the final answer.
- prompt_tokens: Tokens in the input context.
Strategic Monitoring: Track reasoning_tokens in your administration console to spot where a lower effort level or a different model could reduce cost.
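The same counts can also be read from each response directly, for example to track them in your own logs. The minimal Python sketch below assumes the response shape shown under Response Schema: Introspection: a reasoning string on choices[0].message and reasoning_tokens at the top level of usage. ReasoningResult and parse_reasoning_response are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class ReasoningResult:
    content: str
    reasoning: str
    prompt_tokens: int
    completion_tokens: int
    reasoning_tokens: int
    total_tokens: int

    @property
    def reasoning_share(self) -> float:
        """Fraction of total_tokens spent in the thinking phase."""
        if self.total_tokens == 0:
            return 0.0
        return self.reasoning_tokens / self.total_tokens


def parse_reasoning_response(body: str) -> ReasoningResult:
    """Extract the answer, the reasoning trace, and token counts from a response body."""
    data = json.loads(body)
    message = data["choices"][0]["message"]
    usage = data.get("usage", {})
    return ReasoningResult(
        content=message.get("content") or "",
        reasoning=message.get("reasoning") or "",
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        reasoning_tokens=usage.get("reasoning_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )
```

Applied to the Response Schema example above, reasoning_share comes out to 40 / 115, so about 35% of the tokens went to reasoning.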
Autonomous Reasoning Selection
When you use osmAPI's Autonomous Routing (e.g., by passing a generic model identifier), the engine applies intelligent defaults to maintain efficiency:
- Dynamic Calibration: For standard requests, effort defaults to minimal or low to conserve resources.
- Affinity Matching: If you explicitly request high reasoning effort, the gateway routes only to verified reasoning-capable infrastructure.
Strategic Deployment Best Practices
- Context-Aware Scaling: Reserve high effort for mission-critical logic; use minimal for baseline validation to manage costs.
- Asynchronous UX: Use streaming to keep users engaged during long-running reasoning tasks.
- Audit Trails: Regularly review reasoningContent in your dashboard logs to refine your prompts and verify the accuracy of the model's logic.
Anthropic API Compatibility
Reasoning is also supported through the Anthropic-compatible endpoint at /v1/messages. Use the thinking parameter instead of reasoning_effort:
curl -X POST "https://api.osmapi.com/v1/messages" \
  -H "x-api-key: $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-397b-a17b",
    "max_tokens": 8000,
    "thinking": {"type": "enabled", "budget_tokens": 5000},
    "messages": [{"role": "user", "content": "Solve step by step: 15! / 13!"}]
  }'
The thinking parameter maps to reasoning_effort internally:
- budget_tokens ≤ 1024 → low
- 1024 < budget_tokens ≤ 8192 → medium
- budget_tokens > 8192 → high
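The gateway performs this mapping itself. The Python function below only reproduces the documented thresholds on the client, for example to predict which effort tier a given thinking budget will land in. The function name is hypothetical.

```python
def thinking_budget_to_effort(budget_tokens: int) -> str:
    """Map an Anthropic-style thinking.budget_tokens value to a reasoning_effort
    level, using the thresholds documented for the /v1/messages endpoint."""
    if budget_tokens <= 1024:
        return "low"
    if budget_tokens <= 8192:
        return "medium"
    return "high"
```

With the curl example above, "budget_tokens": 5000 falls into the medium tier.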
Thinking content is returned as a thinking content block in the response, matching the native Anthropic format. This works with Claude Code, Cline, and other Anthropic SDK clients.
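In the native Anthropic format, the reasoning arrives as a content block of type thinking, with its text in a thinking field, alongside ordinary text blocks. The minimal Python sketch below separates the two from a /v1/messages response body. The function name is hypothetical. Any other block types, such as Anthropic's redacted_thinking or tool_use, are skipped.

```python
import json
from typing import List, Tuple


def split_anthropic_content(body: str) -> Tuple[List[str], List[str]]:
    """Return (thinking_texts, answer_texts) from a /v1/messages response body."""
    data = json.loads(body)
    thinking: List[str] = []
    text: List[str] = []
    for block in data.get("content", []):
        block_type = block.get("type")
        if block_type == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block_type == "text":
            text.append(block.get("text", ""))
        # Other block types (e.g. redacted_thinking, tool_use) are ignored here.
    return thinking, text
```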
Exception Handling & Unsupported Models
Applying reasoning parameters to legacy models, or to models without reasoning support, triggers a governance error that prevents inaccurate billing or logic failures.
{
  "error": {
    "message": "Model gpt-4o-mini is not designated for cognitive reasoning. Please adjust the effort parameter or target a reasoning-capable endpoint.",
    "type": "governance_exception",
    "code": "model_reasoning_unsupported"
  }
}
Consult our Model Perceptivity Index for a real-time list of compatible infrastructures.
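Because the error body is structured, a client can branch on error.code rather than on the message text. The Python sketch below reads that code. As one possible fallback, it also strips the reasoning parameters so the request can be retried against the same model. This fallback is an assumption, not documented osmAPI behavior; switching to a reasoning-capable model is the other option the error message suggests. The names error_code, without_reasoning, and REASONING_PARAMS are hypothetical.

```python
import json
from typing import Any, Dict, Optional

# Request keys this guide uses to enable reasoning.
REASONING_PARAMS = ("reasoning_effort", "effort", "thinking")


def error_code(body: str) -> Optional[str]:
    """Return error.code from an osmAPI error body, or None if absent or not JSON."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    err = data.get("error") if isinstance(data, dict) else None
    return err.get("code") if isinstance(err, dict) else None


def without_reasoning(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with reasoning parameters removed, for a retry."""
    return {k: v for k, v in payload.items() if k not in REASONING_PARAMS}
```

A caller would check error_code(body) == "model_reasoning_unsupported" and then either resend without_reasoning(payload) or pick a model from the compatibility list.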
Anthropic Claude Opus 4.5: The effort Parameter
Claude Opus 4.5 (claude-opus-4-5-20251101) uses the effort parameter instead of reasoning_effort. The effort parameter accepts low, medium, or high values:
curl -X POST "https://api.osmapi.com/v1/chat/completions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "messages": [{"role": "user", "content": "Explain the implications of Gödel'\''s incompleteness theorems."}],
    "effort": "high"
  }'
The effort parameter is specific to Claude Opus 4.5. For other reasoning models, use reasoning_effort with values minimal, low, medium, or high.
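Client code that switches between models has to choose the right parameter name for each one. The minimal Python sketch below follows the rules stated in this section: only claude-opus-4-5-20251101 takes effort, with low, medium, or high, while other reasoning models take reasoning_effort, with minimal, low, medium, or high. reasoning_payload and EFFORT_MODELS are hypothetical names.

```python
from typing import Any, Dict, List

# Per this guide, only Claude Opus 4.5 uses the "effort" parameter.
EFFORT_MODELS = {"claude-opus-4-5-20251101"}
EFFORT_VALUES = ("low", "medium", "high")
REASONING_EFFORT_VALUES = ("minimal", "low", "medium", "high")


def reasoning_payload(
    model: str, messages: List[Dict[str, str]], level: str
) -> Dict[str, Any]:
    """Build a chat completion payload using the reasoning key the model expects."""
    if model in EFFORT_MODELS:
        key, allowed = "effort", EFFORT_VALUES
    else:
        key, allowed = "reasoning_effort", REASONING_EFFORT_VALUES
    if level not in allowed:
        raise ValueError(f"{key} for {model} must be one of {allowed}, got {level!r}")
    return {"model": model, "messages": messages, key: level}
```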