Rate Limits
Understand the rate limits for free and paid models on osmAPI.
osmAPI applies rate limits to ensure fair usage and consistent performance for all users. Limits vary based on the model tier you're using.
Free Models
All accounts get the same rate limits for free models:
- 200 requests per minute across all free models.
- Resets every 60 seconds.
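To stay under the free-model limit from the client side, one option is to count requests locally. The sketch below is illustrative only: the `FixedWindowBudget` class and its parameters are hypothetical names, not part of osmAPI, and it assumes the 200-per-60-seconds figures above. The client's window will not necessarily line up with the server's, so treat it as a rough guard rather than a guarantee.

```python
import time


class FixedWindowBudget:
    """Client-side counter allowing at most `limit` requests per `window` seconds.

    Hypothetical helper: defaults mirror the documented free-model limit.
    """

    def __init__(self, limit=200, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True and count the request if budget remains, else False."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new local window has started: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False
```

Call `try_acquire()` before each free-model request and delay the request when it returns `False`.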
Paid Models
Paid models have a default rate limit of 1,000 requests per minute per API key. This limit is significantly higher than the free-model limit and can be configured for enterprise accounts. Your throughput may also be limited by your credit balance and the underlying provider's own limits.
Rate Limit Headers
Responses for free model requests include rate limit info in the headers:
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 198
- X-RateLimit-Limit: Max requests allowed in the current window.
- X-RateLimit-Remaining: Requests remaining before the limit kicks in.
When you hit the limit (429 response), additional headers are included:
- X-RateLimit-Reset: Unix timestamp when the limit resets.
- Retry-After: Seconds to wait before retrying.
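As a rough illustration of how a client might turn these headers into a wait time, the sketch below prefers `Retry-After` and falls back to `X-RateLimit-Reset`. The function name `seconds_until_retry` is hypothetical, and the code assumes the reset timestamp is expressed in seconds (not milliseconds) and that headers arrive as a plain dictionary.

```python
import time


def seconds_until_retry(headers, now=None):
    """Return how long to wait after a 429, based on the response headers.

    Prefers Retry-After (seconds); falls back to X-RateLimit-Reset,
    assumed here to be a Unix timestamp in seconds.
    Returns None when neither header is usable.
    """
    # HTTP header names are case-insensitive, so normalize the keys.
    lowered = {k.lower(): v for k, v in headers.items()}

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number; fall through to the reset timestamp

    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        try:
            current = time.time() if now is None else now
            return max(0.0, float(reset) - current)
        except ValueError:
            pass

    return None
```

Passing `now` explicitly keeps the function easy to test; in normal use it can be omitted.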
When You Hit the Limit
You'll get a 429 Too Many Requests response:
{
  "error": "Rate limit reached. Please wait for the reset window or use a paid model."
}
Best Practices
- Use Exponential Backoff: Retry with increasing delays when you get 429 errors.
- Monitor Headers: Check X-RateLimit-Remaining to throttle requests before hitting the limit.
- Use Free Models for Dev: Keep free models for development and testing; use paid models for production.
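The exponential backoff practice can be sketched as follows. This is a minimal example under stated assumptions, not an official client: `call_with_backoff` and its `send` argument are hypothetical, with `send` standing in for whatever function performs the request and returns a `(status, headers, body)` tuple. It honors `Retry-After` when present (assumed to be a number of seconds, as described above) and otherwise doubles the delay on each attempt, with some random jitter.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning
    (status, headers, body); it is not part of any osmAPI SDK.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the last 429 back to the caller

        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            # Assumes Retry-After is a number of seconds.
            delay = float(retry_after)
        else:
            # Capped exponential backoff: 1s, 2s, 4s, ... plus random jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay / 2)
        sleep(delay)

    return status, headers, body
```

The jitter spreads out retries from clients that were limited at the same moment, so they do not all retry at once.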
All accounts share the same 200 requests per minute limit for free models, regardless of credit balance.