Rate Limits

osmAPI applies rate limits to ensure fair usage and consistent performance for all users. Limits vary based on the model tier you're using.

Free Models

All accounts get the same rate limits for free models:

200 requests per minute across all free models.
Resets every 60 seconds.
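
A client can track this budget locally so that it rarely reaches the limit in the first place. The sketch below is a minimal client-side counter, assuming the limit is enforced over a fixed 60-second window. The class name FixedWindowBudget and the injectable clock are illustrative and not part of osmAPI. The server's window may not line up with the client's, so the response headers remain the authoritative source.

```python
import time


class FixedWindowBudget:
    """Client-side counter for a fixed request window (hypothetical helper)."""

    def __init__(self, limit=200, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True if a request may be sent now, False if the budget is spent."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new local window has started: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False

    def seconds_until_reset(self):
        """Seconds left in the current local window (0.0 once it has elapsed)."""
        elapsed = self.clock() - self.window_start
        return max(0.0, self.window_seconds - elapsed)
```

Call try_acquire() before each free-model request; when it returns False, wait for seconds_until_reset() before sending again.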

Paid Models

Paid models have a default rate limit of 1,000 requests per minute per API key, five times the free-model limit. Enterprise accounts can have this limit configured. Your throughput may also be limited by your credit balance and by the underlying provider's own limits.

Rate Limit Headers

Responses for free model requests include rate limit info in the headers:

X-RateLimit-Limit: 200
X-RateLimit-Remaining: 198

X-RateLimit-Limit: Max requests allowed in the current window.
X-RateLimit-Remaining: Requests remaining in the current window before further requests are rejected with 429.

When you hit the limit (429 response), additional headers are included:

X-RateLimit-Reset: Unix timestamp when the limit resets.
Retry-After: Seconds to wait before retrying.
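
The four headers above can be read into one structure so the rest of a client does not deal with raw strings. This is a minimal sketch, assuming the header values are plain integers as in the example; RateLimitInfo and parse_rate_limit_headers are hypothetical names, not part of any osmAPI SDK.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]        # X-RateLimit-Limit
    remaining: Optional[int]    # X-RateLimit-Remaining
    reset_at: Optional[int]     # X-RateLimit-Reset (Unix timestamp, 429 only)
    retry_after: Optional[int]  # Retry-After (seconds, 429 only)


def _to_int(value):
    """Convert a header value to int, returning None if absent or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset_at=_to_int(lowered.get("x-ratelimit-reset")),
        retry_after=_to_int(lowered.get("retry-after")),
    )
```

Fields that a response does not carry come back as None, so a caller can tell "header absent" apart from a value of 0.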

When You Hit the Limit

You'll get a 429 Too Many Requests response:

{
	"error": "Rate limit reached. Please wait for the reset window or use a paid model."
}
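
One way to act on this response is to work out how long to wait and to surface the error text. The sketch below assumes the caller already has the status code, headers, and raw body from whatever HTTP client it uses. The function names and the 60-second fallback (chosen to match the reset window) are assumptions for illustration.

```python
import json
import time


def rate_limit_wait_seconds(status, headers, now=None, default_wait=60.0):
    """Return seconds to wait after a 429, or None if the response is not a 429."""
    if status != 429:
        return None
    now = time.time() if now is None else now
    lowered = {name.lower(): value for name, value in headers.items()}
    # Prefer Retry-After, which already states the wait in seconds.
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, TypeError, ValueError):
        pass
    # Fall back to the reset timestamp minus the current time.
    try:
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)
    except (KeyError, TypeError, ValueError):
        pass
    # Neither header was usable: assume a full window (an assumption, not documented).
    return default_wait


def error_message(body):
    """Extract the "error" string from the JSON body, or None if unavailable."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return None
    return payload.get("error") if isinstance(payload, dict) else None
```

For example, rate_limit_wait_seconds(429, {"Retry-After": "7"}) returns 7.0, while any non-429 status returns None.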

Best Practices

Use Exponential Backoff: Retry with increasing delays when you get 429 errors.
Monitor Headers: Check X-RateLimit-Remaining to throttle requests before hitting the limit.
Use Free Models for Dev: Keep free models for development and testing; use paid models for production.
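
The backoff advice above could look like the following. This is a sketch rather than a reference implementation: send is a hypothetical zero-argument callable that performs the request and returns (status, headers, body), and the retry count, base delay, cap, and 10% jitter are illustrative defaults, not values published by osmAPI.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Yield capped exponential delays: base, 2*base, 4*base, ..."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_retries=5, base=1.0, cap=60.0,
                      sleep=time.sleep, jitter=random.uniform):
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical callable returning (status, headers, body).
    The last response is returned even if it is still a 429.
    """
    response = send()
    for delay in backoff_delays(max_retries, base, cap):
        status, headers, _body = response
        if status != 429:
            return response
        # Honour Retry-After when present; otherwise use the exponential delay.
        # (Assumes the headers mapping uses this exact capitalisation.)
        try:
            wait = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            wait = delay
        # Add up to 10% random jitter so parallel clients do not retry in lockstep.
        sleep(wait + jitter(0, wait * 0.1))
        response = send()
    return response
```

Passing sleep and jitter as parameters keeps the retry loop testable without real waiting.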

Every account has the same limit of 200 requests per minute for free models, regardless of credit balance.
