Rate Limits

Rate limits are enforced to prevent abuse and protect downstream models.

Default Quotas

Tunable in Private

In a Private deployment, the global rate limits can be adjusted in the .env file.

Every response includes rate-limit status:

X-RateLimit-Limit: 20
X-RateLimit-Remaining: 18
X-RateLimit-Reset: 1715154000

| Header | Meaning |
| --- | --- |
| `X-RateLimit-Limit` | Total quota in the current window |
| `X-RateLimit-Remaining` | Quota remaining in the current window |
| `X-RateLimit-Reset` | Time at which the window resets (Unix seconds) |
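A client can read these headers to decide whether to slow down before hitting the limit. The following is a minimal sketch, not an official client: `RateLimitStatus` and `parse_rate_limit` are hypothetical names introduced here, and the code assumes all three headers are present and hold integer values, as in the example above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    """Hypothetical container for the three rate-limit headers."""
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset, Unix seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        # Clamp at zero so a reset time in the past never yields a negative wait.
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )
```

For example, a caller might pause for `status.seconds_until_reset()` seconds once `status.remaining` reaches 0, rather than sending a request that is likely to be rejected.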

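When the quota is exhausted, the API responds with status 429 and a Retry-After header giving the wait in seconds:

HTTP/1.1 429 Too Many Requests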
Retry-After: 1

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Retry after 1 second."
  }
}

Default retry schedule (exponential backoff):

attempt 1 → immediate
attempt 2 → 1s
attempt 3 → 2s
attempt 4 → 4s
attempt 5 → 8s
give up

Honor Retry-After if present; otherwise, use the defaults above.
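The retry policy above could be implemented roughly as follows. This is a sketch under stated assumptions rather than a reference client: `call_with_retries`, `retry_delay`, and the `send` callable (which returns a `(status, headers, body)` tuple) are hypothetical names introduced for illustration, and only the numeric-seconds form of Retry-After is handled.

```python
import time
from typing import Callable, Mapping, Tuple

# Delay in seconds before attempts 1..5, matching the schedule above.
BACKOFF_DELAYS = [0, 1, 2, 4, 8]

# Assumed response shape for this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(next_attempt: int, headers: Mapping[str, str]) -> float:
    """Seconds to wait before the attempt at index `next_attempt` (0-based)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # Non-numeric value (e.g. an HTTP date): use the default delay.
    return float(BACKOFF_DELAYS[next_attempt])


def call_with_retries(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` up to five times, waiting between attempts on HTTP 429."""
    for attempt in range(len(BACKOFF_DELAYS)):
        response = send()
        if response[0] != 429:
            return response
        if attempt + 1 < len(BACKOFF_DELAYS):
            sleep(retry_delay(attempt + 1, response[1]))
    # All five attempts were rate limited: give up.
    raise RuntimeError("rate limited after 5 attempts")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. In production the default `time.sleep` applies.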

If your business truly needs higher concurrency: