API Rate Limiting Policy
Learn how to manage your API usage and choose the right plan for your application.
Key Concept
Requests are rate-limited at both the endpoint and global levels to prevent abuse and ensure reliability for all users.
How Request Throttling Works
Understanding Limits
Delphin's API uses a token-bucket system where requests consume tokens and tokens recharge over time. Each plan has different capacities and refill rates.
Requests exceeding limits receive a 429 Too Many Requests response with retry information.
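When you do receive a 429, back off before retrying. The Python sketch below is a minimal example, assuming the retry information is exposed through a Retry-After header (the exact header name is not specified above) and using a hypothetical endpoint URL and key.

```python
import time
import requests  # third-party HTTP client: pip install requests

def post_with_retry(url, headers, payload, max_attempts=5):
    """POST to the API, backing off whenever a 429 Too Many Requests comes back."""
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        if response.status_code != 429:
            return response
        # Assumption: the retry hint arrives as a Retry-After header in seconds;
        # fall back to exponential backoff if it is missing.
        wait = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("Rate limit still exceeded after retries")

# Hypothetical usage -- replace the URL and key with your own:
# resp = post_with_retry("https://api.example.com/analyze",
#                        {"Authorization": "Bearer YOUR_API_KEY"},
#                        {"text": "hello"})
```

Exponential backoff is only the fallback here; honouring the server's hint keeps retries from piling onto an already saturated bucket.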
Enterprise customers receive automatic limit scaling during traffic spikes.
Request Lifecycle
Each request consumes a token from the relevant bucket, tokens regenerate gradually over time, and requests made while a bucket is empty are rejected until it refills.
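To make the lifecycle concrete, here is an illustrative client-side model of a token bucket. The capacity and refill values are placeholders chosen for the example, not Delphin's server-side settings; the real figures are the plan and endpoint limits listed below.

```python
import time

class TokenBucket:
    """Illustrative model of the token-bucket behaviour described above."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity                  # most tokens the bucket can hold
        self.refill_per_second = refill_per_second
        self.tokens = capacity                    # start full
        self.last_refill = time.monotonic()

    def allow(self):
        """Return True if a request may be sent now, consuming one token."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Tokens regenerate gradually, but never beyond the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: the server would answer with a 429

# Example numbers only: roughly 500 requests/minute with room for short bursts.
bucket = TokenBucket(capacity=50, refill_per_second=500 / 60)
```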
Pricing Tiers & Limits
Starter
Ideal for small projects and testing
- 10,000 requests/day
- 500 requests/minute
- Limited access to premium models
Pro
For growing applications
- 100,000 requests/day
- 2,000 requests/minute
- Full model access
- API health dashboard
Enterprise
For large-scale deployments
- Custom daily limits
- Dedicated infrastructure
- 24/7 tech support
- SLA-guaranteed uptime
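One practical way to stay inside a plan is to pace requests on the client. The sketch below keys off the per-minute figures above for Starter and Pro; it does not model daily quotas or per-endpoint limits, and the class and dictionary names are our own.

```python
import threading
import time

# Per-minute limits taken from the plans above (daily limits not modelled here).
PLAN_RPM = {"starter": 500, "pro": 2000}

class PlanThrottle:
    """Client-side pacer that spaces requests to stay under a plan's per-minute limit."""

    def __init__(self, plan):
        self.min_interval = 60.0 / PLAN_RPM[plan]  # seconds between requests
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait_turn(self):
        """Block until it is safe to send the next request."""
        with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_slot - now)
            self.next_slot = max(now, self.next_slot) + self.min_interval
        if wait:
            time.sleep(wait)

throttle = PlanThrottle("starter")
# throttle.wait_turn()  # call before each outgoing API request
```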
Request Limits by Endpoint
Endpoint | Rate Limit (requests/min) | Burst Capacity (requests) | Hourly Quota (requests/hour) |
---|---|---|---|
/analyze (Standard) | 500 | 2,000 | 100,000 |
/analyze (Advanced) | 400 | 1,500 | 80,000 |
/predict (Multi-Model) | 300 | 1,000 | 60,000 |
/_health | Unlimited | 500 | N/A |
/auth/* | 100 | 500 | 5,000 |
Best Practices
Monitor Usage
The API returns the X-Rate-Limits header with each response, reporting your remaining quota.
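A small helper can surface that header after every call. The value format is not documented above, so the integer parsing here is an assumption, as is the URL in the usage comment.

```python
import requests  # pip install requests

def remaining_quota(response):
    """Read the remaining quota reported by the X-Rate-Limits header, if present."""
    # Assumption: the header carries a plain integer count of remaining requests.
    value = response.headers.get("X-Rate-Limits")
    return int(value) if value and value.isdigit() else None

# Hypothetical usage:
# resp = requests.get("https://api.example.com/_health")
# print(remaining_quota(resp))
```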
Spread the Load
On Enterprise plans, distribute requests across multiple API keys to avoid hitting soft limits.
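A round-robin rotation over your keys is one straightforward way to do this, as in the sketch below. The key names are placeholders, and the sketch assumes each key draws from its own quota.

```python
import itertools

# Placeholder keys; each key is assumed to have its own quota on Enterprise plans.
API_KEYS = ["key-analytics", "key-batch", "key-realtime"]
_key_cycle = itertools.cycle(API_KEYS)

def next_auth_header():
    """Rotate through the keys so no single key absorbs all of the traffic."""
    return {"Authorization": f"Bearer {next(_key_cycle)}"}

# headers = next_auth_header()  # build fresh headers for each outgoing request
```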