API Rate Limiting Policy
Learn how to manage your API usage and choose the right plan for your application.
Key Concept
Requests are rate-limited at both the endpoint and global levels to prevent abuse and ensure reliability for all users.
How Request Throttling Works
Understanding Limits
Delphin's API uses a token-bucket system where requests consume tokens and tokens recharge over time. Each plan has different capacities and refill rates.
Requests exceeding limits receive a 429 Too Many Requests response with retry information.
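When you do receive a 429, back off before retrying. The Python sketch below is a minimal example, assuming the retry information is exposed through a Retry-After header (the exact header name is not specified above) and using a hypothetical endpoint URL and key.

```python
import time
import requests  # third-party HTTP client: pip install requests

def post_with_retry(url, headers, payload, max_attempts=5):
    """POST to the API, backing off whenever a 429 Too Many Requests comes back."""
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        if response.status_code != 429:
            return response
        # Assumption: the retry hint arrives as a Retry-After header in seconds;
        # fall back to exponential backoff if it is missing.
        wait = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("Rate limit still exceeded after retries")

# Hypothetical usage -- replace the URL and key with your own:
# resp = post_with_retry("https://api.example.com/analyze",
#                        {"Authorization": "Bearer YOUR_API_KEY"},
#                        {"text": "hello"})
```

Exponential backoff is only the fallback here; honouring the server's hint keeps retries from piling onto an already saturated bucket.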
Enterprise customers receive automatic limit scaling during traffic spikes.
Request Lifecycle
Each request consumes a token from the relevant bucket, tokens regenerate gradually over time, and requests made while a bucket is empty are rejected until it refills.
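To make the lifecycle concrete, here is an illustrative client-side model of a token bucket. The capacity and refill values are placeholders chosen for the example, not Delphin's server-side settings; the real figures are the plan and endpoint limits listed below.

```python
import time

class TokenBucket:
    """Illustrative model of the token-bucket behaviour described above."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity                  # most tokens the bucket can hold
        self.refill_per_second = refill_per_second
        self.tokens = capacity                    # start full
        self.last_refill = time.monotonic()

    def allow(self):
        """Return True if a request may be sent now, consuming one token."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Tokens regenerate gradually, but never beyond the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: the server would answer with a 429

# Example numbers only: roughly 500 requests/minute with room for short bursts.
bucket = TokenBucket(capacity=50, refill_per_second=500 / 60)
```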
Pricing Tiers & Limits
Starter
Ideal for small projects and testing
- 10,000 requests/day
- 500 requests/minute
- Limited access to premium models
Pro
For growing applications
- 100,000 requests/day
- 2,000 requests/minute
- Full model access
- API health dashboard
Enterprise
For large-scale deployments
- Custom daily limits
- Dedicated infrastructure
- 24/7 tech support
- SLA-guaranteed uptime
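One practical way to stay inside a plan is to pace requests on the client. The sketch below keys off the per-minute figures above for Starter and Pro; it does not model daily quotas or per-endpoint limits, and the class and dictionary names are our own.

```python
import threading
import time

# Per-minute limits taken from the plans above (daily limits not modelled here).
PLAN_RPM = {"starter": 500, "pro": 2000}

class PlanThrottle:
    """Client-side pacer that spaces requests to stay under a plan's per-minute limit."""

    def __init__(self, plan):
        self.min_interval = 60.0 / PLAN_RPM[plan]  # seconds between requests
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait_turn(self):
        """Block until it is safe to send the next request."""
        with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_slot - now)
            self.next_slot = max(now, self.next_slot) + self.min_interval
        if wait:
            time.sleep(wait)

throttle = PlanThrottle("starter")
# throttle.wait_turn()  # call before each outgoing API request
```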
Request Limits by Endpoint
Endpoint | Rate Limit (requests/min) | Burst Capacity (requests) | Hourly Quota (requests/hour) |
---|---|---|---|
/analyze (Standard) | 500 | 2,000 | 100,000 |
/analyze (Advanced) | 400 | 1,500 | 80,000 |
/predict (Multi-Model) | 300 | 1,000 | 60,000 |
/_health | Unlimited | 500 | N/A |
/auth/* | 100 | 500 | 5,000 |
Best Practices
Monitor Usage
The API returns the X-Rate-Limits header with each response, reporting your remaining quota.
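A small helper can surface that header after every call. The value format is not documented above, so the integer parsing here is an assumption, as is the URL in the usage comment.

```python
import requests  # pip install requests

def remaining_quota(response):
    """Read the remaining quota reported by the X-Rate-Limits header, if present."""
    # Assumption: the header carries a plain integer count of remaining requests.
    value = response.headers.get("X-Rate-Limits")
    return int(value) if value and value.isdigit() else None

# Hypothetical usage:
# resp = requests.get("https://api.example.com/_health")
# print(remaining_quota(resp))
```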
Spread the Load
On Enterprise plans, distribute requests across multiple API keys to avoid hitting soft limits.
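A round-robin rotation over your keys is one straightforward way to do this, as in the sketch below. The key names are placeholders, and the sketch assumes each key draws from its own quota.

```python
import itertools

# Placeholder keys; each key is assumed to have its own quota on Enterprise plans.
API_KEYS = ["key-analytics", "key-batch", "key-realtime"]
_key_cycle = itertools.cycle(API_KEYS)

def next_auth_header():
    """Rotate through the keys so no single key absorbs all of the traffic."""
    return {"Authorization": f"Bearer {next(_key_cycle)}"}

# headers = next_auth_header()  # build fresh headers for each outgoing request
```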