API Rate Limiting Policy

Learn how to manage your API usage and choose the right plan for your application.

Key Concept

Requests are rate-limited at both the endpoint and global levels to prevent abuse and ensure reliability for all users.

How Request Throttling Works

Understanding Limits

Delphin's API uses a token-bucket system: each request consumes a token, and tokens replenish over time. Each plan has its own bucket capacity and refill rate.
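The token-bucket behavior described above can be modeled on the client side. The following is a minimal sketch, not Delphin's server implementation: the class name `TokenBucket`, its parameters, and the injectable `clock` are all hypothetical and exist only to illustrate how tokens are spent and refilled.

```python
import time


class TokenBucket:
    """Illustrative client-side model of a token bucket (hypothetical names)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity                  # maximum tokens the bucket holds
        self.refill_per_second = refill_per_second
        self.clock = clock                        # injectable so tests can fake time
        self.tokens = float(capacity)             # the bucket starts full
        self.last_refill = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

    def try_consume(self, cost=1):
        """Spend `cost` tokens and return True if available, else return False."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A client could call `try_consume()` before each request and delay sending when it returns `False`, which keeps local traffic under an assumed capacity and refill rate.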

Hard Limits

Requests exceeding limits receive a 429 Too Many Requests response with retry information.
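One way a client might react to a 429 response is sketched below. It assumes the retry information arrives in the standard HTTP `Retry-After` header expressed in seconds; the policy above does not specify the exact format, so treat that as an assumption. The `send` callable, which returns `(status_code, headers, body)`, is a hypothetical stand-in for whatever HTTP client you use.

```python
import time


def call_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Assumption: retry info is a Retry-After header holding seconds.
        raw = headers.get("Retry-After")
        try:
            wait = float(raw)
        except (TypeError, ValueError):
            # Header missing or not numeric: fall back to exponential backoff.
            wait = default_wait * (2 ** attempt)
        sleep(max(wait, 0.0))
    raise RuntimeError("still rate-limited after %d attempts" % max_attempts)
```

The exponential fallback is a design choice for cases where the header is absent or not a plain number of seconds; adjust it to match the retry information the API actually returns.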

Dynamic Scaling

Enterprise customers receive automatic limit scaling during traffic spikes.

Request Lifecycle

Each request consumes one token, tokens regenerate gradually, and requests are rejected once the bucket is empty.

Pricing Tiers & Limits

Starter

Ideal for small projects and testing

$9 / month
  • 10,000 requests/day
  • 500 requests/minute
  • Limited access to premium models

Pro

For growing applications

$49 / month
  • 100,000 requests/day
  • 2,000 requests/minute
  • Full model access
  • API health dashboard

Enterprise

For large-scale deployments

Contact Sales
  • Custom daily limits
  • Dedicated infrastructure
  • 24/7 tech support
  • SLA-guaranteed uptime

Request Limits by Endpoint

Endpoint                  Rate Limit (RPM)   Burst Limit (RPS)   Bucket Size (RPH)
/analyze (Standard)       500                2,000               100,000
/analyze (Advanced)       400                1,500               80,000
/predict (Multi-Model)    300                1,000               60,000
/_health                  Unlimited          500                 N/A
/auth/*                   100                500                 5,000
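To stay under a per-minute limit, a client can space its requests evenly. The sketch below copies the RPM column from the table above and derives the minimum gap between requests for each endpoint; the names `ENDPOINT_RPM` and `min_interval_seconds` are hypothetical.

```python
# RPM limits copied from the endpoint table; None marks an unlimited endpoint.
ENDPOINT_RPM = {
    "/analyze (Standard)": 500,
    "/analyze (Advanced)": 400,
    "/predict (Multi-Model)": 300,
    "/_health": None,
    "/auth/*": 100,
}


def min_interval_seconds(endpoint):
    """Smallest even spacing between requests that stays within the RPM limit."""
    rpm = ENDPOINT_RPM[endpoint]
    return 0.0 if rpm is None else 60.0 / rpm
```

For example, `/analyze (Standard)` at 500 RPM works out to one request every 0.12 seconds, and `/auth/*` at 100 RPM to one every 0.6 seconds.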

Best Practices

Monitor Usage

Each API response includes the X-Rate-Limits header, which reports your remaining quota.
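A client might inspect that header after every response and slow down before the quota runs out. The sketch below assumes the header value is a plain integer count of remaining requests; the actual format is not stated here, so verify it against a real response first. The function names and the `threshold` default are hypothetical.

```python
def remaining_quota(headers, header_name="X-Rate-Limits"):
    """Return the remaining quota as an int, or None if absent or unparseable.

    Assumption: the header value is a plain integer.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get(header_name.lower())
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


def should_slow_down(headers, threshold=50):
    """True when the reported quota is known and below `threshold`."""
    remaining = remaining_quota(headers)
    return remaining is not None and remaining < threshold
```

Returning `None` for a missing or unexpected value keeps the client from throttling itself based on a misread header.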

Spread Loads

On Enterprise plans, distribute requests across multiple API keys to avoid hitting soft limits.
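A simple way to spread load across keys is round-robin rotation. This is a minimal sketch with a hypothetical `KeyRotator` class; it does not track per-key quota, so it only evens out traffic rather than guaranteeing that no single key is throttled.

```python
import itertools


class KeyRotator:
    """Hand out API keys in round-robin order (hypothetical helper)."""

    def __init__(self, api_keys):
        keys = list(api_keys)
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)

    def next_key(self):
        """Return the next key in the rotation."""
        return next(self._cycle)
```

Calling `next_key()` before each request and attaching the result as the request's credential distributes calls evenly across all configured keys.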