Watch API requests get accepted or rejected in real time. Experiment with Token Bucket, Fixed Window, and Sliding Window algorithms.
Rate Limiter Simulator
Token Bucket: Tokens refill at a steady rate (1 token every 800ms). Each request consumes one token. Bucket capacity: 8. Requests are rejected when the bucket is empty, but short bursts are allowed.
Accepted: 0
Rejected: 0
API Clients
Rate Limiter
Tokens: 8 / 8
API Server
Visual Guide
White packet: Request in transit to the rate limiter.
Green packet: Allowed — forwarded to the API server.
Red packet: Rate limited — dropped with HTTP 429.
Amber dots: Remaining tokens in the bucket.
How to use
1. Click Start Traffic to begin sending API requests.
2. Switch Algorithms to compare Token Bucket vs Window strategies.
3. Increase Traffic Intensity to Burst to trigger rejections.
4. Watch the Accept rate drop and the tokens empty out.
Quick Guide: Rate Limiting
Understanding the basics in 30 seconds
How It Works
Client sends a request to the API
Rate limiter checks the current token count (or window counter)
If within the limit: request is forwarded to the server
If over the limit: HTTP 429 is returned immediately
Tokens refill (or window resets) over time, restoring capacity
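The steps above can be sketched with the simplest limiter, a fixed-window counter. This is an illustration only; the class name and numbers are not the simulator's actual code:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds; the counter resets sharply."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window_s:
            # Window elapsed: reset the counter, restoring capacity (step 5)
            self.count = 0
            self.window_start = now
        if self.count < self.limit:
            self.count += 1          # within the limit: forward to the server (step 3)
            return True
        return False                 # over the limit: return HTTP 429 (step 4)

limiter = FixedWindowLimiter(limit=3, window_s=1.0)
print([limiter.allow() for _ in range(5)])  # [True, True, True, False, False]
```

The first three back-to-back requests pass; the rest are rejected until the one-second window resets.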
Key Benefits
Protects APIs from DDoS and abusive clients
Ensures fair usage across all consumers
Prevents a single client from starving others
Reduces infrastructure costs during traffic spikes
Enables tiered pricing (free vs paid API limits)
Real-World Uses
Stripe: 100 req/s per API key (Token Bucket)
GitHub API: 5,000 req/hour for authenticated users
Twitter/X API: Rate limits per endpoint per 15-min window
AWS API Gateway: Configurable per-route throttling
NGINX: limit_req_zone directive for web server throttling
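For reference, a minimal NGINX snippet using the `limit_req_zone` directive mentioned above (the zone name, rate, and burst values are illustrative):

```nginx
# Shared 10 MB zone keyed by client IP, sustained rate 10 req/s
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        # Allow short bursts of up to 20 above the rate; excess is rejected
        limit_req zone=api burst=20 nodelay;
        limit_req_status 429;   # default is 503; 429 is the standard rate-limit code
        proxy_pass http://backend;
    }
}
```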
Rate Limiting in Production
How real-world APIs protect themselves from abuse, traffic spikes, and runaway clients.
Token Bucket
Tokens refill at a fixed rate and are consumed one per request. An empty bucket means the request is rejected, while a full bucket lets a client burst up to the bucket size at once.
Smooth, predictable refill rate
Naturally absorbs short bursts
Used by: Stripe API, AWS API Gateway
Fixed & Sliding Window
Count requests inside a time window. Fixed windows reset sharply — a client can fire double the limit right at the boundary. Sliding windows prevent this with a rolling counter.
Fixed: Simple, low memory — boundary-burst risk
Sliding: Fairer, prevents edge spikes
Used by: GitHub API, Twitter/X API
HTTP 429 — Too Many Requests
When a rate limiter rejects a request, the server responds with HTTP 429 and a Retry-After header. Well-behaved clients implement exponential backoff — doubling the wait time between retries — to avoid a thundering herd where all rejected clients retry simultaneously and immediately.
Rate Limiting Algorithms Explained
Token Bucket
The most widely used algorithm. A bucket holds up to N tokens. One token is consumed per request. Tokens refill at a fixed rate. If the bucket is empty, the request is rejected — but short bursts are naturally absorbed as long as tokens are available.
Example: Stripe API
Bucket size: 100 tokens
Refill rate: 100 tokens / second
A client can burst 100 requests instantly, then sustain 100 req/s
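A minimal token-bucket sketch with the same parameters as the example above (illustrative only; this is not Stripe's implementation):

```python
import time

class TokenBucket:
    """Bucket of up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: int, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)   # start full: a burst of `capacity` is allowed
        self.clock = clock              # injectable for deterministic testing
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill lazily based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0          # one token consumed per request
            return True
        return False                    # empty bucket: reject with HTTP 429

# Burst 100 instantly, then sustain 100 req/s. A frozen fake clock
# makes the demo deterministic (no refill happens between calls).
t = 0.0
bucket = TokenBucket(capacity=100, refill_rate=100.0, clock=lambda: t)
print(sum(bucket.allow() for _ in range(150)))  # 100: full burst, then rejections
```

Refilling lazily on each check, rather than on a timer, is the usual trick: the bucket needs no background thread, only a timestamp.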
Fixed Window vs Sliding Window
Both count requests inside a time window, but differ at the boundaries:
Fixed Window
Counter resets sharply every N seconds. A client can fire 2× the limit by sending at the end of one window and the start of the next.
Sliding Window
Uses a rolling counter weighted across the current and previous window. Eliminates boundary bursts. Fairer but slightly more memory-intensive.
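The weighted rolling counter described above can be sketched as follows (an approximation often called the sliding window counter; the names are illustrative):

```python
class SlidingWindowLimiter:
    """Approximate a rolling window by weighting the previous window's count."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.curr_start = 0.0   # start time of the current window
        self.curr_count = 0
        self.prev_count = 0

    def allow(self, now: float) -> bool:
        # Roll windows forward until `now` falls inside the current one
        while now - self.curr_start >= self.window_s:
            self.curr_start += self.window_s
            self.prev_count, self.curr_count = self.curr_count, 0
        # Weight the previous window by how much of it still overlaps
        # the rolling window ending at `now`
        overlap = 1.0 - (now - self.curr_start) / self.window_s
        estimated = self.prev_count * overlap + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False

limiter = SlidingWindowLimiter(limit=10, window_s=1.0)
late = sum(limiter.allow(0.99) for _ in range(10))   # end of window 0
early = sum(limiter.allow(1.01) for _ in range(10))  # start of window 1
print(late, early)  # 10 1 (a fixed window would have allowed 10 + 10)
```

At t = 1.01, 99% of the previous window still overlaps the rolling window, so its 10 requests count almost fully against the limit; the boundary burst that a fixed window permits is blocked.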
HTTP 429 & Retry-After
When a request is rate limited, the correct HTTP response is 429 Too Many Requests. The server should include a Retry-After header telling the client when to try again.
Retry-After: 30 — wait 30 seconds before retrying
X-RateLimit-Limit: total allowed requests
X-RateLimit-Remaining: requests left in current window
X-RateLimit-Reset: Unix timestamp when the window resets
💡 Pro Tip: Clients should implement exponential backoff — doubling the wait time on each retry — to avoid a thundering herd effect when limits reset.
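A sketch of that client-side behavior, exponential backoff that also honors Retry-After (the `send` callback and function name are illustrative, not a specific library's API):

```python
import random
import time

def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it succeeds, backing off exponentially on HTTP 429.

    `send` returns (status, headers); `sleep` is injectable for testing.
    """
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status
        if "Retry-After" in headers:
            # Prefer the server's explicit hint
            delay = float(headers["Retry-After"])
        else:
            # Double the wait each attempt, with jitter so rejected
            # clients don't all retry in lockstep
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
        sleep(delay)
    return 429  # retries exhausted
```

The jitter matters as much as the doubling: without it, every client rejected at the same instant would also retry at the same instant, recreating the thundering herd the backoff is meant to prevent.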