Explore how different rate limiting strategies work through interactive visualizations. Click, experiment, and build intuition for each algorithm.
Rate limiting is a critical technique used to control the rate of requests that clients can make to an API or service. It protects your systems from being overwhelmed, ensures fair usage among clients, and prevents abuse.
Prevent DDoS attacks and resource exhaustion
Ensure equal access for all clients
Manage infrastructure costs effectively
The most flexible rate limiter — allows controlled bursts
Token Bucket is widely used (e.g., by Amazon and Stripe) because it smooths out traffic while still allowing short bursts. If a client hasn't made requests in a while, they can "spend" their accumulated tokens in a burst.
import time
class TokenBucket:
def __init__(self, capacity: int, refill_rate: float):
self.capacity = capacity
self.refill_rate = refill_rate
self.tokens = float(capacity)
self.last_refill = time.time()
def allow_request(self) -> bool:
now = time.time()
elapsed = now - self.last_refill
# Refill tokens based on elapsed time
self.tokens = min(
self.capacity,
self.tokens + elapsed * self.refill_rate
)
self.last_refill = now
if self.tokens >= 1.0:
self.tokens -= 1.0
return True
return False
Smooths out bursty traffic — processes requests at a constant rate
Unlike Token Bucket, Leaking Bucket does not allow bursts — it always processes at a constant rate. Think of it like water dripping from a leaky bucket: no matter how fast you pour water in, it drips out at the same pace.
import time
from collections import deque
class LeakingBucket:
def __init__(self, capacity: int, leak_rate: float):
self.capacity = capacity
self.leak_rate = leak_rate # requests per second
self.queue = deque()
self.last_leak = time.time()
def _leak(self):
now = time.time()
elapsed = now - self.last_leak
leaked = int(elapsed * self.leak_rate)
if leaked > 0:
# Process (leak) requests from queue
for _ in range(min(leaked, len(self.queue))):
self.queue.popleft()
self.last_leak = now
def add_request(self) -> bool:
self._leak()
if len(self.queue) < self.capacity:
self.queue.append(time.time())
return True
return False
Simple and memory-efficient — counts requests in fixed time windows
The main weakness: a burst of requests at the end of one window and the start of the next can effectively allow 2x the rate limit. Watch the visualization to see this in action!
import time
class FixedWindowCounter:
def __init__(self, limit: int, window_size: float):
self.limit = limit
self.window_size = window_size
self.count = 0
self.window_start = time.time()
def allow_request(self) -> bool:
now = time.time()
if now - self.window_start >= self.window_size:
# Reset the counter and slide window start
self.count = 0
self.window_start = now
if self.count < self.limit:
self.count += 1
return True
return False
Precise rate limiting — tracks every request timestamp
This solves the Fixed Window boundary problem but requires more memory — it stores every request timestamp. The trade-off is precision vs. memory usage.
import time
class SlidingWindowLog:
def __init__(self, limit: int, window_size: float):
self.limit = limit
self.window_size = window_size
self.timestamps = []
def allow_request(self) -> bool:
now = time.time()
cutoff = now - self.window_size
# Filter out expired timestamps
self.timestamps = [t for t in self.timestamps if t >= cutoff]
if len(self.timestamps) < self.limit:
self.timestamps.append(now)
return True
return False
Best of both worlds — combines Fixed Window efficiency with Sliding Window accuracy
weighted = prev_count × overlap% + current_countThis is the sweet spot for most use cases. You only store two counters instead of every timestamp, yet get much better precision than fixed windows. Used by Cloudflare and many production systems.
import time
class SlidingWindowCounter:
def __init__(self, limit: int, window_size: float):
self.limit = limit
self.window_size = window_size
self.prev_count = 0
self.current_count = 0
self.window_start = time.time()
def allow_request(self) -> bool:
now = time.time()
elapsed = now - self.window_start
if elapsed >= self.window_size:
# Shift window counters
self.prev_count = self.current_count
self.current_count = 0
self.window_start = now
elapsed = 0.0
# Estimate requests based on overlap weight
overlap = 1.0 - (elapsed / self.window_size)
weighted_count = self.prev_count * overlap + self.current_count
if weighted_count < self.limit:
self.current_count += 1
return True
return False
How production APIs communicate rate limits to clients
Specifies the maximum number of requests allowed within the current rate limiting window.
Indicates the number of requests remaining in the current window before a block occurs.
Specifies the Unix epoch timestamp indicating exactly when the current window resets and counters clear.
Sent during a block. Tells the client exactly how many seconds to wait before trying the request again.
When a client exceeds their rate limit, servers reject the connection with an HTTP Status Code 429 Too Many Requests. Standardized under RFC 6585, clients should back off and observe the Retry-After header to prevent getting banned.
Understanding the trade-offs to choose the right algorithm
| Feature | Token Bucket | Leaking Bucket | Fixed Window | Sliding Log | Sliding Counter |
|---|---|---|---|---|---|
| Memory Usage | Low | Low | Low | High | Low |
| Accuracy | High | High | Medium | Excellent | High |
| Burst Handling | Allows Bursts | Smooths Out | Boundary Issue | Precise | Good |
| Implementation | Simple | Simple | Simplest | Complex | Moderate |
| Used By | Amazon, Stripe | Network Routers | Simple APIs | Financial Systems | Cloudflare |
| Best For | APIs with burst tolerance | Steady output needed | Simple use cases | Strict compliance | Most production systems |
Use Token Bucket
Use Leaking Bucket
Use Fixed Window Counter
Use Sliding Window Log
Use Sliding Window Counter
Got questions about rate limiting? We've got answers.
Rate limiting generally refers to hard limits placed on requests (e.g., maximum 100 requests per minute), where any requests above the limit are rejected immediately with an HTTP 429 error.
Throttling is the process of slowing down request execution or limiting bandwidth/processing rates dynamically rather than failing outright. Leaking Bucket is a good example of a throttling rate limiter.
In distributed environments with multiple server instances, rate limiter state must be shared. Redis is ideal because:
INCR and Lua scripts) to prevent race conditions.EXPIRE, which makes implementing Fixed Window rate limiters straightforward.The Fixed Window Counter algorithm suffers from a boundary problem where a client can make double their limit at the edge of window resets. For instance, if the limit is 10/min, they can send 10 requests at 0:59 and another 10 at 1:01, resulting in 20 requests in 2 seconds.
The Sliding Window Counter solves this by taking a weighted average of the request count from both the previous and current window, smoothing out the rate calculation without storing all request timestamps like a Sliding Log would.
While headers like X-RateLimit-* are widely used by services like GitHub and Stripe, they are not standard RFCs. However, the IETF is working on standardizing rate limit headers under RFC 9126, suggesting response headers like RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset (replacing the "X-" prefix).