Interactive Visual Guide

Understanding
Rate Limiting
Algorithms

Explore how different rate limiting strategies work through interactive visualizations. Click, experiment, and build intuition for each algorithm.

5 Algorithms
Interactive
0 Prerequisites
Scroll to explore

Why Rate Limiting?

Rate limiting is a critical technique used to control the rate of requests that clients can make to an API or service. It protects your systems from being overwhelmed, ensures fair usage among clients, and prevents abuse.

🛡️

Protection

Prevent DDoS attacks and resource exhaustion

⚖️

Fairness

Ensure equal access for all clients

💰

Cost Control

Manage infrastructure costs effectively

API Server
Healthy
01

Token Bucket

The most flexible rate limiter — allows controlled bursts

How it works

  1. A bucket holds tokens up to a maximum capacity
  2. Tokens are added at a constant refill rate
  3. Each request consumes one token
  4. If no tokens available, the request is rejected
  5. Allows bursty traffic when tokens have accumulated

💡 Key Insight

Token Bucket is widely used (e.g., by Amazon and Stripe) because it smooths out traffic while still allowing short bursts. If a client hasn't made requests in a while, they can "spend" their accumulated tokens in a burst.

Parameters

5
1
🐍 View Python Implementation
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last_refill = time.time()

    def allow_request(self) -> bool:
        now = time.time()
        elapsed = now - self.last_refill
        
        # Refill tokens based on elapsed time
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_rate
        )
        self.last_refill = now
        
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
Token Bucket
0 / 5
💧 Refilling...

Request Log

02

Leaking Bucket

Smooths out bursty traffic — processes requests at a constant rate

How it works

  1. Requests are added to a queue (bucket)
  2. The queue has a fixed size; if full, new requests are dropped
  3. Requests are processed (leaked) from the queue at a constant rate
  4. This ensures a steady output rate regardless of input burstiness

💡 Key Insight

Unlike Token Bucket, Leaking Bucket does not allow bursts — it always processes at a constant rate. Think of it like water dripping from a leaky bucket: no matter how fast you pour water in, it drips out at the same pace.

Parameters

5
1
🐍 View Python Implementation
import time
from collections import deque

class LeakingBucket:
    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate # requests per second
        self.queue = deque()
        self.last_leak = time.time()

    def _leak(self):
        now = time.time()
        elapsed = now - self.last_leak
        leaked = int(elapsed * self.leak_rate)
        
        if leaked > 0:
            # Process (leak) requests from queue
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def add_request(self) -> bool:
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(time.time())
            return True
        return False
Leaking Bucket Queue
0 / 5
Processing at constant rate
✅ Processed
0

Request Log

03

Fixed Window Counter

Simple and memory-efficient — counts requests in fixed time windows

How it works

  1. Time is divided into fixed windows (e.g., every 10 seconds)
  2. Each window has a counter that starts at zero
  3. Each request increments the counter
  4. If the counter exceeds the limit, requests are rejected
  5. When a new window begins, the counter resets to zero

⚠️ The Boundary Problem

The main weakness: a burst of requests at the end of one window and the start of the next can effectively allow 2x the rate limit. Watch the visualization to see this in action!

Parameters

10
5
🐍 View Python Implementation
import time

class FixedWindowCounter:
    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window_size = window_size
        self.count = 0
        self.window_start = time.time()

    def allow_request(self) -> bool:
        now = time.time()
        
        if now - self.window_start >= self.window_size:
            # Reset the counter and slide window start
            self.count = 0
            self.window_start = now
            
        if self.count < self.limit:
            self.count += 1
            return True
        return False
Current Window 10s
0 / 5

Window History

Request Log

04

Sliding Window Log

Precise rate limiting — tracks every request timestamp

How it works

  1. Keeps a log of timestamps for each request
  2. When a new request arrives, remove outdated entries (older than window size)
  3. Count remaining entries — if count < limit, allow and add timestamp
  4. If count ≥ limit, reject the request
  5. The window slides continuously — no boundary issues!

💡 Key Insight

This solves the Fixed Window boundary problem but requires more memory — it stores every request timestamp. The trade-off is precision vs. memory usage.

Parameters

10
5
🐍 View Python Implementation
import time

class SlidingWindowLog:
    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window_size = window_size
        self.timestamps = []

    def allow_request(self) -> bool:
        now = time.time()
        cutoff = now - self.window_size
        
        # Filter out expired timestamps
        self.timestamps = [t for t in self.timestamps if t >= cutoff]
        
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
Requests in window: 0 / 5
Stored timestamps: 0

Timestamp Log

Request Log

05

Sliding Window Counter

Best of both worlds — combines Fixed Window efficiency with Sliding Window accuracy

How it works

  1. Uses two adjacent fixed windows (previous + current)
  2. Calculates a weighted count based on the overlap
  3. Formula: weighted = prev_count × overlap% + current_count
  4. If weighted count < limit → allow, else → reject
  5. Achieves sliding window accuracy with minimal memory

💡 Key Insight

This is the sweet spot for most use cases. You only store two counters instead of every timestamp, yet get much better precision than fixed windows. Used by Cloudflare and many production systems.

Parameters

10
5
🐍 View Python Implementation
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window_size = window_size
        self.prev_count = 0
        self.current_count = 0
        self.window_start = time.time()

    def allow_request(self) -> bool:
        now = time.time()
        elapsed = now - self.window_start
        
        if elapsed >= self.window_size:
            # Shift window counters
            self.prev_count = self.current_count
            self.current_count = 0
            self.window_start = now
            elapsed = 0.0
            
        # Estimate requests based on overlap weight
        overlap = 1.0 - (elapsed / self.window_size)
        weighted_count = self.prev_count * overlap + self.current_count
        
        if weighted_count < self.limit:
            self.current_count += 1
            return True
        return False
Current Window
0
10s
Weighted Count = 0 × 100% + 0 = 0
Status: ✅ Under limit (0 / 5)

Request Log

HTTP Response Standards

How production APIs communicate rate limits to clients

X-RateLimit-Limit Standard

Specifies the maximum number of requests allowed within the current rate limiting window.

X-RateLimit-Limit: 100
X-RateLimit-Remaining Standard

Indicates the number of requests remaining in the current window before a block occurs.

X-RateLimit-Remaining: 42
X-RateLimit-Reset Standard

Specifies the Unix epoch timestamp indicating exactly when the current window resets and counters clear.

X-RateLimit-Reset: 1719747600
Retry-After Block Header

Sent during a block. Tells the client exactly how many seconds to wait before trying the request again.

Retry-After: 30
🚨

HTTP Status Code 429: Too Many Requests

When a client exceeds their rate limit, servers reject the connection with an HTTP Status Code 429 Too Many Requests. Standardized under RFC 6585, clients should back off and observe the Retry-After header to prevent getting banned.

Algorithm Comparison

Understanding the trade-offs to choose the right algorithm

Feature Token Bucket Leaking Bucket Fixed Window Sliding Log Sliding Counter
Memory Usage Low Low Low High Low
Accuracy High High Medium Excellent High
Burst Handling Allows Bursts Smooths Out Boundary Issue Precise Good
Implementation Simple Simple Simplest Complex Moderate
Used By Amazon, Stripe Network Routers Simple APIs Financial Systems Cloudflare
Best For APIs with burst tolerance Steady output needed Simple use cases Strict compliance Most production systems

Which algorithm should you use?

🏎️

Need burst tolerance?

Use Token Bucket

🚰

Need steady output?

Use Leaking Bucket

🎯

Need simplicity?

Use Fixed Window Counter

🔬

Need precision?

Use Sliding Window Log

Need balance?

Use Sliding Window Counter

Frequently Asked Questions

Got questions about rate limiting? We've got answers.

What is the difference between rate limiting and throttling?

Rate limiting generally refers to hard limits placed on requests (e.g., maximum 100 requests per minute), where any requests above the limit are rejected immediately with an HTTP 429 error.

Throttling is the process of slowing down request execution or limiting bandwidth/processing rates dynamically rather than failing outright. Leaking Bucket is a good example of a throttling rate limiter.

Why is Redis commonly used for distributed rate limiters?

In distributed environments with multiple server instances, rate limiter state must be shared. Redis is ideal because:

  • It stores state in-memory for microsecond operations.
  • It supports atomic operations (like INCR and Lua scripts) to prevent race conditions.
  • It supports expiring keys via EXPIRE, which makes implementing Fixed Window rate limiters straightforward.
How does the Sliding Window Counter algorithm improve on the Fixed Window approach?

The Fixed Window Counter algorithm suffers from a boundary problem where a client can make double their limit at the edge of window resets. For instance, if the limit is 10/min, they can send 10 requests at 0:59 and another 10 at 1:01, resulting in 20 requests in 2 seconds.

The Sliding Window Counter solves this by taking a weighted average of the request count from both the previous and current window, smoothing out the rate calculation without storing all request timestamps like a Sliding Log would.

Is there an RFC standard for rate limiting headers?

While headers like X-RateLimit-* are widely used by services like GitHub and Stripe, they are not standard RFCs. However, the IETF is working on standardizing rate limit headers under RFC 9126, suggesting response headers like RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset (replacing the "X-" prefix).