Interactive Visual Guide

Understanding Rate Limiting Algorithms

Explore how different rate limiting strategies work through interactive visualizations. Click, experiment, and build intuition for each algorithm.

5 Algorithms
Interactive
0 Prerequisites

Why Rate Limiting?

Rate limiting controls how many requests a client can make to an API or service in a given period of time. It protects your systems from being overwhelmed, ensures fair usage among clients, and limits abuse.

🛡️

Protection

Mitigate DDoS attacks and prevent resource exhaustion

⚖️

Fairness

Ensure equal access for all clients

💰

Cost Control

Manage infrastructure costs effectively

01

Token Bucket

The most flexible rate limiter — allows controlled bursts

How it works

  1. A bucket holds tokens up to a maximum capacity
  2. Tokens are added at a constant refill rate; tokens that would exceed the capacity are discarded
  3. Each request consumes one token
  4. If no tokens are available, the request is rejected
  5. Allows bursty traffic when tokens have accumulated
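
To make the steps above concrete, here is a minimal Python sketch of a token bucket. The class name `TokenBucket`, the `allow` method, and the explicit `now` argument (a timestamp in seconds, passed in so the example is deterministic) are illustrative assumptions, not the API of any particular library. The sketch also assumes the bucket starts full.

```python
class TokenBucket:
    """Illustrative token bucket; names and the starting state are assumptions."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity        # maximum number of tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # assumption: the bucket starts full
        self.last_refill = now          # time of the most recent refill

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request consumes one token
            return True
        return False                    # no token available: reject


bucket = TokenBucket(capacity=5, refill_rate=1)
burst = [bucket.allow(now=0.0) for _ in range(6)]  # six requests at the same instant
later = bucket.allow(now=2.0)                      # one more request two seconds later
```

With a capacity of 5 and a refill rate of 1 token per second, the first five requests in `burst` pass, the sixth is rejected, and the request at `now=2.0` succeeds because two tokens have been refilled by then.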

💡 Key Insight

Token Bucket is widely used (e.g., by Amazon and Stripe) because it caps the long-run average request rate while still allowing short bursts. If a client hasn't made requests in a while, it can "spend" its accumulated tokens in a burst, up to the bucket's capacity.

02

Leaking Bucket

Smooths out bursty traffic — processes requests at a constant rate

How it works

  1. Requests are added to a queue (bucket)
  2. The queue has a fixed size; if full, new requests are dropped
  3. Requests are processed (leaked) from the queue at a constant rate
  4. This ensures a steady output rate regardless of input burstiness
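
The steps above can be sketched as a bounded queue that is drained on a fixed schedule. In this minimal Python sketch, `LeakingBucket`, `offer`, the `processed` list, and the explicit `now` argument are all hypothetical names chosen for illustration. It assumes leak "ticks" happen every `1 / leak_rate` seconds and that each tick processes at most one queued request.

```python
from collections import deque


class LeakingBucket:
    """Illustrative leaking bucket; a sketch under the assumptions stated above."""

    def __init__(self, capacity: int, leak_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests processed per second
        self.queue = deque()        # pending requests, oldest first
        self.processed = []         # requests that have leaked out, in order
        self.last_leak = now        # time of the most recent leak tick

    def _leak(self, now: float) -> None:
        # Count the whole leak ticks that have passed and process one request per tick.
        ticks = int((now - self.last_leak) * self.leak_rate)
        if ticks > 0:
            self.last_leak += ticks / self.leak_rate
            for _ in range(min(ticks, len(self.queue))):
                self.processed.append(self.queue.popleft())

    def offer(self, request: str, now: float) -> bool:
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False            # queue is full: drop the new request
        self.queue.append(request)
        return True


bucket = LeakingBucket(capacity=5, leak_rate=1)
accepted = [bucket.offer(f"req-{i}", now=0.0) for i in range(7)]  # burst of seven
late = bucket.offer("req-7", now=3.0)                             # three ticks later
```

Here the burst of seven fills the queue with five requests and drops two. By `now=3.0`, three requests have leaked out in arrival order, which frees room for `req-7`.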

💡 Key Insight

Unlike Token Bucket, Leaking Bucket does not pass bursts through to the server. The queue absorbs a burst up to its capacity, and requests leave it at a constant rate. Think of it like water dripping from a leaky bucket: no matter how fast you pour water in, it drips out at the same pace.

03

Fixed Window Counter

Simple and memory-efficient — counts requests in fixed time windows

How it works

  1. Time is divided into fixed windows (e.g., every 10 seconds)
  2. Each window has a counter that starts at zero
  3. Each request increments the counter
  4. If the counter exceeds the limit, requests are rejected
  5. When a new window begins, the counter resets to zero
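
A fixed window counter needs only a window index and a count. The following Python sketch follows the steps above; `FixedWindowCounter`, `allow`, and the explicit `now` argument are hypothetical names for illustration, and windows are assumed to be aligned to multiples of the window size starting at time zero.

```python
class FixedWindowCounter:
    """Illustrative fixed window counter; names and window alignment are assumptions."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit                    # maximum requests allowed per window
        self.window_seconds = window_seconds  # length of each window in seconds
        self.window_index = None              # which window the counter belongs to
        self.count = 0                        # requests seen in that window

    def allow(self, now: float) -> bool:
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # A new window has begun: reset the counter to zero.
            self.window_index = index
            self.count = 0
        self.count += 1                       # every request increments the counter
        return self.count <= self.limit       # reject once the counter exceeds the limit


limiter = FixedWindowCounter(limit=5, window_seconds=10)
first_window = [limiter.allow(now=1.0) for _ in range(6)]
next_window = limiter.allow(now=10.0)
```

With a limit of 5 per 10-second window, the sixth request at `now=1.0` is rejected, and the request at `now=10.0` is allowed because it falls in a fresh window.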

⚠️ The Boundary Problem

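The main weakness: a burst of requests at the end of one window followed by another burst at the start of the next can let through up to 2x the limit in a span much shorter than one window. For example, with a limit of 5 per 10-second window, 5 requests just before the boundary and 5 just after it are all accepted, even though they arrive within about a second of each other.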
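
The boundary problem is easy to reproduce in a few lines. This Python sketch uses a hypothetical helper, `accepted_times`, that applies a fixed window counter to a list of request timestamps and returns the ones that were allowed. The timestamps, limit, and window size are made-up values chosen to straddle a window boundary.

```python
from collections import Counter


def accepted_times(times, limit, window_seconds):
    """Return the timestamps a fixed window counter would accept (illustrative helper)."""
    counts = Counter()                        # requests seen per window index
    accepted = []
    for t in times:
        index = int(t // window_seconds)
        counts[index] += 1                    # every request increments its window's counter
        if counts[index] <= limit:
            accepted.append(t)
    return accepted


# Five requests just before the 10-second boundary, five just after it.
times = [9.5] * 5 + [10.5] * 5
accepted = accepted_times(times, limit=5, window_seconds=10)
span = max(accepted) - min(accepted)          # seconds between first and last accepted request
```

All ten requests are accepted, and `span` is one second, so the effective rate over that second is twice the configured limit of 5 per 10 seconds.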

04

Sliding Window Log

Precise rate limiting — tracks every request timestamp

How it works

  1. Keeps a log of timestamps for each request
  2. When a new request arrives, remove outdated entries (older than window size)
  3. Count remaining entries — if count < limit, allow and add timestamp
  4. If count ≥ limit, reject the request
  5. The window slides continuously — no boundary issues!
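
Here is a minimal Python sketch of the steps above. `SlidingWindowLog`, `allow`, and the explicit `now` argument are illustrative names, not a real library's API. One detail the steps leave open is what to do with an entry that is exactly one window old; this sketch treats it as expired, which is an assumption.

```python
from collections import deque


class SlidingWindowLog:
    """Illustrative sliding window log; names and the expiry boundary are assumptions."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit                    # maximum requests allowed per window
        self.window_seconds = window_seconds  # length of the sliding window in seconds
        self.timestamps = deque()             # timestamps of allowed requests, oldest first

    def allow(self, now: float) -> bool:
        # Remove entries that have slid out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)       # allow and record the timestamp
            return True
        return False                          # window is already full: reject


limiter = SlidingWindowLog(limit=5, window_seconds=10)
before = [limiter.allow(now=9.5) for _ in range(5)]
after = [limiter.allow(now=10.5) for _ in range(5)]
much_later = limiter.allow(now=19.5)
```

The same traffic that slips past a fixed window is handled correctly here: the five requests at `now=9.5` are allowed, the five at `now=10.5` are all rejected because the earlier ones are still inside the window, and the request at `now=19.5` is allowed once they have expired.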

💡 Key Insight

This solves the Fixed Window boundary problem but requires more memory: it stores a timestamp for every request still inside the window, so memory per client grows with the limit. The trade-off is precision vs. memory usage.

05

Sliding Window Counter

Best of both worlds — combines Fixed Window efficiency with Sliding Window accuracy

How it works

  1. Uses two adjacent fixed windows (previous + current)
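  2. Calculates a weighted count based on how much of the sliding window still overlaps the previous window
  3. Formula: weighted = prev_count × overlap% + current_count, where overlap% = 1 − (time elapsed in current window ÷ window size)
  4. If weighted count < limit → allow, else → reject
  5. Approximates sliding window accuracy with minimal memory (it assumes requests in the previous window were evenly spread)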
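
The weighted formula above can be sketched in Python as follows. `SlidingWindowCounter`, `weighted_count`, `allow`, and the explicit `now` argument are hypothetical names for illustration. The sketch assumes timestamps never go backwards, that windows are aligned to multiples of the window size, and that the overlap is one minus the fraction of the current window that has elapsed.

```python
class SlidingWindowCounter:
    """Illustrative sliding window counter; a sketch under the assumptions stated above."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit                    # maximum weighted count allowed
        self.window_seconds = window_seconds  # length of each fixed window in seconds
        self.current_index = 0                # index of the current fixed window
        self.current_count = 0                # requests allowed in the current window
        self.previous_count = 0               # requests allowed in the previous window

    def _roll(self, now: float) -> None:
        index = int(now // self.window_seconds)
        if index > self.current_index:
            # Moving forward one window keeps the old count; a longer gap clears it.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_index = index
            self.current_count = 0

    def weighted_count(self, now: float) -> float:
        self._roll(now)
        elapsed = now % self.window_seconds
        overlap = 1 - elapsed / self.window_seconds  # share of previous window still covered
        return self.previous_count * overlap + self.current_count

    def allow(self, now: float) -> bool:
        if self.weighted_count(now) < self.limit:
            self.current_count += 1
            return True
        return False


limiter = SlidingWindowCounter(limit=5, window_seconds=10)
before = [limiter.allow(now=9.5) for _ in range(5)]
after = [limiter.allow(now=10.5) for _ in range(5)]
```

With a limit of 5 per 10 seconds, all five requests at `now=9.5` are allowed. At `now=10.5` the overlap is 95%, so the weighted count starts at 5 × 0.95 = 4.75: one more request is allowed and the remaining four are rejected. That is far closer to the intended limit than the ten requests a fixed window would accept, though it is still an approximation.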

💡 Key Insight

This is the sweet spot for most use cases. You only store two counters instead of every timestamp, yet get much better precision than fixed windows. Used by Cloudflare and many production systems.

Algorithm Comparison

Understanding the trade-offs to choose the right algorithm

| Feature | Token Bucket | Leaking Bucket | Fixed Window | Sliding Log | Sliding Counter |
|---|---|---|---|---|---|
| Memory Usage | Low | Low | Low | High | Low |
| Accuracy | High | High | Medium | Excellent | High |
| Burst Handling | Allows Bursts | Smooths Out | Boundary Issue | Precise | Good |
| Implementation | Simple | Simple | Simplest | Complex | Moderate |
| Used By | Amazon, Stripe | Network Routers | Simple APIs | Financial Systems | Cloudflare |
| Best For | APIs with burst tolerance | Steady output needed | Simple use cases | Strict compliance | Most production systems |

Which algorithm should you use?

🏎️

Need burst tolerance?

Use Token Bucket

🚰

Need steady output?

Use Leaking Bucket

🎯

Need simplicity?

Use Fixed Window Counter

🔬

Need precision?

Use Sliding Window Log

Need balance?

Use Sliding Window Counter