Rate Limiting Algorithms
========================

FastAPI Traffic ships with five rate limiting algorithms. Each has its own strengths,
and picking the right one depends on what you're trying to achieve.

This guide will help you understand the tradeoffs and choose wisely.

Overview
--------

Here's the quick comparison:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Algorithm
     - Best For
     - Tradeoffs
   * - **Token Bucket**
     - APIs that need burst handling
     - Allows temporary spikes above average rate
   * - **Sliding Window**
     - Precise rate limiting
     - Higher memory usage
   * - **Fixed Window**
     - Simple, low-overhead limiting
     - Boundary issues (2x burst at window edges)
   * - **Leaky Bucket**
     - Consistent throughput
     - No burst handling
   * - **Sliding Window Counter**
     - General purpose (default)
     - Good balance of precision and efficiency

Token Bucket
------------

Think of this as a bucket that holds tokens. Each request consumes a token, and
tokens refill at a steady rate. If the bucket is empty, requests are rejected.

.. code-block:: python

   from fastapi_traffic import rate_limit, Algorithm

   @app.get("/api/data")
   @rate_limit(
       100,  # 100 tokens refill per minute
       60,
       algorithm=Algorithm.TOKEN_BUCKET,
       burst_size=20,  # bucket can hold up to 20 tokens
   )
   async def get_data(request: Request):
       return {"data": "here"}

**How it works:**

1. The bucket starts full (at ``burst_size`` capacity)
2. Each request removes one token
3. Tokens refill at ``limit / window_size`` per second
4. If no tokens are available, the request is rejected

**When to use it:**

- Your API has legitimate burst traffic (e.g., page loads that trigger multiple requests)
- You want to allow short spikes while maintaining an average rate
- Mobile apps that batch requests when coming online

**Example scenario:** A mobile app that syncs data when it reconnects. You want to
allow it to catch up quickly, but not overwhelm your servers.

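The refill logic in the steps above can be sketched in a few lines. This is a minimal
in-memory illustration, not the library's actual implementation; the ``TokenBucket``
class and ``allow`` method names are hypothetical:

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: refills at limit/window_size tokens per second."""

    def __init__(self, limit: int, window_size: float, burst_size: int):
        self.rate = limit / window_size   # tokens added per second
        self.capacity = burst_size        # the bucket starts full
        self.tokens = float(burst_size)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a burst of up to ``burst_size`` requests is accepted
immediately; sustained traffic is then held to the refill rate.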
Sliding Window
--------------

This algorithm tracks the exact timestamp of every request within the window. It's
the most accurate approach, but uses more memory.

.. code-block:: python

   @app.get("/api/transactions")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def get_transactions(request: Request):
       return {"transactions": []}

**How it works:**

1. Every request timestamp is stored
2. When checking, we count requests in the last ``window_size`` seconds
3. Old timestamps are cleaned up automatically

**When to use it:**

- You need precise rate limiting (financial APIs, compliance requirements)
- Memory isn't a major concern
- The rate limit is relatively low (not millions of requests)

**Tradeoffs:**

- Memory usage grows with request volume
- Slightly more CPU for timestamp management

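The store-and-expire behavior can be sketched with a deque of timestamps. Again a
minimal illustration with hypothetical names, not the library's implementation:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Stores one timestamp per request; counts those still inside the window."""

    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window = window_size
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The deque makes both cleanup (pop from the left) and insertion (append on the right)
cheap, but memory still scales with the number of requests in the window.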
Fixed Window
------------

The simplest algorithm. Divide time into fixed windows (e.g., every minute) and
count requests in each window.

.. code-block:: python

   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Time is divided into fixed windows (0:00-1:00, 1:00-2:00, etc.)
2. Each request increments the counter for the current window
3. When the window changes, the counter resets

**When to use it:**

- You want the simplest, most efficient option
- Slight inaccuracy at window boundaries is acceptable
- High-volume scenarios where memory matters

**The boundary problem:**

A client could make 100 requests at 0:59 and another 100 at 1:01, effectively
getting 200 requests in 2 seconds. If this matters for your use case, use
sliding window counter instead.

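A sketch of the counter-and-reset logic also makes the boundary problem concrete.
This is an illustration only (``FixedWindowCounter`` is a hypothetical name); the
``now`` parameter exists so the window edge can be demonstrated deterministically:

```python
import time

class FixedWindowCounter:
    """One counter per window; the counter resets when the window index changes."""

    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window = window_size
        self.current_window = None
        self.count = 0

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = int(now // self.window)  # window index: minute 0, minute 1, ...
        if window != self.current_window:
            self.current_window = window
            self.count = 0  # a new window starts with a fresh counter
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 2 per 60 seconds, a request rejected at ``t = 59.9`` is accepted
again at ``t = 60.1``: the counter reset at the boundary allows up to twice the
limit in a short span straddling the edge.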
Leaky Bucket
------------

Imagine a bucket with a hole in the bottom. Requests fill the bucket, and it
"leaks" at a constant rate. If the bucket overflows, requests are rejected.

.. code-block:: python

   @app.get("/api/steady")
   @rate_limit(
       100,
       60,
       algorithm=Algorithm.LEAKY_BUCKET,
       burst_size=10,  # bucket capacity
   )
   async def steady_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. The bucket has a maximum capacity (``burst_size``)
2. Each request adds "water" to the bucket
3. Water leaks out at ``limit / window_size`` per second
4. If the bucket would overflow, the request is rejected

**When to use it:**

- You need consistent, smooth throughput
- Downstream systems can't handle bursts
- Processing capacity is truly fixed (e.g., hardware limitations)

**Difference from token bucket:**

- Token bucket allows bursts up to the bucket size
- Leaky bucket smooths out traffic to a constant rate

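The leak arithmetic mirrors the token bucket's refill, but inverted: the level
rises with each request and drains over time. A minimal sketch under the same
caveats as above (``LeakyBucket`` and ``allow`` are hypothetical names, not the
library's API):

```python
import time

class LeakyBucket:
    """Level rises by 1 per request and drains at limit/window_size per second."""

    def __init__(self, limit: int, window_size: float, burst_size: int):
        self.leak_rate = limit / window_size  # units drained per second
        self.capacity = burst_size
        self.level = 0.0
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain proportionally to elapsed time, never below empty
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

Note the inversion of the token bucket: this bucket starts *empty*, so it admits
requests only as fast as it drains, which is what smooths traffic to a constant rate.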
Sliding Window Counter
----------------------

This is the default algorithm, and it's a good choice for most use cases. It
combines the efficiency of fixed windows with better accuracy.

.. code-block:: python

   @app.get("/api/default")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def default_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Maintains counters for the current and previous windows
2. Calculates a weighted average based on how far into the current window we are
3. At 30 seconds into a 60-second window: ``count = prev_count * 0.5 + curr_count``

**When to use it:**

- General purpose rate limiting
- You want better accuracy than fixed window without the memory cost of sliding window
- Most APIs fall into this category

**Why it's the default:**

It gives you 90% of the accuracy of sliding window with the memory efficiency of
fixed window. Unless you have specific requirements, this is probably what you want.

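The weighted-average estimate can be sketched as follows. As before, this is an
illustration with hypothetical names rather than the library's code, and the
``now`` parameter is there only to make the math easy to follow:

```python
import time

class SlidingWindowCounter:
    """Weights the previous window's count by the fraction of it still in view."""

    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window = window_size
        self.current_window = 0
        self.curr_count = 0
        self.prev_count = 0

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = int(now // self.window)
        if window != self.current_window:
            # Roll over: the old current count becomes the previous count,
            # but only if the windows are adjacent (no gap in traffic)
            self.prev_count = self.curr_count if window == self.current_window + 1 else 0
            self.curr_count = 0
            self.current_window = window
        elapsed = now - window * self.window
        weight = (self.window - elapsed) / self.window  # 0.5 at the halfway point
        estimated = self.prev_count * weight + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```

At 30 seconds into a 60-second window, ``weight`` is 0.5, reproducing the formula
in step 3 above. Only three values are stored per key, matching fixed window's
footprint while blunting its boundary problem.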
Choosing the Right Algorithm
----------------------------

Here's a decision tree:

1. **Do you need to allow bursts?**

   - Yes → Token Bucket
   - No, I need smooth traffic → Leaky Bucket

2. **Do you need exact precision?**

   - Yes, compliance/financial → Sliding Window
   - No, good enough is fine → Continue

3. **Is memory a concern?**

   - Yes, high volume → Fixed Window
   - No → Sliding Window Counter (default)

Performance Comparison
----------------------

All algorithms are O(1) for the check operation (sliding window additionally pays
for timestamp cleanup, amortized across requests), but they differ in storage:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Storage per Key
     - Operations
   * - Token Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window
     - N timestamps
     - 1 read, 1 write, cleanup
   * - Fixed Window
     - 1 int, 1 float
     - 1 read, 1 write
   * - Leaky Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window Counter
     - 3 values
     - 1 read, 1 write

For most applications, the performance difference is negligible. Choose based on
behavior, not performance, unless you're handling millions of requests per second.

Code Examples
-------------

Here's a complete example showing all algorithms:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import rate_limit, Algorithm

   app = FastAPI()

   # Burst-friendly endpoint
   @app.get("/api/burst")
   @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=25)
   async def burst_endpoint(request: Request):
       return {"type": "token_bucket"}

   # Precise limiting
   @app.get("/api/precise")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def precise_endpoint(request: Request):
       return {"type": "sliding_window"}

   # Simple and efficient
   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"type": "fixed_window"}

   # Smooth throughput
   @app.get("/api/steady")
   @rate_limit(100, 60, algorithm=Algorithm.LEAKY_BUCKET)
   async def steady_endpoint(request: Request):
       return {"type": "leaky_bucket"}

   # Best of both worlds (default)
   @app.get("/api/balanced")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def balanced_endpoint(request: Request):
       return {"type": "sliding_window_counter"}