release: bump version to 0.3.0

- Refactor Redis backend connection handling and pool management
- Update algorithm implementations with improved type annotations
- Enhance config loader validation with stricter Pydantic schemas
- Improve decorator and middleware error handling
- Expand example scripts with better docstrings and usage patterns
- Add new 00_basic_usage.py example for quick start
- Reorganize examples directory structure
- Fix type annotation inconsistencies across core modules
- Update dependencies in pyproject.toml
2026-03-17 20:55:38 +00:00
parent 492410614f
commit f3453cb0fc
51 changed files with 6507 additions and 166 deletions

Rate Limiting Algorithms
========================
FastAPI Traffic ships with five rate limiting algorithms. Each has its own strengths,
and picking the right one depends on what you're trying to achieve.
This guide will help you understand the tradeoffs and choose wisely.
Overview
--------
Here's the quick comparison:
.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Algorithm
     - Best For
     - Tradeoffs
   * - **Token Bucket**
     - APIs that need burst handling
     - Allows temporary spikes above average rate
   * - **Sliding Window**
     - Precise rate limiting
     - Higher memory usage
   * - **Fixed Window**
     - Simple, low-overhead limiting
     - Boundary issues (2x burst at window edges)
   * - **Leaky Bucket**
     - Consistent throughput
     - No burst handling
   * - **Sliding Window Counter**
     - General purpose (default)
     - Good balance of precision and efficiency
Token Bucket
------------
Think of this as a bucket that holds tokens. Each request consumes a token, and
tokens refill at a steady rate. If the bucket is empty, requests are rejected.
.. code-block:: python
from fastapi_traffic import rate_limit, Algorithm
@app.get("/api/data")
@rate_limit(
100, # 100 tokens refill per minute
60,
algorithm=Algorithm.TOKEN_BUCKET,
burst_size=20, # bucket can hold up to 20 tokens
)
async def get_data(request: Request):
return {"data": "here"}
**How it works:**
1. The bucket starts full (at ``burst_size`` capacity)
2. Each request removes one token
3. Tokens refill at ``limit / window_size`` per second
4. If no tokens are available, the request is rejected
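Those four steps can be sketched in a few lines of plain Python (an illustration, not the library's actual implementation; ``limit``, ``window``, and ``burst_size`` mirror the decorator arguments above):

```python
class TokenBucket:
    """Toy token bucket: starts full, refills at limit/window tokens per second."""

    def __init__(self, limit: float, window: float, burst_size: float) -> None:
        self.rate = limit / window        # tokens added per second
        self.capacity = burst_size
        self.tokens = burst_size          # bucket starts full
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # refill for the time elapsed since the last check, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1              # each request consumes one token
            return True
        return False
```

With ``limit=60, window=60, burst_size=5`` the bucket absorbs a burst of five requests, then admits one more per second.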
**When to use it:**
- Your API has legitimate burst traffic (e.g., page loads that trigger multiple requests)
- You want to allow short spikes while maintaining an average rate
- Mobile apps that batch requests when coming online
**Example scenario:** A mobile app that syncs data when it reconnects. You want to
allow it to catch up quickly, but not overwhelm your servers.
Sliding Window
--------------
This algorithm tracks the exact timestamp of every request within the window. It's
the most accurate approach, but uses more memory.
.. code-block:: python
@app.get("/api/transactions")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
async def get_transactions(request: Request):
return {"transactions": []}
**How it works:**
1. Every request timestamp is stored
2. When checking, we count requests in the last ``window_size`` seconds
3. Old timestamps are cleaned up automatically
**When to use it:**
- You need precise rate limiting (financial APIs, compliance requirements)
- Memory isn't a major concern
- The rate limit is relatively low (not millions of requests)
**Tradeoffs:**
- Memory usage grows with request volume
- Slightly more CPU for timestamp management
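As a standalone sketch (again, not the library's code), the timestamp-log approach looks like this — note how memory is proportional to the number of stored timestamps:

```python
from collections import deque


class SlidingWindowLog:
    """Toy sliding-window log: keep every timestamp, count those still in the window."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.timestamps: deque[float] = deque()

    def allow(self, now: float) -> bool:
        # drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```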
Fixed Window
------------
The simplest algorithm. Divide time into fixed windows (e.g., every minute) and
count requests in each window.
.. code-block:: python
@app.get("/api/simple")
@rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
async def simple_endpoint(request: Request):
return {"status": "ok"}
**How it works:**
1. Time is divided into fixed windows (0:00-1:00, 1:00-2:00, etc.)
2. Each request increments the counter for the current window
3. When the window changes, the counter resets
**When to use it:**
- You want the simplest, most efficient option
- Slight inaccuracy at window boundaries is acceptable
- High-volume scenarios where memory matters
**The boundary problem:**
A client could make 100 requests at 0:59 and another 100 at 1:01, effectively
getting 200 requests in 2 seconds. If this matters for your use case, use
sliding window counter instead.
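A toy counter makes the boundary problem easy to reproduce (illustrative only — the real backends store this state per key):

```python
class FixedWindowCounter:
    """Toy fixed window: one counter per window, keyed by floor(now / window)."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.current: int | None = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.current:   # entered a new window: reset the counter
            self.current = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 100 per 60 seconds, 100 requests at 0:59 and 100 more at 1:01 are all admitted — 200 requests in two seconds.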
Leaky Bucket
------------
Imagine a bucket with a hole in the bottom. Requests fill the bucket, and it
"leaks" at a constant rate. If the bucket overflows, requests are rejected.
.. code-block:: python
@app.get("/api/steady")
@rate_limit(
100,
60,
algorithm=Algorithm.LEAKY_BUCKET,
burst_size=10, # bucket capacity
)
async def steady_endpoint(request: Request):
return {"status": "ok"}
**How it works:**
1. The bucket has a maximum capacity (``burst_size``)
2. Each request adds "water" to the bucket
3. Water leaks out at ``limit / window_size`` per second
4. If the bucket would overflow, the request is rejected
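The same steps as a standalone sketch (not the library's implementation) — the only moving part is the "water level", which drains continuously:

```python
class LeakyBucket:
    """Toy leaky bucket: the level drains at limit/window per second; overflow rejects."""

    def __init__(self, limit: float, window: float, burst_size: float) -> None:
        self.leak_rate = limit / window   # units of "water" drained per second
        self.capacity = burst_size
        self.level = 0.0
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # drain for the time elapsed since the last check
        self.level = max(0.0, self.level - (now - self.updated) * self.leak_rate)
        self.updated = now
        if self.level + 1 <= self.capacity:
            self.level += 1               # each request adds one unit of water
            return True
        return False
```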
**When to use it:**
- You need consistent, smooth throughput
- Downstream systems can't handle bursts
- Processing capacity is truly fixed (e.g., hardware limitations)
**Difference from token bucket:**
- Token bucket allows bursts up to the bucket size
- Leaky bucket smooths out traffic to a constant rate
Sliding Window Counter
----------------------
This is the default algorithm, and it's a good choice for most use cases. It
combines the efficiency of fixed windows with better accuracy.
.. code-block:: python
@app.get("/api/default")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
async def default_endpoint(request: Request):
return {"status": "ok"}
**How it works:**
1. Maintains counters for the current and previous windows
2. Calculates a weighted average based on how far into the current window we are
3. At 30 seconds into a 60-second window: ``count = prev_count * 0.5 + curr_count``
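The weighted average in step 3 amounts to this one-liner (names are illustrative):

```python
def weighted_count(prev_count: int, curr_count: int, elapsed: float, window: float) -> float:
    """Estimate requests in the last `window` seconds from two adjacent counters."""
    # the previous window's weight shrinks as the current window progresses
    weight = (window - elapsed) / window
    return prev_count * weight + curr_count
```

Thirty seconds into a 60-second window, a previous count of 100 and current count of 40 give an estimate of 90.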
**When to use it:**
- General purpose rate limiting
- You want better accuracy than fixed window without the memory cost of sliding window
- Most APIs fall into this category
**Why it's the default:**
It gives you most of the accuracy of sliding window with the memory efficiency of
fixed window. Unless you have specific requirements, this is probably what you want.
Choosing the Right Algorithm
----------------------------
Here's a decision tree:
1. **Do you need to allow bursts?**
- Yes → Token Bucket
- No, I need smooth traffic → Leaky Bucket
2. **Do you need exact precision?**
- Yes, compliance/financial → Sliding Window
- No, good enough is fine → Continue
3. **Is memory a concern?**
- Yes, high volume → Fixed Window
- No → Sliding Window Counter (default)
Performance Comparison
----------------------
All algorithms are O(1) for the check operation, but they differ in storage:
.. list-table::
   :header-rows: 1

   * - Algorithm
     - Storage per Key
     - Operations
   * - Token Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window
     - N timestamps
     - 1 read, 1 write, cleanup
   * - Fixed Window
     - 1 int, 1 float
     - 1 read, 1 write
   * - Leaky Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window Counter
     - 3 values
     - 1 read, 1 write
For most applications, the performance difference is negligible. Choose based on
behavior, not performance, unless you're handling millions of requests per second.
Code Examples
-------------
Here's a complete example showing all algorithms:
.. code-block:: python
from fastapi import FastAPI, Request
from fastapi_traffic import rate_limit, Algorithm
app = FastAPI()
# Burst-friendly endpoint
@app.get("/api/burst")
@rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=25)
async def burst_endpoint(request: Request):
return {"type": "token_bucket"}
# Precise limiting
@app.get("/api/precise")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
async def precise_endpoint(request: Request):
return {"type": "sliding_window"}
# Simple and efficient
@app.get("/api/simple")
@rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
async def simple_endpoint(request: Request):
return {"type": "fixed_window"}
# Smooth throughput
@app.get("/api/steady")
@rate_limit(100, 60, algorithm=Algorithm.LEAKY_BUCKET)
async def steady_endpoint(request: Request):
return {"type": "leaky_bucket"}
# Best of both worlds (default)
@app.get("/api/balanced")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
async def balanced_endpoint(request: Request):
return {"type": "sliding_window_counter"}

Storage Backends
================
FastAPI Traffic needs somewhere to store rate limit state — how many requests each
client has made, when their window resets, and so on. That's what backends are for.
You have three options, each suited to different deployment scenarios.
Choosing a Backend
------------------
Here's the quick guide:
.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Backend
     - Use When
     - Limitations
   * - **Memory**
     - Development, single-process apps
     - Lost on restart, doesn't share across processes
   * - **SQLite**
     - Single-node production
     - Doesn't share across machines
   * - **Redis**
     - Distributed systems, multiple nodes
     - Requires Redis infrastructure
Memory Backend
--------------
The default backend. It stores everything in memory using a dictionary with LRU
eviction and automatic TTL cleanup.
.. code-block:: python
from fastapi_traffic import MemoryBackend, RateLimiter
from fastapi_traffic.core.limiter import set_limiter
# This is what happens by default, but you can configure it:
backend = MemoryBackend(
max_size=10000, # Maximum number of keys to store
cleanup_interval=60, # How often to clean expired entries (seconds)
)
limiter = RateLimiter(backend)
set_limiter(limiter)
**When to use it:**
- Local development
- Single-process applications
- Testing and CI/CD pipelines
- When you don't need persistence
**Limitations:**
- State is lost when the process restarts
- Doesn't work with multiple workers (each worker has its own memory)
- Not suitable for ``gunicorn`` with multiple workers or Kubernetes pods
**Memory management:**
The backend automatically evicts old entries when it hits ``max_size``. It uses
LRU (Least Recently Used) eviction, so inactive clients get cleaned up first.
SQLite Backend
--------------
For single-node production deployments where you need persistence. Rate limits
survive restarts and work across multiple processes on the same machine.
.. code-block:: python
from fastapi_traffic import SQLiteBackend, RateLimiter
from fastapi_traffic.core.limiter import set_limiter
backend = SQLiteBackend(
"rate_limits.db", # Database file path
cleanup_interval=300, # Clean expired entries every 5 minutes
)
limiter = RateLimiter(backend)
set_limiter(limiter)
@app.on_event("startup")
async def startup():
await limiter.initialize()
@app.on_event("shutdown")
async def shutdown():
await limiter.close()
**When to use it:**
- Single-server deployments
- When you need rate limits to survive restarts
- Multiple workers on the same machine (gunicorn, uvicorn with workers)
- When Redis is overkill for your use case
**Performance notes:**
- Uses WAL (Write-Ahead Logging) mode for better concurrent performance
- Connection pooling is handled automatically
- Writes are batched where possible
**File location:**
Put the database file somewhere persistent. For Docker deployments, mount a volume:
.. code-block:: yaml
# docker-compose.yml
services:
api:
volumes:
- ./data:/app/data
environment:
- RATE_LIMIT_DB=/app/data/rate_limits.db
Redis Backend
-------------
The go-to choice for distributed systems. All your application instances share
the same rate limit state.
.. code-block:: python
from fastapi_traffic import RateLimiter
from fastapi_traffic.backends.redis import RedisBackend
from fastapi_traffic.core.limiter import set_limiter
@app.on_event("startup")
async def startup():
backend = await RedisBackend.from_url(
"redis://localhost:6379/0",
key_prefix="myapp:ratelimit", # Optional prefix for all keys
)
limiter = RateLimiter(backend)
set_limiter(limiter)
await limiter.initialize()
@app.on_event("shutdown")
async def shutdown():
await limiter.close()
**When to use it:**
- Multiple application instances (Kubernetes, load-balanced servers)
- When you need rate limits shared across your entire infrastructure
- High-availability requirements
**Connection options:**
.. code-block:: python
# Simple connection
backend = await RedisBackend.from_url("redis://localhost:6379/0")
# With authentication
backend = await RedisBackend.from_url("redis://:password@localhost:6379/0")
# Redis Sentinel for HA
backend = await RedisBackend.from_url(
"redis://sentinel1:26379/0",
sentinel_master="mymaster",
)
# Redis Cluster
backend = await RedisBackend.from_url("redis://node1:6379,node2:6379,node3:6379/0")
**Atomic operations:**
The Redis backend uses Lua scripts to ensure atomic operations. This means rate
limit checks are accurate even under high concurrency — no race conditions.
**Key expiration:**
Keys automatically expire based on the rate limit window. You don't need to worry
about Redis filling up with stale data.
Switching Backends
------------------
You can switch backends without changing your rate limiting code. Just configure
a different backend at startup:
.. code-block:: python
import os
from fastapi_traffic import RateLimiter, MemoryBackend, SQLiteBackend
from fastapi_traffic.core.limiter import set_limiter
async def get_backend():
    """Choose backend based on environment."""
    env = os.getenv("ENVIRONMENT", "development")
    if env == "production":
        redis_url = os.getenv("REDIS_URL")
        if redis_url:
            from fastapi_traffic.backends.redis import RedisBackend
            # from_url is a coroutine and must be awaited
            return await RedisBackend.from_url(redis_url)
        return SQLiteBackend("/app/data/rate_limits.db")
    return MemoryBackend()
@app.on_event("startup")
async def startup():
backend = await get_backend()
limiter = RateLimiter(backend)
set_limiter(limiter)
await limiter.initialize()
Custom Backends
---------------
Need something different? Maybe you want to use PostgreSQL, DynamoDB, or some
other storage system. You can implement your own backend:
.. code-block:: python
from fastapi_traffic.backends.base import Backend
from typing import Any
class MyCustomBackend(Backend):
async def get(self, key: str) -> dict[str, Any] | None:
"""Retrieve state for a key."""
# Your implementation here
pass
async def set(self, key: str, value: dict[str, Any], *, ttl: float) -> None:
"""Store state with TTL."""
pass
async def delete(self, key: str) -> None:
"""Delete a key."""
pass
async def exists(self, key: str) -> bool:
"""Check if key exists."""
pass
async def increment(self, key: str, amount: int = 1) -> int:
"""Atomically increment a counter."""
pass
async def clear(self) -> None:
"""Clear all data."""
pass
async def close(self) -> None:
"""Clean up resources."""
pass
The key methods are ``get``, ``set``, and ``delete``. The state is stored as a
dictionary, and the backend is responsible for serialization.
Backend Comparison
------------------
.. list-table::
   :header-rows: 1

   * - Feature
     - Memory
     - SQLite
     - Redis
   * - Persistence
     - ❌
     - ✅
     - ✅
   * - Multi-process
     - ❌
     - ✅
     - ✅
   * - Multi-node
     - ❌
     - ❌
     - ✅
   * - Setup complexity
     - None
     - Low
     - Medium
   * - Latency
     - ~0.01ms
     - ~0.1ms
     - ~1ms
   * - Dependencies
     - None
     - None
     - redis package
Best Practices
--------------
1. **Start with Memory, upgrade when needed.** Don't over-engineer. Memory is
fine for development and many production scenarios.
2. **Use Redis for distributed systems.** If you have multiple application
instances, Redis is the only option that works correctly.
3. **Handle backend errors gracefully.** Set ``skip_on_error=True`` if you'd
rather allow requests through than fail when the backend is down:
.. code-block:: python
@rate_limit(100, 60, skip_on_error=True)
async def endpoint(request: Request):
return {"status": "ok"}
4. **Monitor your backend.** Keep an eye on memory usage (Memory backend),
disk space (SQLite), or Redis memory and connections.

Configuration
=============
FastAPI Traffic supports loading configuration from environment variables and files.
This makes it easy to manage settings across different environments without changing code.
Configuration Loader
--------------------
The ``ConfigLoader`` class handles loading configuration from various sources:
.. code-block:: python
from fastapi_traffic import ConfigLoader, RateLimitConfig
loader = ConfigLoader()
# Load from environment variables
config = loader.load_rate_limit_config_from_env()
# Load from a JSON file
config = loader.load_rate_limit_config_from_json("config/rate_limits.json")
# Load from a .env file
config = loader.load_rate_limit_config_from_env_file(".env")
Environment Variables
---------------------
Set rate limit configuration using environment variables with the ``FASTAPI_TRAFFIC_``
prefix:
.. code-block:: bash
# Basic settings
export FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
export FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
export FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window_counter
# Optional settings
export FASTAPI_TRAFFIC_RATE_LIMIT_KEY_PREFIX=myapp
export FASTAPI_TRAFFIC_RATE_LIMIT_BURST_SIZE=20
export FASTAPI_TRAFFIC_RATE_LIMIT_INCLUDE_HEADERS=true
export FASTAPI_TRAFFIC_RATE_LIMIT_ERROR_MESSAGE="Too many requests"
export FASTAPI_TRAFFIC_RATE_LIMIT_STATUS_CODE=429
export FASTAPI_TRAFFIC_RATE_LIMIT_SKIP_ON_ERROR=false
export FASTAPI_TRAFFIC_RATE_LIMIT_COST=1
Then load them in your app:
.. code-block:: python
from fastapi_traffic import load_rate_limit_config_from_env, rate_limit
# Load config from environment
config = load_rate_limit_config_from_env()
# Use it with the decorator
@app.get("/api/data")
@rate_limit(config.limit, config.window_size, algorithm=config.algorithm)
async def get_data(request: Request):
return {"data": "here"}
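For intuition, prefix-based loading boils down to a scan like this (a sketch only — the field names and casts are inferred from the variables above, not taken from the library's actual loader):

```python
import os

# assumed casts for the numeric/boolean fields shown above
_TRUTHY = {"true", "1", "yes", "on"}
FIELD_CASTS = {
    "LIMIT": int,
    "WINDOW_SIZE": float,
    "BURST_SIZE": int,
    "STATUS_CODE": int,
    "COST": int,
    "INCLUDE_HEADERS": lambda raw: raw.strip().lower() in _TRUTHY,
    "SKIP_ON_ERROR": lambda raw: raw.strip().lower() in _TRUTHY,
}


def load_env_config(prefix: str = "FASTAPI_TRAFFIC_RATE_LIMIT_") -> dict:
    """Collect FIELD=value pairs from every env var that starts with the prefix."""
    config = {}
    for key, raw in os.environ.items():
        if key.startswith(prefix):
            field = key[len(prefix):]
            config[field.lower()] = FIELD_CASTS.get(field, str)(raw)
    return config
```

Swapping the ``prefix`` argument is all the custom-prefix support described below requires.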
Custom Prefix
-------------
If ``FASTAPI_TRAFFIC_`` conflicts with something else, use a custom prefix:
.. code-block:: python
loader = ConfigLoader(prefix="MYAPP_RATELIMIT")
config = loader.load_rate_limit_config_from_env()
# Now reads from:
# MYAPP_RATELIMIT_RATE_LIMIT_LIMIT=100
# MYAPP_RATELIMIT_RATE_LIMIT_WINDOW_SIZE=60
# etc.
JSON Configuration
------------------
For more complex setups, use a JSON file:
.. code-block:: json
{
"limit": 100,
"window_size": 60,
"algorithm": "token_bucket",
"burst_size": 25,
"key_prefix": "api",
"include_headers": true,
"error_message": "Rate limit exceeded. Please slow down.",
"status_code": 429,
"skip_on_error": false,
"cost": 1
}
Load it:
.. code-block:: python
from fastapi_traffic import ConfigLoader
loader = ConfigLoader()
config = loader.load_rate_limit_config_from_json("config/rate_limits.json")
.env Files
----------
You can also use ``.env`` files, which is handy for local development:
.. code-block:: bash
# .env
FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window
Load it:
.. code-block:: python
loader = ConfigLoader()
config = loader.load_rate_limit_config_from_env_file(".env")
Global Configuration
--------------------
Besides per-endpoint configuration, you can set global defaults:
.. code-block:: bash
# Global settings
export FASTAPI_TRAFFIC_GLOBAL_ENABLED=true
export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_LIMIT=100
export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_WINDOW_SIZE=60
export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_ALGORITHM=sliding_window_counter
export FASTAPI_TRAFFIC_GLOBAL_KEY_PREFIX=fastapi_traffic
export FASTAPI_TRAFFIC_GLOBAL_INCLUDE_HEADERS=true
export FASTAPI_TRAFFIC_GLOBAL_ERROR_MESSAGE="Rate limit exceeded"
export FASTAPI_TRAFFIC_GLOBAL_STATUS_CODE=429
export FASTAPI_TRAFFIC_GLOBAL_SKIP_ON_ERROR=false
export FASTAPI_TRAFFIC_GLOBAL_HEADERS_PREFIX=X-RateLimit
Load global config:
.. code-block:: python
from fastapi_traffic import load_global_config_from_env, RateLimiter
from fastapi_traffic.core.limiter import set_limiter
global_config = load_global_config_from_env()
limiter = RateLimiter(config=global_config)
set_limiter(limiter)
Auto-Detection
--------------
The convenience functions automatically detect file format:
.. code-block:: python
from fastapi_traffic import load_rate_limit_config, load_global_config
# Detects JSON by extension
config = load_rate_limit_config("config/limits.json")
# Detects .env file
config = load_rate_limit_config("config/.env")
# Works for global config too
global_config = load_global_config("config/global.json")
Overriding Values
-----------------
You can override loaded values programmatically:
.. code-block:: python
loader = ConfigLoader()
# Load base config from file
config = loader.load_rate_limit_config_from_json(
"config/base.json",
limit=200, # Override the limit
key_prefix="custom", # Override the prefix
)
This is useful for environment-specific overrides:
.. code-block:: python
import os
base_config = loader.load_rate_limit_config_from_json("config/base.json")
# Apply environment-specific overrides
if os.getenv("ENVIRONMENT") == "production":
config = loader.load_rate_limit_config_from_json(
"config/base.json",
limit=base_config.limit * 2, # Double the limit in production
)
Validation
----------
Configuration is validated when loaded. Invalid values raise ``ConfigurationError``:
.. code-block:: python
from fastapi_traffic import ConfigLoader, ConfigurationError
loader = ConfigLoader()
try:
config = loader.load_rate_limit_config_from_env()
except ConfigurationError as e:
print(f"Invalid configuration: {e}")
# Handle the error appropriately
Common validation errors:
- ``limit`` must be a positive integer
- ``window_size`` must be a positive number
- ``algorithm`` must be one of the valid algorithm names
- ``status_code`` must be a valid HTTP status code
Algorithm Names
---------------
When specifying algorithms in configuration, use these names:
.. list-table::
   :header-rows: 1

   * - Config Value
     - Algorithm
   * - ``token_bucket``
     - Token Bucket
   * - ``sliding_window``
     - Sliding Window
   * - ``fixed_window``
     - Fixed Window
   * - ``leaky_bucket``
     - Leaky Bucket
   * - ``sliding_window_counter``
     - Sliding Window Counter (default)
Boolean Values
--------------
Boolean settings accept various formats:
- **True:** ``true``, ``1``, ``yes``, ``on``
- **False:** ``false``, ``0``, ``no``, ``off``
Case doesn't matter.
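The parsing rule amounts to this (a sketch; the library's validator may report errors differently):

```python
TRUTHY = {"true", "1", "yes", "on"}
FALSY = {"false", "0", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Parse a boolean setting, case-insensitively; reject anything else."""
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"not a boolean: {raw!r}")
```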
Complete Example
----------------
Here's a full example showing configuration loading in a real app:
.. code-block:: python
import os
from fastapi import FastAPI, Request
from fastapi_traffic import (
ConfigLoader,
ConfigurationError,
RateLimiter,
rate_limit,
)
from fastapi_traffic.core.limiter import set_limiter
app = FastAPI()
@app.on_event("startup")
async def startup():
loader = ConfigLoader()
try:
# Try to load from environment first
global_config = loader.load_global_config_from_env()
except ConfigurationError:
# Fall back to defaults
global_config = None
limiter = RateLimiter(config=global_config)
set_limiter(limiter)
await limiter.initialize()
@app.get("/api/data")
@rate_limit(100, 60)
async def get_data(request: Request):
return {"data": "here"}
# Or load endpoint-specific config
loader = ConfigLoader()
try:
api_config = loader.load_rate_limit_config_from_json("config/api_limits.json")
except (FileNotFoundError, ConfigurationError):
api_config = None
if api_config:
@app.get("/api/special")
@rate_limit(
api_config.limit,
api_config.window_size,
algorithm=api_config.algorithm,
)
async def special_endpoint(request: Request):
return {"special": "data"}

Exception Handling
==================
When a client exceeds their rate limit, FastAPI Traffic raises a ``RateLimitExceeded``
exception. This guide covers how to handle it gracefully.
Default Behavior
----------------
By default, when a rate limit is exceeded, the library raises ``RateLimitExceeded``.
FastAPI will convert this to a 500 error unless you handle it.
The exception contains useful information:
.. code-block:: python
from fastapi_traffic import RateLimitExceeded
try:
# Rate limited operation
pass
except RateLimitExceeded as exc:
print(exc.message) # "Rate limit exceeded"
print(exc.retry_after) # Seconds until they can retry (e.g., 45.2)
print(exc.limit_info) # RateLimitInfo object with full details
Custom Exception Handler
------------------------
The most common approach is to register a custom exception handler:
.. code-block:: python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from fastapi_traffic import RateLimitExceeded
app = FastAPI()
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
return JSONResponse(
status_code=429,
content={
"error": "rate_limit_exceeded",
"message": "You're making too many requests. Please slow down.",
"retry_after": exc.retry_after,
},
headers={
"Retry-After": str(int(exc.retry_after or 60)),
},
)
Now clients get a clean JSON response instead of a generic error.
Including Rate Limit Headers
----------------------------
The ``limit_info`` object can generate standard rate limit headers:
.. code-block:: python
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
headers = {}
if exc.limit_info:
headers = exc.limit_info.to_headers()
return JSONResponse(
status_code=429,
content={
"error": "rate_limit_exceeded",
"retry_after": exc.retry_after,
},
headers=headers,
)
This adds headers like:
.. code-block:: text
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709834400
Retry-After: 45
Different Responses for Different Endpoints
-------------------------------------------
You might want different error messages for different parts of your API:
.. code-block:: python
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
path = request.url.path
if path.startswith("/api/v1/"):
# API clients get JSON
return JSONResponse(
status_code=429,
content={"error": "rate_limit_exceeded", "retry_after": exc.retry_after},
)
elif path.startswith("/web/"):
# Web users get a friendly HTML page
return HTMLResponse(
status_code=429,
content="<h1>Slow down!</h1><p>Please wait a moment before trying again.</p>",
)
else:
# Default response
return JSONResponse(
status_code=429,
content={"detail": exc.message},
)
Using the on_blocked Callback
-----------------------------
Instead of (or in addition to) exception handling, you can use the ``on_blocked``
callback to run code when a request is blocked:
.. code-block:: python
import logging
logger = logging.getLogger(__name__)
def log_blocked_request(request: Request, result):
"""Log when a request is rate limited."""
client_ip = request.client.host if request.client else "unknown"
logger.warning(
"Rate limit exceeded for %s on %s %s",
client_ip,
request.method,
request.url.path,
)
@app.get("/api/data")
@rate_limit(100, 60, on_blocked=log_blocked_request)
async def get_data(request: Request):
return {"data": "here"}
The callback receives the request and the rate limit result. It runs before the
exception is raised.
Exempting Certain Requests
--------------------------
Use ``exempt_when`` to skip rate limiting for certain requests:
.. code-block:: python
def is_admin(request: Request) -> bool:
"""Check if request is from an admin."""
user = getattr(request.state, "user", None)
return user is not None and user.is_admin
@app.get("/api/data")
@rate_limit(100, 60, exempt_when=is_admin)
async def get_data(request: Request):
return {"data": "here"}
Admin requests bypass rate limiting entirely.
Graceful Degradation
--------------------
Sometimes you'd rather serve a degraded response than reject the request entirely:
.. code-block:: python
from fastapi_traffic import RateLimiter, RateLimitConfig
from fastapi_traffic.core.limiter import get_limiter
@app.get("/api/search")
async def search(request: Request, q: str):
limiter = get_limiter()
config = RateLimitConfig(limit=100, window_size=60)
result = await limiter.check(request, config)
if not result.allowed:
# Return cached/simplified results instead of blocking
return {
"results": get_cached_results(q),
"note": "Results may be stale. Please try again later.",
"retry_after": result.info.retry_after,
}
# Full search
return {"results": perform_full_search(q)}
Backend Errors
--------------
If the rate limit backend fails (Redis down, SQLite locked, etc.), you have options:
**Option 1: Fail closed (default)**
Requests fail when the backend is unavailable. Safer, but impacts availability.
**Option 2: Fail open**
Allow requests through when the backend fails:
.. code-block:: python
@app.get("/api/data")
@rate_limit(100, 60, skip_on_error=True)
async def get_data(request: Request):
return {"data": "here"}
**Option 3: Handle the error explicitly**
.. code-block:: python
from fastapi_traffic import BackendError
@app.exception_handler(BackendError)
async def backend_error_handler(request: Request, exc: BackendError):
# Log the error
logger.error("Rate limit backend error: %s", exc.original_error)
    # An exception handler must return a response; it cannot let the original
    # request continue. To fail open instead, use skip_on_error=True as above.
    return JSONResponse(
        status_code=503,
        content={"error": "service_unavailable"},
    )
Other Exceptions
----------------
FastAPI Traffic defines a few exception types:
.. code-block:: python
from fastapi_traffic import (
RateLimitExceeded, # Rate limit was exceeded
BackendError, # Storage backend failed
ConfigurationError, # Invalid configuration
)
All inherit from ``FastAPITrafficError``:
.. code-block:: python
from fastapi_traffic.exceptions import FastAPITrafficError
@app.exception_handler(FastAPITrafficError)
async def traffic_error_handler(request: Request, exc: FastAPITrafficError):
"""Catch-all for FastAPI Traffic errors."""
if isinstance(exc, RateLimitExceeded):
return JSONResponse(status_code=429, content={"error": "rate_limited"})
elif isinstance(exc, BackendError):
return JSONResponse(status_code=503, content={"error": "backend_error"})
else:
return JSONResponse(status_code=500, content={"error": "internal_error"})
Helper Function
---------------
FastAPI Traffic provides a helper to create rate limit responses:
.. code-block:: python
from fastapi_traffic.core.decorator import create_rate_limit_response
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
return create_rate_limit_response(exc, include_headers=True)
This creates a standard 429 response with all the appropriate headers.

Key Extractors
==============
A key extractor is a function that identifies who's making a request. By default,
FastAPI Traffic uses the client's IP address, but you can customize this to fit
your authentication model.
How It Works
------------
Every rate limit needs a way to group requests. The key extractor returns a string
that identifies the client:
.. code-block:: python
def my_key_extractor(request: Request) -> str:
return "some-unique-identifier"
All requests that return the same identifier share the same rate limit bucket.
Default Behavior
----------------
The default extractor looks for the client IP in this order:
1. ``X-Forwarded-For`` header (first IP in the list)
2. ``X-Real-IP`` header
3. Direct connection IP (``request.client.host``)
4. Falls back to ``"unknown"``
This handles most reverse proxy setups automatically.
Rate Limiting by API Key
------------------------
For authenticated APIs, you probably want to limit by API key:
.. code-block:: python
from fastapi import Request
from fastapi_traffic import rate_limit
def api_key_extractor(request: Request) -> str:
"""Rate limit by API key."""
api_key = request.headers.get("X-API-Key")
if api_key:
return f"apikey:{api_key}"
# Fall back to IP for unauthenticated requests
return f"ip:{request.client.host}" if request.client else "ip:unknown"
@app.get("/api/data")
@rate_limit(1000, 3600, key_extractor=api_key_extractor)
async def get_data(request: Request):
return {"data": "here"}
Now each API key gets its own rate limit bucket.
Rate Limiting by User
---------------------
If you're using authentication middleware that sets the user:
.. code-block:: python
def user_extractor(request: Request) -> str:
"""Rate limit by authenticated user."""
# Assuming your auth middleware sets request.state.user
user = getattr(request.state, "user", None)
if user:
return f"user:{user.id}"
return f"ip:{request.client.host}" if request.client else "ip:unknown"
@app.get("/api/profile")
@rate_limit(100, 60, key_extractor=user_extractor)
async def get_profile(request: Request):
return {"profile": "data"}
Rate Limiting by Tenant
-----------------------
For multi-tenant applications:
.. code-block:: python
def tenant_extractor(request: Request) -> str:
"""Rate limit by tenant."""
# From subdomain
host = request.headers.get("host", "")
if "." in host:
tenant = host.split(".")[0]
return f"tenant:{tenant}"
# Or from header
tenant = request.headers.get("X-Tenant-ID")
if tenant:
return f"tenant:{tenant}"
return "tenant:default"
Combining Identifiers
---------------------
Sometimes you want to combine multiple factors:
.. code-block:: python
def combined_extractor(request: Request) -> str:
"""Rate limit by user AND endpoint."""
user = getattr(request.state, "user", None)
user_id = user.id if user else "anonymous"
endpoint = request.url.path
return f"{user_id}:{endpoint}"
This gives each user a separate limit for each endpoint.
Tiered Rate Limits
------------------
Different users might have different limits. Handle this with a custom extractor
that includes the tier:
.. code-block:: python
def tiered_extractor(request: Request) -> str:
"""Include tier in the key for different limits."""
user = getattr(request.state, "user", None)
if user:
# Premium users get a different bucket
tier = "premium" if user.is_premium else "free"
return f"{tier}:{user.id}"
return f"anonymous:{request.client.host if request.client else 'unknown'}"
Then apply different limits based on tier:
.. code-block:: python
# You'd typically do this with middleware or dependency injection
# to check the tier and apply the appropriate limit
@app.get("/api/data")
async def get_data(request: Request):
user = getattr(request.state, "user", None)
if user and user.is_premium:
# Premium: 10000 req/hour
limit, window = 10000, 3600
else:
# Free: 100 req/hour
limit, window = 100, 3600
# Apply rate limit manually
limiter = get_limiter()
config = RateLimitConfig(limit=limit, window_size=window)
await limiter.hit(request, config)
return {"data": "here"}
Geographic Rate Limiting
------------------------
Limit by country or region:
.. code-block:: python
def geo_extractor(request: Request) -> str:
"""Rate limit by country."""
# Assuming you have a GeoIP lookup
country = request.headers.get("CF-IPCountry", "XX") # Cloudflare header
ip = request.client.host if request.client else "unknown"
return f"{country}:{ip}"
This lets you apply different limits to different regions if needed.
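One way to act on that key is a small lookup from country code to limits. The codes and numbers below are hypothetical; substitute your own policy:

```python
# Hypothetical per-country limits: (requests, window_seconds).
COUNTRY_LIMITS = {
    "US": (1000, 3600),
    "DE": (1000, 3600),
    "XX": (100, 3600),  # unknown origin gets a conservative limit
}

def limits_for_country(country: str) -> tuple[int, int]:
    """Look up (limit, window) for a country code, defaulting to 'XX'."""
    return COUNTRY_LIMITS.get(country.upper(), COUNTRY_LIMITS["XX"])
```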
Endpoint-Specific Keys
----------------------
Rate limit the same user differently per endpoint:
.. code-block:: python
def endpoint_user_extractor(request: Request) -> str:
"""Separate limits per endpoint per user."""
user = getattr(request.state, "user", None)
user_id = user.id if user else (request.client.host if request.client else "unknown")
method = request.method
path = request.url.path
return f"{user_id}:{method}:{path}"
Best Practices
--------------
1. **Always have a fallback.** If your primary identifier isn't available, fall
back to IP:
.. code-block:: python
def safe_extractor(request: Request) -> str:
api_key = request.headers.get("X-API-Key")
if api_key:
return f"key:{api_key}"
return f"ip:{request.client.host if request.client else 'unknown'}"
2. **Use prefixes.** When mixing identifier types, prefix them to avoid collisions:
.. code-block:: python
# Good - clear what each key represents
return f"user:{user_id}"
return f"ip:{ip_address}"
return f"key:{api_key}"
# Bad - could collide
return user_id
return ip_address
3. **Keep it fast.** The extractor runs on every request. Avoid database lookups
or expensive operations:
.. code-block:: python
# Bad - database lookup on every request
def slow_extractor(request: Request) -> str:
user = db.get_user(request.headers.get("Authorization"))
return f"user:{user.id}"
# Good - use data already in the request
def fast_extractor(request: Request) -> str:
return f"user:{request.state.user.id}"  # set by auth middleware
4. **Be consistent.** The same client should always get the same key. Watch out
for things like:
- IP addresses changing (mobile users)
- Case sensitivity (normalize to lowercase)
- Whitespace (strip it)
.. code-block:: python
def normalized_extractor(request: Request) -> str:
api_key = request.headers.get("X-API-Key", "").strip().lower()
if api_key:
return f"key:{api_key}"
return f"ip:{request.client.host if request.client else 'unknown'}"
Using with Middleware
---------------------
Key extractors work the same way with middleware:
.. code-block:: python
from fastapi_traffic.middleware import RateLimitMiddleware
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
key_extractor=api_key_extractor,
)
Middleware
==========
Sometimes you want rate limiting applied to your entire API, not just individual
endpoints. That's where middleware comes in.
Middleware sits between the client and your application, checking every request
before it reaches your endpoints.
Basic Usage
-----------
Add the middleware to your FastAPI app:
.. code-block:: python
from fastapi import FastAPI
from fastapi_traffic.middleware import RateLimitMiddleware
app = FastAPI()
app.add_middleware(
RateLimitMiddleware,
limit=1000, # 1000 requests
window_size=60, # per minute
)
@app.get("/api/users")
async def get_users():
return {"users": []}
@app.get("/api/posts")
async def get_posts():
return {"posts": []}
Now every endpoint shares the same rate limit pool. A client who makes 500 requests
to ``/api/users`` only has 500 left for ``/api/posts``.
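The shared pool can be pictured as one counter per client that ignores the path entirely. This is a conceptual sketch, not the middleware's actual bookkeeping:

```python
from collections import defaultdict

LIMIT = 1000
_hits: dict[str, int] = defaultdict(int)

def allow(client: str, path: str) -> bool:
    """One counter per client; the path plays no role in the count."""
    _hits[client] += 1
    return _hits[client] <= LIMIT
```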
Exempting Paths
---------------
You probably don't want to rate limit your health checks or documentation:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
exempt_paths={
"/health",
"/ready",
"/docs",
"/redoc",
"/openapi.json",
},
)
These paths bypass rate limiting entirely.
Exempting IPs
-------------
Internal services, monitoring systems, or your own infrastructure might need
unrestricted access:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
exempt_ips={
"127.0.0.1",
"10.0.0.0/8", # Internal network
"192.168.1.100", # Monitoring server
},
)
.. note::
IP exemptions are checked against the client IP extracted by the key extractor.
Make sure your proxy headers are configured correctly if you're behind a load
balancer.
Custom Key Extraction
---------------------
By default, clients are identified by IP address. You can change this:
.. code-block:: python
from starlette.requests import Request
def get_client_id(request: Request) -> str:
"""Identify clients by API key, fall back to IP."""
api_key = request.headers.get("X-API-Key")
if api_key:
return f"api:{api_key}"
return request.client.host if request.client else "unknown"
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
key_extractor=get_client_id,
)
Choosing an Algorithm
---------------------
The middleware supports all five algorithms:
.. code-block:: python
from fastapi_traffic.core.algorithms import Algorithm
# Token bucket for burst-friendly limiting
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
algorithm=Algorithm.TOKEN_BUCKET,
)
# Sliding window for precise limiting
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
algorithm=Algorithm.SLIDING_WINDOW,
)
Using a Custom Backend
----------------------
By default, middleware uses the memory backend. For production, you'll want
something persistent:
.. code-block:: python
from fastapi_traffic import SQLiteBackend
from fastapi_traffic.middleware import RateLimitMiddleware
backend = SQLiteBackend("rate_limits.db")
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
backend=backend,
)
@app.on_event("shutdown")
async def shutdown():
await backend.close()
For Redis:
.. code-block:: python
from fastapi_traffic.backends.redis import RedisBackend
# Create backend at startup
redis_backend = None
@app.on_event("startup")
async def startup():
global redis_backend
redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")
# Note: You'll need to configure middleware after startup
# or use a factory pattern
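One shape that factory pattern can take is a wrapper that builds the backend lazily on first use, so the wrapper can be constructed before the event loop starts. ``LazyBackend`` and its ``backend()`` method are hypothetical names for illustration; the real middleware may not accept such a wrapper directly:

```python
class LazyBackend:
    """Defer async backend construction until first use (sketch)."""

    def __init__(self, factory):
        self._factory = factory  # async callable returning the backend
        self._backend = None

    async def backend(self):
        # Create the backend on first call, then reuse it.
        if self._backend is None:
            self._backend = await self._factory()
        return self._backend
```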
Convenience Middleware Classes
------------------------------
For common use cases, we provide pre-configured middleware:
.. code-block:: python
from fastapi_traffic.middleware import (
SlidingWindowMiddleware,
TokenBucketMiddleware,
)
# Sliding window algorithm
app.add_middleware(
SlidingWindowMiddleware,
limit=1000,
window_size=60,
)
# Token bucket algorithm
app.add_middleware(
TokenBucketMiddleware,
limit=1000,
window_size=60,
)
Combining with Decorator
------------------------
You can use both middleware and decorators. The middleware provides a baseline
limit, and decorators can add stricter limits to specific endpoints:
.. code-block:: python
from fastapi_traffic import rate_limit
from fastapi_traffic.middleware import RateLimitMiddleware
# Global limit: 1000 req/min
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
)
# This endpoint has an additional, stricter limit
@app.post("/api/expensive-operation")
@rate_limit(10, 60) # Only 10 req/min for this endpoint
async def expensive_operation(request: Request):
return {"result": "done"}
# This endpoint uses only the global limit
@app.get("/api/cheap-operation")
async def cheap_operation():
return {"result": "done"}
Both limits are checked. A request must pass both the middleware limit AND the
decorator limit.
Error Responses
---------------
When a client exceeds the rate limit, they get a 429 response:
.. code-block:: json
{
"detail": "Rate limit exceeded. Please try again later.",
"retry_after": 45.2
}
You can customize the message:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
error_message="Whoa there! You're making requests too fast.",
status_code=429,
)
Response Headers
----------------
By default, rate limit headers are included in every response:
.. code-block:: http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1709834400
When rate limited:
.. code-block:: http
Retry-After: 45
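A well-behaved client can honor ``Retry-After`` before retrying. This sketch parses only the numeric (delta-seconds) form; the header may also carry an HTTP date, which is not handled here:

```python
def retry_after_seconds(headers) -> "float | None":
    """Seconds to wait before retrying, or None if absent or unparseable."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return float(value)  # numeric form: seconds to wait
    except ValueError:
        return None  # HTTP-date form not handled in this sketch
```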
Disable headers if you don't want to expose this information:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
include_headers=False,
)
Handling Backend Errors
-----------------------
What happens if your Redis server goes down? By default, the middleware will
raise an exception. You can change this behavior:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
skip_on_error=True, # Allow requests through if backend fails
)
With ``skip_on_error=True``, requests are allowed through when the backend is
unavailable. This is a tradeoff between availability and protection.
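The fail-open versus fail-closed choice reduces to a single decision, sketched conceptually here:

```python
def should_allow(backend_ok: bool, under_limit: bool, skip_on_error: bool) -> bool:
    """Fail open (allow) or fail closed (reject) when the backend is down."""
    if not backend_ok:
        return skip_on_error  # backend unreachable: the flag decides
    return under_limit        # backend healthy: the limit decides
```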
Full Configuration Reference
----------------------------
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000, # Max requests per window
window_size=60.0, # Window size in seconds
algorithm=Algorithm.SLIDING_WINDOW_COUNTER, # Algorithm to use
backend=None, # Storage backend (default: MemoryBackend)
key_prefix="middleware", # Prefix for rate limit keys
include_headers=True, # Add rate limit headers to responses
error_message="Rate limit exceeded. Please try again later.",
status_code=429, # HTTP status when limited
skip_on_error=False, # Allow requests if backend fails
exempt_paths=None, # Set of paths to exempt
exempt_ips=None, # Set of IPs to exempt
key_extractor=default_key_extractor, # Function to identify clients
)