release: bump version to 0.3.0

- Refactor Redis backend connection handling and pool management
- Update algorithm implementations with improved type annotations
- Enhance config loader validation with stricter Pydantic schemas
- Improve decorator and middleware error handling
- Expand example scripts with better docstrings and usage patterns
- Add new 00_basic_usage.py example for quick start
- Reorganize examples directory structure
- Fix type annotation inconsistencies across core modules
- Update dependencies in pyproject.toml

docs/user-guide/algorithms.rst (new file, 290 lines)

Rate Limiting Algorithms
========================

FastAPI Traffic ships with five rate limiting algorithms. Each has its own strengths,
and picking the right one depends on what you're trying to achieve.

This guide will help you understand the tradeoffs and choose wisely.

Overview
--------

Here's the quick comparison:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Algorithm
     - Best For
     - Tradeoffs
   * - **Token Bucket**
     - APIs that need burst handling
     - Allows temporary spikes above the average rate
   * - **Sliding Window**
     - Precise rate limiting
     - Higher memory usage
   * - **Fixed Window**
     - Simple, low-overhead limiting
     - Boundary issues (up to 2x burst at window edges)
   * - **Leaky Bucket**
     - Consistent throughput
     - No burst handling
   * - **Sliding Window Counter**
     - General purpose (default)
     - Good balance of precision and efficiency

Token Bucket
------------

Think of this as a bucket that holds tokens. Each request consumes a token, and
tokens refill at a steady rate. If the bucket is empty, requests are rejected.

.. code-block:: python

   from fastapi_traffic import rate_limit, Algorithm

   @app.get("/api/data")
   @rate_limit(
       100,  # limit: 100 tokens refill per 60-second window
       60,   # window size in seconds
       algorithm=Algorithm.TOKEN_BUCKET,
       burst_size=20,  # bucket can hold up to 20 tokens
   )
   async def get_data(request: Request):
       return {"data": "here"}

**How it works:**

1. The bucket starts full (at ``burst_size`` capacity)
2. Each request removes one token
3. Tokens refill at ``limit / window_size`` per second
4. If no tokens are available, the request is rejected
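
The refill arithmetic in the steps above can be sketched in a few lines. This is an illustration of the algorithm, not the library's implementation, and the class name is hypothetical:

```python
import time

class TokenBucket:
    """Illustrative token bucket: refills at limit/window tokens per second."""

    def __init__(self, limit: int, window: float, burst_size: int):
        self.rate = limit / window      # tokens added per second
        self.capacity = burst_size      # bucket starts full
        self.tokens = float(burst_size)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, capped at the bucket's capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a client can spend its whole ``burst_size`` immediately and then settles into the steady refill rate.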

**When to use it:**

- Your API has legitimate burst traffic (e.g., page loads that trigger multiple requests)
- You want to allow short spikes while maintaining an average rate
- Mobile apps that batch requests when coming online

**Example scenario:** A mobile app that syncs data when it reconnects. You want to
allow it to catch up quickly, but not overwhelm your servers.

Sliding Window
--------------

This algorithm tracks the exact timestamp of every request within the window. It's
the most accurate approach, but uses more memory.

.. code-block:: python

   @app.get("/api/transactions")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def get_transactions(request: Request):
       return {"transactions": []}

**How it works:**

1. Every request timestamp is stored
2. When checking, we count requests in the last ``window_size`` seconds
3. Old timestamps are cleaned up automatically
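
A minimal sketch of this log-based approach (illustrative only, not the library's code) stores timestamps in a deque and prunes the ones that have aged out:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Illustrative sliding window log: one stored timestamp per request."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps: deque = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The memory cost is visible here: the deque holds up to ``limit`` timestamps per client, which is exactly the tradeoff the comparison table calls out.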

**When to use it:**

- You need precise rate limiting (financial APIs, compliance requirements)
- Memory isn't a major concern
- The rate limit is relatively low (not millions of requests)

**Tradeoffs:**

- Memory usage grows with request volume
- Slightly more CPU for timestamp management

Fixed Window
------------

The simplest algorithm. Divide time into fixed windows (e.g., every minute) and
count requests in each window.

.. code-block:: python

   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Time is divided into fixed windows (0:00-1:00, 1:00-2:00, etc.)
2. Each request increments the counter for the current window
3. When the window changes, the counter resets
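
The steps above reduce to a per-window counter keyed by an integer window index (an illustrative sketch, not the library's code):

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Illustrative fixed window: one counter per (key, window index) pair."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counters = defaultdict(int)

    def allow(self, key: str) -> bool:
        # Integer division maps the current time onto a window index;
        # a new index means a fresh counter, i.e. the "reset" in step 3.
        window_index = int(time.time() // self.window)
        self.counters[(key, window_index)] += 1
        return self.counters[(key, window_index)] <= self.limit
```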

**When to use it:**

- You want the simplest, most efficient option
- Slight inaccuracy at window boundaries is acceptable
- High-volume scenarios where memory matters

**The boundary problem:**

A client could make 100 requests at 0:59 and another 100 at 1:01, effectively
getting 200 requests in 2 seconds. If this matters for your use case, use
sliding window counter instead.

Leaky Bucket
------------

Imagine a bucket with a hole in the bottom. Requests fill the bucket, and it
"leaks" at a constant rate. If the bucket overflows, requests are rejected.

.. code-block:: python

   @app.get("/api/steady")
   @rate_limit(
       100,
       60,
       algorithm=Algorithm.LEAKY_BUCKET,
       burst_size=10,  # bucket capacity
   )
   async def steady_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. The bucket has a maximum capacity (``burst_size``)
2. Each request adds "water" to the bucket
3. Water leaks out at ``limit / window_size`` per second
4. If the bucket would overflow, the request is rejected
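
The drain logic above can be sketched like this (an illustration of the algorithm, not the library's implementation):

```python
import time

class LeakyBucket:
    """Illustrative leaky bucket: level drains at limit/window units per second."""

    def __init__(self, limit: int, window: float, burst_size: int):
        self.leak_rate = limit / window  # units leaked per second
        self.capacity = burst_size
        self.level = 0.0
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time elapsed since the last check
        self.level = max(0.0, self.level - (now - self.updated) * self.leak_rate)
        self.updated = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

Note the mirror-image of the token bucket: the bucket starts *empty* and fills with requests, so there is no initial burst allowance beyond ``burst_size`` queued work.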

**When to use it:**

- You need consistent, smooth throughput
- Downstream systems can't handle bursts
- Processing capacity is truly fixed (e.g., hardware limitations)

**Difference from token bucket:**

- Token bucket allows bursts up to the bucket size
- Leaky bucket smooths out traffic to a constant rate

Sliding Window Counter
----------------------

This is the default algorithm, and it's a good choice for most use cases. It
combines the efficiency of fixed windows with better accuracy.

.. code-block:: python

   @app.get("/api/default")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def default_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Maintains counters for the current and previous windows
2. Calculates a weighted average based on how far into the current window we are
3. At 30 seconds into a 60-second window: ``count = prev_count * 0.5 + curr_count``
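
The weighted count from step 3 can be written as a small helper (illustrative, not the library's code): the previous window's weight is the fraction of it that still overlaps the sliding window.

```python
def weighted_count(prev_count: int, curr_count: int,
                   elapsed_in_window: float, window_size: float) -> float:
    """Estimate requests in the last `window_size` seconds from two counters."""
    # The previous window's weight shrinks as the current window progresses
    prev_weight = 1.0 - (elapsed_in_window / window_size)
    return prev_count * prev_weight + curr_count

# 30s into a 60s window, 80 requests in the previous window and 40 so far:
# weighted_count(80, 40, 30, 60) == 80 * 0.5 + 40 == 80.0
```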

**When to use it:**

- General purpose rate limiting
- You want better accuracy than fixed window without the memory cost of sliding window
- Most APIs fall into this category

**Why it's the default:**

It gives you roughly 90% of the accuracy of sliding window with the memory efficiency
of fixed window. Unless you have specific requirements, this is probably what you want.

Choosing the Right Algorithm
----------------------------

Here's a decision tree:

1. **Do you need to allow bursts?**

   - Yes → Token Bucket
   - No, I need smooth traffic → Leaky Bucket

2. **Do you need exact precision?**

   - Yes, compliance/financial → Sliding Window
   - No, good enough is fine → Continue

3. **Is memory a concern?**

   - Yes, high volume → Fixed Window
   - No → Sliding Window Counter (default)

Performance Comparison
----------------------

All algorithms are O(1) for the check operation, but they differ in storage:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Storage per Key
     - Operations
   * - Token Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window
     - N timestamps
     - 1 read, 1 write, cleanup
   * - Fixed Window
     - 1 int, 1 float
     - 1 read, 1 write
   * - Leaky Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window Counter
     - 3 values
     - 1 read, 1 write

For most applications, the performance difference is negligible. Choose based on
behavior, not performance, unless you're handling millions of requests per second.

Code Examples
-------------

Here's a complete example showing all algorithms:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import rate_limit, Algorithm

   app = FastAPI()

   # Burst-friendly endpoint
   @app.get("/api/burst")
   @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=25)
   async def burst_endpoint(request: Request):
       return {"type": "token_bucket"}

   # Precise limiting
   @app.get("/api/precise")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def precise_endpoint(request: Request):
       return {"type": "sliding_window"}

   # Simple and efficient
   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"type": "fixed_window"}

   # Smooth throughput
   @app.get("/api/steady")
   @rate_limit(100, 60, algorithm=Algorithm.LEAKY_BUCKET)
   async def steady_endpoint(request: Request):
       return {"type": "leaky_bucket"}

   # Best of both worlds (default)
   @app.get("/api/balanced")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def balanced_endpoint(request: Request):
       return {"type": "sliding_window_counter"}

docs/user-guide/backends.rst (new file, 312 lines)

Storage Backends
================

FastAPI Traffic needs somewhere to store rate limit state — how many requests each
client has made, when their window resets, and so on. That's what backends are for.

You have three options, each suited to different deployment scenarios.

Choosing a Backend
------------------

Here's the quick guide:

.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Backend
     - Use When
     - Limitations
   * - **Memory**
     - Development, single-process apps
     - Lost on restart, doesn't share across processes
   * - **SQLite**
     - Single-node production
     - Doesn't share across machines
   * - **Redis**
     - Distributed systems, multiple nodes
     - Requires Redis infrastructure
Memory Backend
--------------

The default backend. It stores everything in memory using a dictionary with LRU
eviction and automatic TTL cleanup.

.. code-block:: python

   from fastapi_traffic import MemoryBackend, RateLimiter
   from fastapi_traffic.core.limiter import set_limiter

   # This is what happens by default, but you can configure it:
   backend = MemoryBackend(
       max_size=10000,       # Maximum number of keys to store
       cleanup_interval=60,  # How often to clean expired entries (seconds)
   )
   limiter = RateLimiter(backend)
   set_limiter(limiter)

**When to use it:**

- Local development
- Single-process applications
- Testing and CI/CD pipelines
- When you don't need persistence

**Limitations:**

- State is lost when the process restarts
- Doesn't work with multiple workers (each worker has its own memory)
- Not suitable for ``gunicorn`` with multiple workers or Kubernetes pods

**Memory management:**

The backend automatically evicts old entries when it hits ``max_size``. It uses
LRU (Least Recently Used) eviction, so inactive clients get cleaned up first.
SQLite Backend
--------------

For single-node production deployments where you need persistence. Rate limits
survive restarts and work across multiple processes on the same machine.

.. code-block:: python

   from fastapi_traffic import SQLiteBackend, RateLimiter
   from fastapi_traffic.core.limiter import set_limiter

   backend = SQLiteBackend(
       "rate_limits.db",      # Database file path
       cleanup_interval=300,  # Clean expired entries every 5 minutes
   )
   limiter = RateLimiter(backend)
   set_limiter(limiter)

   @app.on_event("startup")
   async def startup():
       await limiter.initialize()

   @app.on_event("shutdown")
   async def shutdown():
       await limiter.close()

**When to use it:**

- Single-server deployments
- When you need rate limits to survive restarts
- Multiple workers on the same machine (gunicorn, uvicorn with workers)
- When Redis is overkill for your use case

**Performance notes:**

- Uses WAL (Write-Ahead Logging) mode for better concurrent performance
- Connection pooling is handled automatically
- Writes are batched where possible

**File location:**

Put the database file somewhere persistent. For Docker deployments, mount a volume:

.. code-block:: yaml

   # docker-compose.yml
   services:
     api:
       volumes:
         - ./data:/app/data
       environment:
         - RATE_LIMIT_DB=/app/data/rate_limits.db
Redis Backend
-------------

The go-to choice for distributed systems. All your application instances share
the same rate limit state.

.. code-block:: python

   from fastapi_traffic import RateLimiter
   from fastapi_traffic.backends.redis import RedisBackend
   from fastapi_traffic.core.limiter import set_limiter

   limiter = None

   @app.on_event("startup")
   async def startup():
       global limiter
       backend = await RedisBackend.from_url(
           "redis://localhost:6379/0",
           key_prefix="myapp:ratelimit",  # Optional prefix for all keys
       )
       limiter = RateLimiter(backend)
       set_limiter(limiter)
       await limiter.initialize()

   @app.on_event("shutdown")
   async def shutdown():
       if limiter is not None:
           await limiter.close()

**When to use it:**

- Multiple application instances (Kubernetes, load-balanced servers)
- When you need rate limits shared across your entire infrastructure
- High-availability requirements

**Connection options:**

.. code-block:: python

   # Simple connection
   backend = await RedisBackend.from_url("redis://localhost:6379/0")

   # With authentication
   backend = await RedisBackend.from_url("redis://:password@localhost:6379/0")

   # Redis Sentinel for HA
   backend = await RedisBackend.from_url(
       "redis://sentinel1:26379/0",
       sentinel_master="mymaster",
   )

   # Redis Cluster
   backend = await RedisBackend.from_url("redis://node1:6379,node2:6379,node3:6379/0")

**Atomic operations:**

The Redis backend uses Lua scripts to ensure atomic operations. This means rate
limit checks are accurate even under high concurrency — no race conditions.

**Key expiration:**

Keys automatically expire based on the rate limit window. You don't need to worry
about Redis filling up with stale data.
Switching Backends
------------------

You can switch backends without changing your rate limiting code. Just configure
a different backend at startup:

.. code-block:: python

   import os

   from fastapi_traffic import RateLimiter, MemoryBackend, SQLiteBackend
   from fastapi_traffic.core.limiter import set_limiter

   async def get_backend():
       """Choose backend based on environment."""
       env = os.getenv("ENVIRONMENT", "development")

       if env == "production":
           redis_url = os.getenv("REDIS_URL")
           if redis_url:
               from fastapi_traffic.backends.redis import RedisBackend
               return await RedisBackend.from_url(redis_url)
           return SQLiteBackend("/app/data/rate_limits.db")

       return MemoryBackend()

   @app.on_event("startup")
   async def startup():
       backend = await get_backend()
       limiter = RateLimiter(backend)
       set_limiter(limiter)
       await limiter.initialize()
Custom Backends
---------------

Need something different? Maybe you want to use PostgreSQL, DynamoDB, or some
other storage system. You can implement your own backend:

.. code-block:: python

   from typing import Any

   from fastapi_traffic.backends.base import Backend

   class MyCustomBackend(Backend):
       async def get(self, key: str) -> dict[str, Any] | None:
           """Retrieve state for a key."""
           # Your implementation here
           ...

       async def set(self, key: str, value: dict[str, Any], *, ttl: float) -> None:
           """Store state with TTL."""
           ...

       async def delete(self, key: str) -> None:
           """Delete a key."""
           ...

       async def exists(self, key: str) -> bool:
           """Check if key exists."""
           ...

       async def increment(self, key: str, amount: int = 1) -> int:
           """Atomically increment a counter."""
           ...

       async def clear(self) -> None:
           """Clear all data."""
           ...

       async def close(self) -> None:
           """Clean up resources."""
           ...

The key methods are ``get``, ``set``, and ``delete``. The state is stored as a
dictionary, and the backend is responsible for serialization.

Backend Comparison
------------------

.. list-table::
   :header-rows: 1

   * - Feature
     - Memory
     - SQLite
     - Redis
   * - Persistence
     - ❌
     - ✅
     - ✅
   * - Multi-process
     - ❌
     - ✅
     - ✅
   * - Multi-node
     - ❌
     - ❌
     - ✅
   * - Setup complexity
     - None
     - Low
     - Medium
   * - Latency
     - ~0.01ms
     - ~0.1ms
     - ~1ms
   * - Dependencies
     - None
     - None
     - ``redis`` package

Best Practices
--------------

1. **Start with Memory, upgrade when needed.** Don't over-engineer. Memory is
   fine for development and many production scenarios.

2. **Use Redis for distributed systems.** If you have multiple application
   instances, Redis is the only option that works correctly.

3. **Handle backend errors gracefully.** Set ``skip_on_error=True`` if you'd
   rather allow requests through than fail when the backend is down:

   .. code-block:: python

      @rate_limit(100, 60, skip_on_error=True)
      async def endpoint(request: Request):
          return {"status": "ok"}

4. **Monitor your backend.** Keep an eye on memory usage (Memory backend),
   disk space (SQLite), or Redis memory and connections.

docs/user-guide/configuration.rst (new file, 315 lines)

Configuration
=============

FastAPI Traffic supports loading configuration from environment variables and files.
This makes it easy to manage settings across different environments without changing code.

Configuration Loader
--------------------

The ``ConfigLoader`` class handles loading configuration from various sources:

.. code-block:: python

   from fastapi_traffic import ConfigLoader, RateLimitConfig

   loader = ConfigLoader()

   # Load from environment variables
   config = loader.load_rate_limit_config_from_env()

   # Load from a JSON file
   config = loader.load_rate_limit_config_from_json("config/rate_limits.json")

   # Load from a .env file
   config = loader.load_rate_limit_config_from_env_file(".env")

Environment Variables
---------------------

Set rate limit configuration using environment variables with the ``FASTAPI_TRAFFIC_``
prefix:

.. code-block:: bash

   # Basic settings
   export FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
   export FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
   export FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window_counter

   # Optional settings
   export FASTAPI_TRAFFIC_RATE_LIMIT_KEY_PREFIX=myapp
   export FASTAPI_TRAFFIC_RATE_LIMIT_BURST_SIZE=20
   export FASTAPI_TRAFFIC_RATE_LIMIT_INCLUDE_HEADERS=true
   export FASTAPI_TRAFFIC_RATE_LIMIT_ERROR_MESSAGE="Too many requests"
   export FASTAPI_TRAFFIC_RATE_LIMIT_STATUS_CODE=429
   export FASTAPI_TRAFFIC_RATE_LIMIT_SKIP_ON_ERROR=false
   export FASTAPI_TRAFFIC_RATE_LIMIT_COST=1

Then load them in your app:

.. code-block:: python

   from fastapi_traffic import load_rate_limit_config_from_env, rate_limit

   # Load config from environment
   config = load_rate_limit_config_from_env()

   # Use it with the decorator
   @app.get("/api/data")
   @rate_limit(config.limit, config.window_size, algorithm=config.algorithm)
   async def get_data(request: Request):
       return {"data": "here"}

Custom Prefix
-------------

If ``FASTAPI_TRAFFIC_`` conflicts with something else, use a custom prefix:

.. code-block:: python

   loader = ConfigLoader(prefix="MYAPP_RATELIMIT")
   config = loader.load_rate_limit_config_from_env()

   # Now reads from:
   # MYAPP_RATELIMIT_RATE_LIMIT_LIMIT=100
   # MYAPP_RATELIMIT_RATE_LIMIT_WINDOW_SIZE=60
   # etc.
JSON Configuration
------------------

For more complex setups, use a JSON file:

.. code-block:: json

   {
       "limit": 100,
       "window_size": 60,
       "algorithm": "token_bucket",
       "burst_size": 25,
       "key_prefix": "api",
       "include_headers": true,
       "error_message": "Rate limit exceeded. Please slow down.",
       "status_code": 429,
       "skip_on_error": false,
       "cost": 1
   }

Load it:

.. code-block:: python

   from fastapi_traffic import ConfigLoader

   loader = ConfigLoader()
   config = loader.load_rate_limit_config_from_json("config/rate_limits.json")

.env Files
----------

You can also use ``.env`` files, which is handy for local development:

.. code-block:: bash

   # .env
   FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
   FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
   FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window

Load it:

.. code-block:: python

   loader = ConfigLoader()
   config = loader.load_rate_limit_config_from_env_file(".env")

Global Configuration
--------------------

Besides per-endpoint configuration, you can set global defaults:

.. code-block:: bash

   # Global settings
   export FASTAPI_TRAFFIC_GLOBAL_ENABLED=true
   export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_LIMIT=100
   export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_WINDOW_SIZE=60
   export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_ALGORITHM=sliding_window_counter
   export FASTAPI_TRAFFIC_GLOBAL_KEY_PREFIX=fastapi_traffic
   export FASTAPI_TRAFFIC_GLOBAL_INCLUDE_HEADERS=true
   export FASTAPI_TRAFFIC_GLOBAL_ERROR_MESSAGE="Rate limit exceeded"
   export FASTAPI_TRAFFIC_GLOBAL_STATUS_CODE=429
   export FASTAPI_TRAFFIC_GLOBAL_SKIP_ON_ERROR=false
   export FASTAPI_TRAFFIC_GLOBAL_HEADERS_PREFIX=X-RateLimit

Load global config:

.. code-block:: python

   from fastapi_traffic import load_global_config_from_env, RateLimiter
   from fastapi_traffic.core.limiter import set_limiter

   global_config = load_global_config_from_env()
   limiter = RateLimiter(config=global_config)
   set_limiter(limiter)
Auto-Detection
--------------

The convenience functions automatically detect the file format:

.. code-block:: python

   from fastapi_traffic import load_rate_limit_config, load_global_config

   # Detects JSON by extension
   config = load_rate_limit_config("config/limits.json")

   # Detects a .env file
   config = load_rate_limit_config("config/.env")

   # Works for global config too
   global_config = load_global_config("config/global.json")

Overriding Values
-----------------

You can override loaded values programmatically:

.. code-block:: python

   loader = ConfigLoader()

   # Load base config from file
   config = loader.load_rate_limit_config_from_json(
       "config/base.json",
       limit=200,            # Override the limit
       key_prefix="custom",  # Override the prefix
   )

This is useful for environment-specific overrides:

.. code-block:: python

   import os

   base_config = loader.load_rate_limit_config_from_json("config/base.json")

   # Apply environment-specific overrides
   if os.getenv("ENVIRONMENT") == "production":
       config = loader.load_rate_limit_config_from_json(
           "config/base.json",
           limit=base_config.limit * 2,  # Double the limit in production
       )

Validation
----------

Configuration is validated when loaded. Invalid values raise ``ConfigurationError``:

.. code-block:: python

   from fastapi_traffic import ConfigLoader, ConfigurationError

   loader = ConfigLoader()

   try:
       config = loader.load_rate_limit_config_from_env()
   except ConfigurationError as e:
       print(f"Invalid configuration: {e}")
       # Handle the error appropriately

Common validation errors:

- ``limit`` must be a positive integer
- ``window_size`` must be a positive number
- ``algorithm`` must be one of the valid algorithm names
- ``status_code`` must be a valid HTTP status code

Algorithm Names
---------------

When specifying algorithms in configuration, use these names:

.. list-table::
   :header-rows: 1

   * - Config Value
     - Algorithm
   * - ``token_bucket``
     - Token Bucket
   * - ``sliding_window``
     - Sliding Window
   * - ``fixed_window``
     - Fixed Window
   * - ``leaky_bucket``
     - Leaky Bucket
   * - ``sliding_window_counter``
     - Sliding Window Counter (default)

Boolean Values
--------------

Boolean settings accept various formats:

- **True:** ``true``, ``1``, ``yes``, ``on``
- **False:** ``false``, ``0``, ``no``, ``off``

Case doesn't matter.
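
The accepted formats above amount to a small case-insensitive parser, which could be sketched like this (an illustration, not the library's actual parsing code):

```python
def parse_bool(raw: str) -> bool:
    """Parse the truthy/falsy spellings listed above, ignoring case."""
    value = raw.strip().lower()
    if value in {"true", "1", "yes", "on"}:
        return True
    if value in {"false", "0", "no", "off"}:
        return False
    raise ValueError(f"Not a boolean: {raw!r}")
```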

Complete Example
----------------

Here's a full example showing configuration loading in a real app:

.. code-block:: python

   import os

   from fastapi import FastAPI, Request
   from fastapi_traffic import (
       ConfigLoader,
       ConfigurationError,
       RateLimiter,
       rate_limit,
   )
   from fastapi_traffic.core.limiter import set_limiter

   app = FastAPI()

   @app.on_event("startup")
   async def startup():
       loader = ConfigLoader()

       try:
           # Try to load from the environment first
           global_config = loader.load_global_config_from_env()
       except ConfigurationError:
           # Fall back to defaults
           global_config = None

       limiter = RateLimiter(config=global_config)
       set_limiter(limiter)
       await limiter.initialize()

   @app.get("/api/data")
   @rate_limit(100, 60)
   async def get_data(request: Request):
       return {"data": "here"}

   # Or load endpoint-specific config
   loader = ConfigLoader()
   try:
       api_config = loader.load_rate_limit_config_from_json("config/api_limits.json")
   except (FileNotFoundError, ConfigurationError):
       api_config = None

   if api_config:
       @app.get("/api/special")
       @rate_limit(
           api_config.limit,
           api_config.window_size,
           algorithm=api_config.algorithm,
       )
       async def special_endpoint(request: Request):
           return {"special": "data"}

docs/user-guide/exception-handling.rst (new file, 277 lines)

Exception Handling
==================

When a client exceeds their rate limit, FastAPI Traffic raises a ``RateLimitExceeded``
exception. This guide covers how to handle it gracefully.

Default Behavior
----------------

By default, when a rate limit is exceeded, the library raises ``RateLimitExceeded``.
FastAPI will convert this to a 500 error unless you handle it.

The exception contains useful information:

.. code-block:: python

   from fastapi_traffic import RateLimitExceeded

   try:
       # Rate limited operation
       pass
   except RateLimitExceeded as exc:
       print(exc.message)      # "Rate limit exceeded"
       print(exc.retry_after)  # Seconds until they can retry (e.g., 45.2)
       print(exc.limit_info)   # RateLimitInfo object with full details

Custom Exception Handler
------------------------

The most common approach is to register a custom exception handler:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi.responses import JSONResponse
   from fastapi_traffic import RateLimitExceeded

   app = FastAPI()

   @app.exception_handler(RateLimitExceeded)
   async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
       return JSONResponse(
           status_code=429,
           content={
               "error": "rate_limit_exceeded",
               "message": "You're making too many requests. Please slow down.",
               "retry_after": exc.retry_after,
           },
           headers={
               "Retry-After": str(int(exc.retry_after or 60)),
           },
       )

Now clients get a clean JSON response instead of a generic error.
Including Rate Limit Headers
|
||||
----------------------------
|
||||
|
||||
The ``limit_info`` object can generate standard rate limit headers:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
@app.exception_handler(RateLimitExceeded)
|
||||
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
|
||||
headers = {}
|
||||
if exc.limit_info:
|
||||
headers = exc.limit_info.to_headers()
|
||||
|
||||
return JSONResponse(
|
||||
status_code=429,
|
||||
content={
|
||||
"error": "rate_limit_exceeded",
|
||||
"retry_after": exc.retry_after,
|
||||
},
|
||||
headers=headers,
|
||||
)
|
||||
|
||||
This adds headers like:
|
||||
|
||||
.. code-block:: text
|
||||
|
||||
X-RateLimit-Limit: 100
|
||||
X-RateLimit-Remaining: 0
|
||||
X-RateLimit-Reset: 1709834400
|
||||
Retry-After: 45
|
||||
|
||||
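Well-behaved clients should honor ``Retry-After``. As a client-side sketch (the helper name and the 60-second fallback are ours, not part of the library), a caller might compute its backoff like this:

```python
def retry_delay(headers: dict) -> float:
    """Seconds a well-behaved client should wait before retrying.

    Falls back to 60 seconds when the Retry-After header is missing
    or not a plain number (hypothetical default, not from the library).
    """
    raw = headers.get("Retry-After", "60")
    try:
        return max(0.0, float(raw))
    except ValueError:
        return 60.0
```

A client loop would then sleep for ``retry_delay(response.headers)`` seconds before reissuing the request.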
Different Responses for Different Endpoints
-------------------------------------------

You might want different error messages for different parts of your API:

.. code-block:: python

    from fastapi.responses import HTMLResponse, JSONResponse

    @app.exception_handler(RateLimitExceeded)
    async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
        path = request.url.path

        if path.startswith("/api/v1/"):
            # API clients get JSON
            return JSONResponse(
                status_code=429,
                content={"error": "rate_limit_exceeded", "retry_after": exc.retry_after},
            )
        elif path.startswith("/web/"):
            # Web users get a friendly HTML page
            return HTMLResponse(
                status_code=429,
                content="<h1>Slow down!</h1><p>Please wait a moment before trying again.</p>",
            )
        else:
            # Default response
            return JSONResponse(
                status_code=429,
                content={"detail": exc.message},
            )
Using the on_blocked Callback
-----------------------------

Instead of (or in addition to) exception handling, you can use the ``on_blocked``
callback to run code when a request is blocked:

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)

    def log_blocked_request(request: Request, result):
        """Log when a request is rate limited."""
        client_ip = request.client.host if request.client else "unknown"
        logger.warning(
            "Rate limit exceeded for %s on %s %s",
            client_ip,
            request.method,
            request.url.path,
        )

    @app.get("/api/data")
    @rate_limit(100, 60, on_blocked=log_blocked_request)
    async def get_data(request: Request):
        return {"data": "here"}

The callback receives the request and the rate limit result. It runs before the
exception is raised.
Exempting Certain Requests
--------------------------

Use ``exempt_when`` to skip rate limiting for certain requests:

.. code-block:: python

    def is_admin(request: Request) -> bool:
        """Check if request is from an admin."""
        user = getattr(request.state, "user", None)
        return user is not None and user.is_admin

    @app.get("/api/data")
    @rate_limit(100, 60, exempt_when=is_admin)
    async def get_data(request: Request):
        return {"data": "here"}

Admin requests bypass rate limiting entirely.
Graceful Degradation
--------------------

Sometimes you'd rather serve a degraded response than reject the request entirely:

.. code-block:: python

    from fastapi_traffic import RateLimitConfig
    from fastapi_traffic.core.limiter import get_limiter

    @app.get("/api/search")
    async def search(request: Request, q: str):
        limiter = get_limiter()
        config = RateLimitConfig(limit=100, window_size=60)

        result = await limiter.check(request, config)

        if not result.allowed:
            # Return cached/simplified results instead of blocking
            return {
                "results": get_cached_results(q),
                "note": "Results may be stale. Please try again later.",
                "retry_after": result.info.retry_after,
            }

        # Full search
        return {"results": perform_full_search(q)}
Backend Errors
--------------

If the rate limit backend fails (Redis down, SQLite locked, etc.), you have options:

**Option 1: Fail closed (default)**

Requests fail when the backend is unavailable. Safer, but impacts availability.

**Option 2: Fail open**

Allow requests through when the backend fails:

.. code-block:: python

    @app.get("/api/data")
    @rate_limit(100, 60, skip_on_error=True)
    async def get_data(request: Request):
        return {"data": "here"}

**Option 3: Handle the error explicitly**

.. code-block:: python

    from fastapi_traffic import BackendError

    @app.exception_handler(BackendError)
    async def backend_error_handler(request: Request, exc: BackendError):
        # Log the error
        logger.error("Rate limit backend error: %s", exc.original_error)

        # An exception handler must return a response. To fail open
        # instead, use skip_on_error=True as shown in Option 2.
        return JSONResponse(
            status_code=503,
            content={"error": "service_unavailable"},
        )
Other Exceptions
----------------

FastAPI Traffic defines a few exception types:

.. code-block:: python

    from fastapi_traffic import (
        RateLimitExceeded,   # Rate limit was exceeded
        BackendError,        # Storage backend failed
        ConfigurationError,  # Invalid configuration
    )

All inherit from ``FastAPITrafficError``:

.. code-block:: python

    from fastapi_traffic.exceptions import FastAPITrafficError

    @app.exception_handler(FastAPITrafficError)
    async def traffic_error_handler(request: Request, exc: FastAPITrafficError):
        """Catch-all for FastAPI Traffic errors."""
        if isinstance(exc, RateLimitExceeded):
            return JSONResponse(status_code=429, content={"error": "rate_limited"})
        elif isinstance(exc, BackendError):
            return JSONResponse(status_code=503, content={"error": "backend_error"})
        else:
            return JSONResponse(status_code=500, content={"error": "internal_error"})

Helper Function
---------------

FastAPI Traffic provides a helper to create rate limit responses:

.. code-block:: python

    from fastapi_traffic.core.decorator import create_rate_limit_response

    @app.exception_handler(RateLimitExceeded)
    async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
        return create_rate_limit_response(exc, include_headers=True)

This creates a standard 429 response with all the appropriate headers.
258
docs/user-guide/key-extractors.rst
Normal file
@@ -0,0 +1,258 @@
Key Extractors
==============

A key extractor is a function that identifies who's making a request. By default,
FastAPI Traffic uses the client's IP address, but you can customize this to fit
your authentication model.

How It Works
------------

Every rate limit needs a way to group requests. The key extractor returns a string
that identifies the client:

.. code-block:: python

    def my_key_extractor(request: Request) -> str:
        return "some-unique-identifier"

All requests that return the same identifier share the same rate limit bucket.
Default Behavior
----------------

The default extractor looks for the client IP in this order:

1. ``X-Forwarded-For`` header (first IP in the list)
2. ``X-Real-IP`` header
3. Direct connection IP (``request.client.host``)
4. Falls back to ``"unknown"``

This handles most reverse proxy setups automatically.
Rate Limiting by API Key
------------------------

For authenticated APIs, you probably want to limit by API key:

.. code-block:: python

    from fastapi import Request

    from fastapi_traffic import rate_limit

    def api_key_extractor(request: Request) -> str:
        """Rate limit by API key."""
        api_key = request.headers.get("X-API-Key")
        if api_key:
            return f"apikey:{api_key}"
        # Fall back to IP for unauthenticated requests
        return f"ip:{request.client.host}" if request.client else "ip:unknown"

    @app.get("/api/data")
    @rate_limit(1000, 3600, key_extractor=api_key_extractor)
    async def get_data(request: Request):
        return {"data": "here"}

Now each API key gets its own rate limit bucket.
Rate Limiting by User
---------------------

If you're using authentication middleware that sets the user:

.. code-block:: python

    def user_extractor(request: Request) -> str:
        """Rate limit by authenticated user."""
        # Assuming your auth middleware sets request.state.user
        user = getattr(request.state, "user", None)
        if user:
            return f"user:{user.id}"
        return f"ip:{request.client.host}" if request.client else "ip:unknown"

    @app.get("/api/profile")
    @rate_limit(100, 60, key_extractor=user_extractor)
    async def get_profile(request: Request):
        return {"profile": "data"}
Rate Limiting by Tenant
-----------------------

For multi-tenant applications:

.. code-block:: python

    def tenant_extractor(request: Request) -> str:
        """Rate limit by tenant."""
        # From subdomain
        host = request.headers.get("host", "")
        if "." in host:
            tenant = host.split(".")[0]
            return f"tenant:{tenant}"

        # Or from header
        tenant = request.headers.get("X-Tenant-ID")
        if tenant:
            return f"tenant:{tenant}"

        return "tenant:default"
Combining Identifiers
---------------------

Sometimes you want to combine multiple factors:

.. code-block:: python

    def combined_extractor(request: Request) -> str:
        """Rate limit by user AND endpoint."""
        user = getattr(request.state, "user", None)
        user_id = user.id if user else "anonymous"
        endpoint = request.url.path
        return f"{user_id}:{endpoint}"

This gives each user a separate limit for each endpoint.
Tiered Rate Limits
------------------

Different users might have different limits. Handle this with a custom extractor
that includes the tier:

.. code-block:: python

    def tiered_extractor(request: Request) -> str:
        """Include the tier in the key for different limits."""
        user = getattr(request.state, "user", None)
        if user:
            # Premium users get a different bucket
            tier = "premium" if user.is_premium else "free"
            return f"{tier}:{user.id}"
        return f"anonymous:{request.client.host if request.client else 'unknown'}"

Then apply different limits based on tier:

.. code-block:: python

    from fastapi_traffic import RateLimitConfig
    from fastapi_traffic.core.limiter import get_limiter

    # You'd typically do this with middleware or dependency injection
    # to check the tier and apply the appropriate limit

    @app.get("/api/data")
    async def get_data(request: Request):
        user = getattr(request.state, "user", None)
        if user and user.is_premium:
            # Premium: 10000 req/hour
            limit, window = 10000, 3600
        else:
            # Free: 100 req/hour
            limit, window = 100, 3600

        # Apply the rate limit manually
        limiter = get_limiter()
        config = RateLimitConfig(limit=limit, window_size=window)
        await limiter.hit(request, config)

        return {"data": "here"}
Geographic Rate Limiting
------------------------

Limit by country or region:

.. code-block:: python

    def geo_extractor(request: Request) -> str:
        """Rate limit by country."""
        # Assuming you have a GeoIP lookup
        country = request.headers.get("CF-IPCountry", "XX")  # Cloudflare header
        ip = request.client.host if request.client else "unknown"
        return f"{country}:{ip}"

This lets you apply different limits to different regions if needed.
Endpoint-Specific Keys
----------------------

Rate limit the same user differently per endpoint:

.. code-block:: python

    def endpoint_user_extractor(request: Request) -> str:
        """Separate limits per endpoint per user."""
        user = getattr(request.state, "user", None)
        user_id = user.id if user else request.client.host
        method = request.method
        path = request.url.path
        return f"{user_id}:{method}:{path}"
Best Practices
--------------

1. **Always have a fallback.** If your primary identifier isn't available, fall
   back to IP:

   .. code-block:: python

       def safe_extractor(request: Request) -> str:
           api_key = request.headers.get("X-API-Key")
           if api_key:
               return f"key:{api_key}"
           return f"ip:{request.client.host if request.client else 'unknown'}"

2. **Use prefixes.** When mixing identifier types, prefix them to avoid collisions:

   .. code-block:: python

       # Good - clear what each key represents
       return f"user:{user_id}"
       return f"ip:{ip_address}"
       return f"key:{api_key}"

       # Bad - could collide
       return user_id
       return ip_address

3. **Keep it fast.** The extractor runs on every request. Avoid database lookups
   or expensive operations:

   .. code-block:: python

       # Bad - database lookup on every request
       def slow_extractor(request: Request) -> str:
           user = db.get_user(request.headers.get("Authorization"))
           return user.id

       # Good - use data already in the request
       def fast_extractor(request: Request) -> str:
           return request.state.user.id  # Set by auth middleware

4. **Be consistent.** The same client should always get the same key. Watch out
   for things like:

   - IP addresses changing (mobile users)
   - Case sensitivity (normalize to lowercase)
   - Whitespace (strip it)

   .. code-block:: python

       def normalized_extractor(request: Request) -> str:
           api_key = request.headers.get("X-API-Key", "").strip().lower()
           if api_key:
               return f"key:{api_key}"
           return f"ip:{request.client.host}"
Using with Middleware
---------------------

Key extractors work the same way with middleware:

.. code-block:: python

    from fastapi_traffic.middleware import RateLimitMiddleware

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        key_extractor=api_key_extractor,
    )
322
docs/user-guide/middleware.rst
Normal file
@@ -0,0 +1,322 @@
Middleware
==========

Sometimes you want rate limiting applied to your entire API, not just individual
endpoints. That's where middleware comes in.

Middleware sits between the client and your application, checking every request
before it reaches your endpoints.

Basic Usage
-----------

Add the middleware to your FastAPI app:

.. code-block:: python

    from fastapi import FastAPI

    from fastapi_traffic.middleware import RateLimitMiddleware

    app = FastAPI()

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,      # 1000 requests
        window_size=60,  # per minute
    )

    @app.get("/api/users")
    async def get_users():
        return {"users": []}

    @app.get("/api/posts")
    async def get_posts():
        return {"posts": []}

Now every endpoint shares the same rate limit pool. A client who makes 500 requests
to ``/api/users`` only has 500 left for ``/api/posts``.
Exempting Paths
---------------

You probably don't want to rate limit your health checks or documentation:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        exempt_paths={
            "/health",
            "/ready",
            "/docs",
            "/redoc",
            "/openapi.json",
        },
    )

These paths bypass rate limiting entirely.
Exempting IPs
-------------

Internal services, monitoring systems, or your own infrastructure might need
unrestricted access:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        exempt_ips={
            "127.0.0.1",
            "10.0.0.0/8",     # Internal network
            "192.168.1.100",  # Monitoring server
        },
    )

.. note::

   IP exemptions are checked against the client IP extracted by the key extractor.
   Make sure your proxy headers are configured correctly if you're behind a load
   balancer.
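The exemption check boils down to address-versus-network matching, which Python's standard ``ipaddress`` module handles. As a sketch (not the middleware's internal code):

```python
import ipaddress

def is_exempt(client_ip: str, exempt: set[str]) -> bool:
    """Return True if client_ip matches an exact entry or a CIDR range."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable IPs are never exempt
    for entry in exempt:
        if "/" in entry:
            # CIDR entry such as "10.0.0.0/8"
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        elif client_ip == entry:
            return True
    return False
```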
Custom Key Extraction
---------------------

By default, clients are identified by IP address. You can change this:

.. code-block:: python

    from starlette.requests import Request

    def get_client_id(request: Request) -> str:
        """Identify clients by API key, fall back to IP."""
        api_key = request.headers.get("X-API-Key")
        if api_key:
            return f"api:{api_key}"
        return request.client.host if request.client else "unknown"

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        key_extractor=get_client_id,
    )
Choosing an Algorithm
---------------------

The middleware supports all five algorithms:

.. code-block:: python

    from fastapi_traffic.core.algorithms import Algorithm

    # Token bucket for burst-friendly limiting
    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        algorithm=Algorithm.TOKEN_BUCKET,
    )

    # Sliding window for precise limiting
    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        algorithm=Algorithm.SLIDING_WINDOW,
    )
Using a Custom Backend
----------------------

By default, the middleware uses the memory backend. For production, you'll want
something persistent:

.. code-block:: python

    from fastapi_traffic import SQLiteBackend
    from fastapi_traffic.middleware import RateLimitMiddleware

    backend = SQLiteBackend("rate_limits.db")

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        backend=backend,
    )

    @app.on_event("shutdown")
    async def shutdown():
        await backend.close()

For Redis:

.. code-block:: python

    from fastapi_traffic.backends.redis import RedisBackend

    # Create the backend at startup
    redis_backend = None

    @app.on_event("startup")
    async def startup():
        global redis_backend
        redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")

    # Note: You'll need to configure the middleware after startup
    # or use a factory pattern
Convenience Middleware Classes
------------------------------

For common use cases, we provide pre-configured middleware:

.. code-block:: python

    from fastapi_traffic.middleware import (
        SlidingWindowMiddleware,
        TokenBucketMiddleware,
    )

    # Sliding window algorithm
    app.add_middleware(
        SlidingWindowMiddleware,
        limit=1000,
        window_size=60,
    )

    # Token bucket algorithm
    app.add_middleware(
        TokenBucketMiddleware,
        limit=1000,
        window_size=60,
    )
Combining with Decorator
------------------------

You can use both middleware and decorators. The middleware provides a baseline
limit, and decorators can add stricter limits to specific endpoints:

.. code-block:: python

    from fastapi_traffic import rate_limit
    from fastapi_traffic.middleware import RateLimitMiddleware

    # Global limit: 1000 req/min
    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
    )

    # This endpoint has an additional, stricter limit
    @app.post("/api/expensive-operation")
    @rate_limit(10, 60)  # Only 10 req/min for this endpoint
    async def expensive_operation(request: Request):
        return {"result": "done"}

    # This endpoint uses only the global limit
    @app.get("/api/cheap-operation")
    async def cheap_operation():
        return {"result": "done"}

Both limits are checked. A request must pass both the middleware limit AND the
decorator limit.
Error Responses
---------------

When a client exceeds the rate limit, they get a 429 response:

.. code-block:: json

    {
        "detail": "Rate limit exceeded. Please try again later.",
        "retry_after": 45.2
    }

You can customize the message:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        error_message="Whoa there! You're making requests too fast.",
        status_code=429,
    )
Response Headers
----------------

By default, rate limit headers are included in every response:

.. code-block:: http

    X-RateLimit-Limit: 1000
    X-RateLimit-Remaining: 847
    X-RateLimit-Reset: 1709834400

When rate limited:

.. code-block:: http

    Retry-After: 45

Disable headers if you don't want to expose this information:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        include_headers=False,
    )
Handling Backend Errors
-----------------------

What happens if your Redis server goes down? By default, the middleware will
raise an exception. You can change this behavior:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        skip_on_error=True,  # Allow requests through if the backend fails
    )

With ``skip_on_error=True``, requests are allowed through when the backend is
unavailable. This is a tradeoff between availability and protection.
Full Configuration Reference
----------------------------

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,                    # Max requests per window
        window_size=60.0,              # Window size in seconds
        algorithm=Algorithm.SLIDING_WINDOW_COUNTER,  # Algorithm to use
        backend=None,                  # Storage backend (default: MemoryBackend)
        key_prefix="middleware",       # Prefix for rate limit keys
        include_headers=True,          # Add rate limit headers to responses
        error_message="Rate limit exceeded. Please try again later.",
        status_code=429,               # HTTP status when limited
        skip_on_error=False,           # Allow requests if the backend fails
        exempt_paths=None,             # Set of paths to exempt
        exempt_ips=None,               # Set of IPs to exempt
        key_extractor=default_key_extractor,  # Function to identify clients
    )