release: bump version to 0.3.0

- Refactor Redis backend connection handling and pool management
- Update algorithm implementations with improved type annotations
- Enhance config loader validation with stricter Pydantic schemas
- Improve decorator and middleware error handling
- Expand example scripts with better docstrings and usage patterns
- Add new 00_basic_usage.py example for quick start
- Reorganize examples directory structure
- Fix type annotation inconsistencies across core modules
- Update dependencies in pyproject.toml
Date: 2026-03-17 20:55:38 +00:00
Parent: 492410614f
Commit: f3453cb0fc
51 changed files with 6507 additions and 166 deletions

Distributed Systems
===================
Running rate limiting across multiple application instances requires careful
consideration. This guide covers the patterns and pitfalls.
The Challenge
-------------
In a distributed system, you might have:
- Multiple application instances behind a load balancer
- Kubernetes pods that scale up and down
- Serverless functions that run independently
Each instance needs to share rate limit state. Otherwise, a client could make
100 requests to instance A and another 100 to instance B, effectively bypassing
a 100 request limit.
Redis: The Standard Solution
----------------------------
Redis is the go-to choice for distributed rate limiting:
.. code-block:: python

   from fastapi import FastAPI
   from fastapi_traffic import RateLimiter
   from fastapi_traffic.backends.redis import RedisBackend
   from fastapi_traffic.core.limiter import get_limiter, set_limiter

   app = FastAPI()

   @app.on_event("startup")
   async def startup():
       backend = await RedisBackend.from_url(
           "redis://redis-server:6379/0",
           key_prefix="myapp:ratelimit",
       )
       limiter = RateLimiter(backend)
       set_limiter(limiter)
       await limiter.initialize()

   @app.on_event("shutdown")
   async def shutdown():
       limiter = get_limiter()
       await limiter.close()
All instances connect to the same Redis server and share state.
High Availability Redis
-----------------------
For production, you'll want Redis with high availability:
**Redis Sentinel:**
.. code-block:: python

   backend = await RedisBackend.from_url(
       "redis://sentinel1:26379,sentinel2:26379,sentinel3:26379/0",
       sentinel_master="mymaster",
   )

**Redis Cluster:**

.. code-block:: python

   backend = await RedisBackend.from_url(
       "redis://node1:6379,node2:6379,node3:6379/0",
   )
Atomic Operations
-----------------
Race conditions are a real concern in distributed systems. Consider this scenario:
1. Instance A reads: 99 requests made
2. Instance B reads: 99 requests made
3. Instance A writes: 100 requests (allows request)
4. Instance B writes: 100 requests (allows request)
Now you've allowed 101 requests when the limit was 100.
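The interleaving above is easy to reproduce. Here is a toy simulation (illustrative only, not library code) where two concurrent "instances" run the read-check-write sequence against shared state, with and without an atomic critical section:

```python
import asyncio

async def run(limit: int, atomic: bool) -> int:
    """Run two 'instances' doing check-then-increment against shared state."""
    state = {"count": limit - 1}  # one request left before the limit
    lock = asyncio.Lock()
    allowed = 0

    async def instance():
        nonlocal allowed
        if atomic:
            async with lock:  # check and update as one critical section
                if state["count"] < limit:
                    state["count"] += 1
                    allowed += 1
        else:
            current = state["count"]   # 1./2. both instances read
            await asyncio.sleep(0)     # yield so the other instance reads too
            if current < limit:
                state["count"] = current + 1  # 3./4. both write and allow
                allowed += 1

    await asyncio.gather(instance(), instance())
    return allowed

# Non-atomic check-then-act admits both requests; atomic admits only one.
print(asyncio.run(run(100, atomic=False)))  # 2
print(asyncio.run(run(100, atomic=True)))   # 1
```

The non-atomic variant over-admits exactly as in the numbered scenario; the lock plays the role that the Lua script plays in Redis.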
FastAPI Traffic's Redis backend uses Lua scripts to make operations atomic:
.. code-block:: lua

   -- Simplified example of atomic check-and-increment
   local limit = tonumber(ARGV[1])
   local current = tonumber(redis.call('GET', KEYS[1]))
   if current and current >= limit then
       return 0  -- Reject
   end
   redis.call('INCR', KEYS[1])
   return 1  -- Allow
The entire check-and-update happens in a single Redis operation.
Network Latency
---------------
Redis adds network latency to every request. Some strategies to minimize impact:
**1. Connection pooling (automatic):**
The Redis backend maintains a connection pool, so you're not creating new
connections for each request.
**2. Local caching:**
For very high-traffic endpoints, consider a two-tier approach:
.. code-block:: python

   from fastapi import Request
   from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig
   from fastapi_traffic.backends.redis import RedisBackend

   # Local memory backend for the fast path
   local_backend = MemoryBackend()
   local_limiter = RateLimiter(local_backend)

   # Redis backend for distributed state (created inside an async startup hook)
   redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")
   distributed_limiter = RateLimiter(redis_backend)

   async def check_rate_limit(request: Request, config: RateLimitConfig):
       # Quick local check (may allow some extra requests)
       local_result = await local_limiter.check(request, config)
       if not local_result.allowed:
           return local_result
       # Authoritative distributed check
       return await distributed_limiter.check(request, config)
**3. Skip on error:**
If Redis latency is causing issues, you might prefer to allow requests through
rather than block:
.. code-block:: python

   @rate_limit(100, 60, skip_on_error=True)
   async def endpoint(request: Request):
       return {"status": "ok"}
Handling Redis Failures
-----------------------
What happens when Redis goes down?
**Fail closed (default):**
Requests fail. This is safer but impacts availability.
**Fail open:**
Allow requests through:
.. code-block:: python

   @rate_limit(100, 60, skip_on_error=True)
**Circuit breaker pattern:**
Implement a circuit breaker to avoid hammering a failing Redis:
.. code-block:: python

   import time

   class CircuitBreaker:
       def __init__(self, failure_threshold=5, reset_timeout=60):
           self.failures = 0
           self.threshold = failure_threshold
           self.reset_timeout = reset_timeout
           self.last_failure = 0
           self.open = False

       def record_failure(self):
           self.failures += 1
           self.last_failure = time.time()
           if self.failures >= self.threshold:
               self.open = True

       def record_success(self):
           self.failures = 0
           self.open = False

       def should_allow(self) -> bool:
           if not self.open:
               return True
           # Check if we should try again
           if time.time() - self.last_failure > self.reset_timeout:
               return True
           return False
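Wiring the breaker into the rate-limit path might look like the sketch below. The names (`check_with_breaker`, `do_check`) and the fail-open policy are illustrative choices, not library API; the breaker class here is a condensed copy of the one above so the snippet is self-contained:

```python
import time

class CircuitBreaker:
    # Condensed version of the breaker shown above
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = 0.0
        self.open = False

    def record_failure(self):
        self.failures += 1
        self.last_failure = time.time()
        if self.failures >= self.threshold:
            self.open = True

    def record_success(self):
        self.failures = 0
        self.open = False

    def should_allow(self) -> bool:
        if not self.open:
            return True
        return time.time() - self.last_failure > self.reset_timeout

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30)

def check_with_breaker(do_check) -> bool:
    """Fail open while the breaker is open; otherwise try the backend."""
    if not breaker.should_allow():
        return True  # Redis is known-bad: skip the check, allow the request
    try:
        allowed = do_check()
        breaker.record_success()
        return allowed
    except ConnectionError:
        breaker.record_failure()
        return True  # fail open on this error too

# Two failures trip the breaker; later calls skip the backend entirely.
def failing_check():
    raise ConnectionError("redis down")

print(check_with_breaker(failing_check))  # True (failure 1)
print(check_with_breaker(failing_check))  # True (failure 2, breaker opens)
print(breaker.open)                       # True
```

Once `reset_timeout` elapses, `should_allow` returns True again and the next request probes the backend.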
Kubernetes Deployment
---------------------
Here's a typical Kubernetes setup:
.. code-block:: yaml

   # redis-deployment.yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: redis
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: redis
     template:
       metadata:
         labels:
           app: redis
       spec:
         containers:
           - name: redis
             image: redis:7-alpine
             ports:
               - containerPort: 6379
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: redis
   spec:
     selector:
       app: redis
     ports:
       - port: 6379

.. code-block:: yaml

   # app-deployment.yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: api
   spec:
     replicas: 3
     selector:
       matchLabels:
         app: api
     template:
       metadata:
         labels:
           app: api
       spec:
         containers:
           - name: api
             image: myapp:latest
             env:
               - name: REDIS_URL
                 value: "redis://redis:6379/0"
Your app connects to Redis via the service name:
.. code-block:: python

   import os

   redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
   backend = await RedisBackend.from_url(redis_url)
Monitoring
----------
Keep an eye on:
1. **Redis latency:** High latency means slow requests
2. **Redis memory:** Rate limit data shouldn't use much, but monitor it
3. **Connection count:** Make sure you're not exhausting connections
4. **Rate limit hits:** Track how often clients are being limited
.. code-block:: python

   import logging

   logger = logging.getLogger(__name__)

   def on_rate_limited(request: Request, result):
       logger.info(
           "Rate limited: client=%s path=%s remaining=%d",
           request.client.host,
           request.url.path,
           result.info.remaining,
       )

   @rate_limit(100, 60, on_blocked=on_rate_limited)
   async def endpoint(request: Request):
       return {"status": "ok"}
Testing Distributed Rate Limits
-------------------------------
Testing distributed behavior is tricky. Here's an approach:
.. code-block:: python

   import asyncio
   import httpx

   async def test_distributed_limit():
       """Simulate requests from multiple 'instances'."""
       async with httpx.AsyncClient() as client:
           # Fire 150 requests concurrently
           tasks = [
               client.get("http://localhost:8000/api/data")
               for _ in range(150)
           ]
           responses = await asyncio.gather(*tasks)

       # Count successes and rate limits
       successes = sum(1 for r in responses if r.status_code == 200)
       limited = sum(1 for r in responses if r.status_code == 429)
       print(f"Successes: {successes}, Rate limited: {limited}")
       # With a limit of 100, expect ~100 successes and ~50 limited

   asyncio.run(test_distributed_limit())

Performance
===========
FastAPI Traffic is designed to be fast. But when you're handling thousands of
requests per second, every microsecond counts. Here's how to squeeze out the
best performance.
Baseline Performance
--------------------
On typical hardware, you can expect:
- **Memory backend:** ~0.01ms per check
- **SQLite backend:** ~0.1ms per check
- **Redis backend:** ~1ms per check (network dependent)
For most applications, this overhead is negligible compared to your actual
business logic.
Choosing the Right Algorithm
----------------------------
Algorithms have different performance characteristics:
.. list-table::
   :header-rows: 1

   * - Algorithm
     - Time Complexity
     - Space Complexity
     - Notes
   * - Token Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Fixed Window
     - O(1)
     - O(1)
     - One int + one float per key
   * - Sliding Window Counter
     - O(1)
     - O(1)
     - Three values per key
   * - Leaky Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Sliding Window
     - O(n)
     - O(n)
     - Stores every timestamp
**Recommendation:** Use Sliding Window Counter (the default) unless you have
specific requirements. It's O(1) and provides good accuracy.
**Avoid Sliding Window for high-volume endpoints.** If you're allowing 10,000
requests per minute, that's 10,000 timestamps to store and filter per key.
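The reason Sliding Window Counter stays O(1) is that it keeps only the previous and current window counts and weights the previous one by how much of it still overlaps the sliding window. A sketch of the estimate (function name and signature are illustrative, not the library's API):

```python
import time

def sliding_window_counter(prev_count, curr_count, window_start, window_size, now=None):
    """Approximate the request count over the last `window_size` seconds
    from just two bucket counters (previous and current window)."""
    now = time.time() if now is None else now
    elapsed = now - window_start
    # Fraction of the previous window still inside the sliding window
    prev_weight = max(0.0, 1.0 - elapsed / window_size)
    return curr_count + prev_count * prev_weight

# 15s into a 60s window: 75% of the previous window still counts.
estimate = sliding_window_counter(
    prev_count=80, curr_count=10, window_start=100.0, window_size=60, now=115.0
)
print(estimate)  # 10 + 80 * 0.75 = 70.0
```

Compare that to the plain Sliding Window algorithm, which must store and filter every individual timestamp to compute the same answer exactly.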
Memory Backend Optimization
---------------------------
The memory backend is already fast, but you can tune it:
.. code-block:: python

   from fastapi_traffic import MemoryBackend

   backend = MemoryBackend(
       max_size=10000,       # Limit memory usage
       cleanup_interval=60,  # Less frequent cleanup = less overhead
   )
**max_size:** Limits the number of keys stored. When exceeded, LRU eviction kicks
in. Set this based on your expected number of unique clients.
**cleanup_interval:** How often to scan for expired entries. Higher values mean
less CPU overhead but more memory usage from expired entries.
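For intuition, LRU eviction at `max_size` behaves like the toy store below (this is an illustration of the concept, not the library's implementation):

```python
from collections import OrderedDict

class LRUStore:
    """Toy max_size-bounded store with least-recently-used eviction."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self.data: OrderedDict[str, int] = OrderedDict()

    def set(self, key: str, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)  # refresh recency
        self.data[key] = value
        if len(self.data) > self.max_size:
            self.data.popitem(last=False)  # evict least recently used

    def get(self, key: str):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

store = LRUStore(max_size=2)
store.set("client-a", 1)
store.set("client-b", 2)
store.get("client-a")     # touch a, so b becomes least recent
store.set("client-c", 3)  # over capacity: evicts client-b
print(list(store.data))   # ['client-a', 'client-c']
```

The practical takeaway: size `max_size` to your expected unique-client count, or recently inactive clients will lose their counters and effectively get a fresh quota.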
SQLite Backend Optimization
---------------------------
SQLite is surprisingly fast for rate limiting:
.. code-block:: python

   from fastapi_traffic import SQLiteBackend

   backend = SQLiteBackend(
       "rate_limits.db",
       cleanup_interval=300,  # Clean every 5 minutes
   )
**Tips:**
1. **Use an SSD.** SQLite performance depends heavily on disk I/O.
2. **Put the database on a local disk.** Network-attached storage adds latency.
3. **WAL mode is enabled by default.** This allows concurrent reads and writes.
4. **Increase cleanup_interval** if you have many keys. Cleanup scans the entire
table.
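WAL mode is the key to tip 3: readers keep reading while a writer appends to the log. The backend enables it for you, but this is the underlying SQLite pragma if you want to verify it on a database file yourself:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "rate_limits.db")
conn = sqlite3.connect(path)

# Switch the journal to write-ahead logging; SQLite reports the active mode.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal

conn.close()
```

The mode is persistent: it is stored in the database file, so it survives reconnects.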
Redis Backend Optimization
--------------------------
Redis is the bottleneck in most distributed setups:
**1. Use connection pooling (automatic):**
The backend maintains a pool of connections. You don't need to do anything.
**2. Use pipelining for batch operations:**
If you're checking multiple rate limits, batch them:
.. code-block:: python

   # Instead of multiple round trips
   result1 = await limiter.check(request, config1)
   result2 = await limiter.check(request, config2)

   # Consider combining into one check with a higher cost
   combined_config = RateLimitConfig(limit=100, window_size=60, cost=2)
   result = await limiter.check(request, combined_config)
**3. Use Redis close to your application:**
Network latency is usually the biggest factor. Run Redis in the same datacenter,
or better yet, the same availability zone.
**4. Consider Redis Cluster for high throughput:**
Distributes load across multiple Redis nodes.
Reducing Overhead
-----------------
**1. Exempt paths that don't need limiting:**
.. code-block:: python

   app.add_middleware(
       RateLimitMiddleware,
       limit=1000,
       window_size=60,
       exempt_paths={"/health", "/metrics", "/ready"},
   )
**2. Use coarse-grained limits when possible:**
Instead of limiting every endpoint separately, use middleware for a global limit:
.. code-block:: python

   # One check per request
   app.add_middleware(RateLimitMiddleware, limit=1000, window_size=60)

   # vs. multiple checks per request
   @rate_limit(100, 60)  # Check 1
   @another_decorator    # Check 2
   async def endpoint():
       pass
**3. Increase window size:**

The number of state updates equals the number of requests, regardless of window
size, so a longer window doesn't reduce write volume. It does mean:

- Fewer unique window boundaries
- Better cache efficiency
- More stable rate limiting

.. code-block:: python

   # 60 requests per 60-second window
   @rate_limit(60, 60)

   # Same average rate, but a new window boundary every second
   @rate_limit(1, 1)
**4. Skip headers when not needed:**
.. code-block:: python

   @rate_limit(100, 60, include_headers=False)

Saves a tiny bit of response processing.
Benchmarking
------------
Here's a simple benchmark script:
.. code-block:: python

   import asyncio
   import time
   from unittest.mock import MagicMock

   from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig

   async def benchmark():
       backend = MemoryBackend()
       limiter = RateLimiter(backend)
       await limiter.initialize()
       config = RateLimitConfig(limit=10000, window_size=60)

       # Mock request
       request = MagicMock()
       request.client.host = "127.0.0.1"
       request.url.path = "/test"
       request.method = "GET"
       request.headers = {}

       # Warm up
       for _ in range(100):
           await limiter.check(request, config)

       # Benchmark
       iterations = 10000
       start = time.perf_counter()
       for _ in range(iterations):
           await limiter.check(request, config)
       elapsed = time.perf_counter() - start

       print(f"Total time: {elapsed:.3f}s")
       print(f"Per check: {elapsed/iterations*1000:.3f}ms")
       print(f"Checks/sec: {iterations/elapsed:.0f}")

       await limiter.close()

   asyncio.run(benchmark())
Typical output:
.. code-block:: text

   Total time: 0.150s
   Per check: 0.015ms
   Checks/sec: 66666
Profiling
---------
If you suspect rate limiting is a bottleneck, profile it:
.. code-block:: python

   import asyncio
   import cProfile
   import pstats

   async def profile_rate_limiting():
       # Your rate limiting code here
       pass

   cProfile.run('asyncio.run(profile_rate_limiting())', 'rate_limit.prof')

   stats = pstats.Stats('rate_limit.prof')
   stats.sort_stats('cumulative')
   stats.print_stats(20)
Look for:
- Time spent in backend operations
- Time spent in algorithm calculations
- Unexpected hotspots
When Performance Really Matters
-------------------------------
If you're handling millions of requests per second and rate limiting overhead
is significant:
1. **Consider sampling:** Only check rate limits for a percentage of requests
and extrapolate.
2. **Use probabilistic data structures:** Bloom filters or Count-Min Sketch can
approximate rate limiting with less overhead.
3. **Push to the edge:** Use CDN-level rate limiting (Cloudflare, AWS WAF) to
handle the bulk of traffic.
4. **Accept some inaccuracy:** Fixed window with ``skip_on_error=True`` is very
fast and "good enough" for many use cases.
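Sampling (point 1 above) can be sketched in a few lines: check only a fraction ``p`` of requests and record each sampled request with cost ``1/p`` so the counter still tracks the true rate. Everything here is illustrative; `sampled_check` and the toy counter are not library API, and the enforcement is approximate by design:

```python
import random

def sampled_check(limiter_check, rate_cost: int, sample_p: float = 0.1) -> bool:
    """Check only a fraction of requests, scaling the recorded cost to compensate.

    `limiter_check(cost)` stands in for the real backend call; with
    sample_p=0.1 only ~10% of requests pay the backend round trip.
    """
    if random.random() >= sample_p:
        return True  # skip the backend entirely for this request
    return limiter_check(int(round(rate_cost / sample_p)))  # count it 1/p times

# Toy counter standing in for the backend: limit 100 per window
state = {"count": 0, "limit": 100}

def fake_check(cost: int) -> bool:
    if state["count"] + cost > state["limit"]:
        return False
    state["count"] += cost
    return True

random.seed(42)
allowed = sum(sampled_check(fake_check, 1) for _ in range(200))
print(f"allowed {allowed} of 200")  # approximate, not exact, enforcement
```

The trade-off is exactly the one named above: far fewer backend round trips in exchange for a limit that is only statistically accurate.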
For most applications, though, the default configuration is plenty fast.

Testing
=======
Testing rate-limited endpoints requires some care. You don't want your tests to
be flaky because of timing issues, and you need to verify that limits actually work.
Basic Testing Setup
-------------------
Use pytest with pytest-asyncio for async tests:
.. code-block:: python

   # conftest.py
   import pytest
   from fastapi.testclient import TestClient

   from fastapi_traffic import MemoryBackend, RateLimiter
   from fastapi_traffic.core.limiter import set_limiter

   @pytest.fixture
   def app():
       """Create a fresh app for each test."""
       from myapp import create_app
       return create_app()

   @pytest.fixture
   def client(app):
       """Test client with a fresh rate limiter."""
       backend = MemoryBackend()
       limiter = RateLimiter(backend)
       set_limiter(limiter)
       with TestClient(app) as client:
           yield client
Testing Rate Limit Enforcement
------------------------------
Verify that the limit is actually enforced:
.. code-block:: python

   def test_rate_limit_enforced(client):
       """Test that requests are blocked after the limit is reached."""
       # Make requests up to the limit
       for i in range(10):
           response = client.get("/api/data")
           assert response.status_code == 200, f"Request {i+1} should succeed"

       # The next request should be rate limited
       response = client.get("/api/data")
       assert response.status_code == 429
       assert "retry_after" in response.json()
Testing Rate Limit Headers
--------------------------
Check that headers are included correctly:
.. code-block:: python

   def test_rate_limit_headers(client):
       """Test that rate limit headers are present."""
       response = client.get("/api/data")
       assert "X-RateLimit-Limit" in response.headers
       assert "X-RateLimit-Remaining" in response.headers
       assert "X-RateLimit-Reset" in response.headers

       # Verify the values make sense
       limit = int(response.headers["X-RateLimit-Limit"])
       remaining = int(response.headers["X-RateLimit-Remaining"])
       assert limit == 100  # Your configured limit
       assert remaining == 99  # One request made
Testing Different Clients
-------------------------
Verify that different clients have separate limits:
.. code-block:: python

   def test_separate_limits_per_client(client):
       """Test that different IPs have separate limits."""
       # Client A makes requests
       for _ in range(10):
           response = client.get(
               "/api/data",
               headers={"X-Forwarded-For": "1.1.1.1"},
           )
           assert response.status_code == 200

       # Client A is now limited
       response = client.get(
           "/api/data",
           headers={"X-Forwarded-For": "1.1.1.1"},
       )
       assert response.status_code == 429

       # Client B should still have its full quota
       response = client.get(
           "/api/data",
           headers={"X-Forwarded-For": "2.2.2.2"},
       )
       assert response.status_code == 200
Testing Window Reset
--------------------
Test that limits reset after the window expires:
.. code-block:: python

   import time
   from unittest.mock import patch

   def test_limit_resets_after_window(client):
       """Test that limits reset after the window expires."""
       # Exhaust the limit
       for _ in range(10):
           client.get("/api/data")

       # Should be limited
       response = client.get("/api/data")
       assert response.status_code == 429

       # Compute the future timestamp *before* patching time.time
       future = time.time() + 61
       with patch('time.time', return_value=future):
           # Should be allowed again
           response = client.get("/api/data")
           assert response.status_code == 200
Testing Exemptions
------------------
Verify that exemptions work:
.. code-block:: python

   def test_exempt_paths(client):
       """Test that exempt paths bypass rate limiting."""
       # Exhaust the limit on a regular endpoint
       for _ in range(100):
           client.get("/api/data")

       # The regular endpoint should be limited
       response = client.get("/api/data")
       assert response.status_code == 429

       # The health check should still work
       response = client.get("/health")
       assert response.status_code == 200

   def test_exempt_ips(client):
       """Test that exempt IPs bypass rate limiting."""
       # Make many requests from an exempt IP
       for _ in range(1000):
           response = client.get(
               "/api/data",
               headers={"X-Forwarded-For": "127.0.0.1"},
           )
           assert response.status_code == 200  # Never limited
Testing with Async Client
-------------------------
For async endpoints, use httpx:
.. code-block:: python

   import asyncio

   import httpx
   import pytest

   from myapp import create_app

   @pytest.mark.asyncio
   async def test_async_rate_limiting():
       """Test rate limiting with an async client."""
       app = create_app()
       transport = httpx.ASGITransport(app=app)
       async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
           # Make concurrent requests
           responses = await asyncio.gather(*[
               client.get("/api/data")
               for _ in range(15)
           ])

       successes = sum(1 for r in responses if r.status_code == 200)
       limited = sum(1 for r in responses if r.status_code == 429)
       assert successes == 10  # Limit
       assert limited == 5  # Over limit
Testing Backend Failures
------------------------
Test behavior when the backend fails:
.. code-block:: python
from unittest.mock import AsyncMock, patch
from fastapi_traffic import BackendError
def test_skip_on_error(client):
"""Test that requests are allowed when backend fails and skip_on_error=True."""
with patch.object(
MemoryBackend, 'get',
side_effect=BackendError("Connection failed")
):
# With skip_on_error=True, should still work
response = client.get("/api/data")
assert response.status_code == 200
def test_fail_on_error(client):
"""Test that requests fail when backend fails and skip_on_error=False."""
with patch.object(
MemoryBackend, 'get',
side_effect=BackendError("Connection failed")
):
# With skip_on_error=False (default), should fail
response = client.get("/api/strict-data")
assert response.status_code == 500
Mocking the Rate Limiter
------------------------
For unit tests, you might want to mock the rate limiter entirely:
.. code-block:: python

   import time
   from unittest.mock import AsyncMock, MagicMock

   from fastapi_traffic.core.limiter import set_limiter
   from fastapi_traffic.core.models import RateLimitInfo, RateLimitResult

   def test_with_mocked_limiter(client):
       """Test endpoint logic without actual rate limiting."""
       mock_limiter = MagicMock()
       mock_limiter.hit = AsyncMock(return_value=RateLimitResult(
           allowed=True,
           info=RateLimitInfo(
               limit=100,
               remaining=99,
               reset_at=time.time() + 60,
               window_size=60,
           ),
           key="test",
       ))
       set_limiter(mock_limiter)

       response = client.get("/api/data")
       assert response.status_code == 200
       mock_limiter.hit.assert_called_once()
Integration Testing with Redis
------------------------------
For integration tests with Redis:
.. code-block:: python

   import pytest

   from fastapi_traffic import RateLimiter, RateLimitConfig
   from fastapi_traffic.backends.redis import RedisBackend

   @pytest.fixture
   async def redis_backend():
       """Create a Redis backend for testing."""
       backend = await RedisBackend.from_url(
           "redis://localhost:6379/15",  # Use a test database
           key_prefix="test:",
       )
       yield backend
       await backend.clear()  # Clean up after the test
       await backend.close()

   @pytest.mark.asyncio
   async def test_redis_rate_limiting(redis_backend):
       """Test rate limiting with real Redis."""
       limiter = RateLimiter(redis_backend)
       await limiter.initialize()
       config = RateLimitConfig(limit=5, window_size=60)
       request = create_mock_request("1.1.1.1")  # helper like the mock_request fixture

       # Make requests up to the limit
       for _ in range(5):
           result = await limiter.check(request, config)
           assert result.allowed

       # The next one should be blocked
       result = await limiter.check(request, config)
       assert not result.allowed

       await limiter.close()
Fixtures for Common Scenarios
-----------------------------
.. code-block:: python

   # conftest.py
   from unittest.mock import MagicMock

   import pytest

   from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig
   from fastapi_traffic.core.limiter import set_limiter

   @pytest.fixture
   def fresh_limiter():
       """Fresh rate limiter for each test."""
       backend = MemoryBackend()
       limiter = RateLimiter(backend)
       set_limiter(limiter)
       return limiter

   @pytest.fixture
   def rate_limit_config():
       """Standard rate limit config for tests."""
       return RateLimitConfig(
           limit=10,
           window_size=60,
       )

   @pytest.fixture
   def mock_request():
       """Create a mock request."""
       def _create(ip="127.0.0.1", path="/test"):
           request = MagicMock()
           request.client.host = ip
           request.url.path = path
           request.method = "GET"
           request.headers = {}
           return request
       return _create
Avoiding Flaky Tests
--------------------
Rate limiting tests can be flaky due to timing. Tips:
1. **Use short windows for tests:**
.. code-block:: python
@rate_limit(10, 1) # 10 per second, not 10 per minute
2. **Mock time instead of sleeping:**
.. code-block:: python
with patch('time.time', return_value=future_time):
# Test window reset
3. **Reset state between tests:**
.. code-block:: python
@pytest.fixture(autouse=True)
async def reset_limiter():
yield
limiter = get_limiter()
await limiter.backend.clear()
4. **Use unique keys per test:**
.. code-block:: python
def test_something(mock_request):
request = mock_request(ip=f"test-{uuid.uuid4()}")