Distributed Systems
===================

Running rate limiting across multiple application instances requires careful
consideration. This guide covers the patterns and pitfalls.

The Challenge
-------------

In a distributed system, you might have:

- Multiple application instances behind a load balancer
- Kubernetes pods that scale up and down
- Serverless functions that run independently

Each instance needs to share rate limit state. Otherwise, a client could make
100 requests to instance A and another 100 to instance B, effectively bypassing
a 100-request limit.

Redis: The Standard Solution
----------------------------

Redis is the go-to choice for distributed rate limiting:

.. code-block:: python

    from fastapi import FastAPI

    from fastapi_traffic import RateLimiter
    from fastapi_traffic.backends.redis import RedisBackend
    from fastapi_traffic.core.limiter import get_limiter, set_limiter

    app = FastAPI()

    @app.on_event("startup")
    async def startup():
        backend = await RedisBackend.from_url(
            "redis://redis-server:6379/0",
            key_prefix="myapp:ratelimit",
        )
        limiter = RateLimiter(backend)
        set_limiter(limiter)
        await limiter.initialize()

    @app.on_event("shutdown")
    async def shutdown():
        limiter = get_limiter()
        await limiter.close()

All instances connect to the same Redis server and share state.

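If you want to confirm that instances really are sharing state, you can peek
at the keys under your prefix with redis-py (a quick diagnostic sketch; the
exact key names are an internal detail of the backend, so treat them as
opaque):

.. code-block:: python

    import asyncio

    import redis.asyncio as redis

    async def show_rate_limit_keys():
        # Connect to the same Redis instance the backends use
        r = redis.from_url("redis://redis-server:6379/0")
        # List everything under the shared prefix from the example above
        async for key in r.scan_iter("myapp:ratelimit*"):
            print(key, await r.get(key))
        await r.aclose()  # redis-py >= 5; use close() on older versions

    asyncio.run(show_rate_limit_keys())
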
High Availability Redis
-----------------------

For production, you'll want Redis with high availability:

**Redis Sentinel:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://sentinel1:26379,sentinel2:26379,sentinel3:26379/0",
        sentinel_master="mymaster",
    )

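For reference, this is roughly what Sentinel discovery looks like with
redis-py directly, independent of this library; since the backend accepts the
sentinel URL form above, you shouldn't need to do this yourself:

.. code-block:: python

    from redis.asyncio.sentinel import Sentinel

    sentinel = Sentinel(
        [("sentinel1", 26379), ("sentinel2", 26379), ("sentinel3", 26379)]
    )
    # Resolves (and re-resolves after a failover) the current master
    # for the "mymaster" service
    master = sentinel.master_for("mymaster", db=0)
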
**Redis Cluster:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://node1:6379,node2:6379,node3:6379/0",
    )

Atomic Operations
-----------------

Race conditions are a real concern in distributed systems. Consider this
scenario with a limit of 100:

1. Instance A reads: 99 requests made
2. Instance B reads: 99 requests made
3. Instance A writes: 100 requests (allows request)
4. Instance B writes: 100 requests (allows request)

Now you've allowed 101 requests when the limit was 100.

FastAPI Traffic's Redis backend uses Lua scripts to make operations atomic:

.. code-block:: lua

    -- Simplified example of atomic check-and-increment
    local limit = tonumber(ARGV[1])
    local current = redis.call('GET', KEYS[1])
    if current and tonumber(current) >= limit then
        return 0 -- Reject
    end
    redis.call('INCR', KEYS[1])
    return 1 -- Allow

The entire check-and-update happens in a single Redis operation.

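If you're curious how such a script is wired up, here's a minimal sketch using
redis-py directly; the key name, limit, and window are illustrative, and the
library's actual scripts are more involved (this one also sets a TTL so the
counter resets):

.. code-block:: python

    import redis.asyncio as redis

    CHECK_AND_INCR = """
    local limit = tonumber(ARGV[1])
    local current = redis.call('GET', KEYS[1])
    if current and tonumber(current) >= limit then
        return 0
    end
    local count = redis.call('INCR', KEYS[1])
    if count == 1 then
        redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
    end
    return 1
    """

    async def allow(r: redis.Redis, key: str, limit: int, window: int) -> bool:
        # register_script caches the script by SHA and runs it via EVALSHA
        script = r.register_script(CHECK_AND_INCR)
        return bool(await script(keys=[key], args=[limit, window]))
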
Network Latency
---------------

Redis adds network latency to every request. Some strategies to minimize impact:

**1. Connection pooling (automatic):**

The Redis backend maintains a connection pool, so you're not creating new
connections for each request.

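For illustration, this is roughly what that amounts to in redis-py terms; the
``max_connections`` value is an example, not the backend's actual default:

.. code-block:: python

    import redis.asyncio as redis

    # One pool per process, created at startup and reused for every request
    pool = redis.ConnectionPool.from_url(
        "redis://redis-server:6379/0",
        max_connections=50,  # example value; tune for your workload
    )
    r = redis.Redis(connection_pool=pool)
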
**2. Local caching:**

For very high-traffic endpoints, consider a two-tier approach:

.. code-block:: python

    from fastapi_traffic import MemoryBackend, RateLimiter

    # Local memory backend for fast path
    local_backend = MemoryBackend()
    local_limiter = RateLimiter(local_backend)

    # Redis backend for distributed state
    redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")
    distributed_limiter = RateLimiter(redis_backend)

    async def check_rate_limit(request: Request, config: RateLimitConfig):
        # Quick local check (may allow some extra requests)
        local_result = await local_limiter.check(request, config)
        if not local_result.allowed:
            return local_result

        # Authoritative distributed check
        return await distributed_limiter.check(request, config)

**3. Skip on error:**

If Redis latency is causing issues, you might prefer to allow requests through
rather than block:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)
    async def endpoint(request: Request):
        return {"status": "ok"}

Handling Redis Failures
-----------------------

What happens when Redis goes down?

**Fail closed (default):**

Requests fail. This is safer but impacts availability.

**Fail open:**

Allow requests through:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)
    async def endpoint(request: Request):
        return {"status": "ok"}

**Circuit breaker pattern:**

Implement a circuit breaker to avoid hammering a failing Redis:

.. code-block:: python

    import time

    class CircuitBreaker:
        def __init__(self, failure_threshold=5, reset_timeout=60):
            self.failures = 0
            self.threshold = failure_threshold
            self.reset_timeout = reset_timeout
            self.last_failure = 0
            self.open = False

        def record_failure(self):
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.open = True

        def record_success(self):
            self.failures = 0
            self.open = False

        def should_allow(self) -> bool:
            if not self.open:
                return True
            # After the reset timeout, let one request through to probe Redis
            if time.time() - self.last_failure > self.reset_timeout:
                return True
            return False

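Wiring the breaker into the check path might look like this (a sketch; the
exception raised on Redis failure depends on the backend, so
``ConnectionError`` is an assumption):

.. code-block:: python

    breaker = CircuitBreaker()

    async def checked(request: Request, config: RateLimitConfig):
        if not breaker.should_allow():
            return None  # breaker is open: fail open, skip the check
        try:
            result = await distributed_limiter.check(request, config)
            breaker.record_success()
            return result
        except ConnectionError:  # assumption: match your backend's error type
            breaker.record_failure()
            return None  # fail open on error
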
Kubernetes Deployment
---------------------

Here's a typical Kubernetes setup:

.. code-block:: yaml

    # redis-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
            - name: redis
              image: redis:7-alpine
              ports:
                - containerPort: 6379
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: redis
    spec:
      selector:
        app: redis
      ports:
        - port: 6379

.. code-block:: yaml

    # app-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
            - name: api
              image: myapp:latest
              env:
                - name: REDIS_URL
                  value: "redis://redis:6379/0"

Your app connects to Redis via the service name:

.. code-block:: python

    import os

    redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
    backend = await RedisBackend.from_url(redis_url)

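On a fresh rollout, pods often race Redis. A small retry loop at startup
avoids crash-looping while Redis comes up (a sketch; the exception type is an
assumption):

.. code-block:: python

    import asyncio

    async def connect_with_retry(url: str, attempts: int = 5, delay: float = 2.0):
        for attempt in range(1, attempts + 1):
            try:
                return await RedisBackend.from_url(url)
            except ConnectionError:  # assumption: match your backend's error type
                if attempt == attempts:
                    raise
                await asyncio.sleep(delay)
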
Monitoring
----------

Keep an eye on:

1. **Redis latency:** High latency means slow requests
2. **Redis memory:** Rate limit data shouldn't use much, but monitor it
3. **Connection count:** Make sure you're not exhausting connections
4. **Rate limit hits:** Track how often clients are being limited

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)

    def on_rate_limited(request: Request, result):
        logger.info(
            "Rate limited: client=%s path=%s remaining=%d",
            request.client.host,
            request.url.path,
            result.info.remaining,
        )

    @rate_limit(100, 60, on_blocked=on_rate_limited)
    async def endpoint(request: Request):
        return {"status": "ok"}

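The first three items can be spot-checked from Python with redis-py's ``INFO``
command (a quick sketch):

.. code-block:: python

    import asyncio
    import time

    import redis.asyncio as redis

    async def redis_health(url: str = "redis://redis-server:6379/0"):
        r = redis.from_url(url)
        start = time.perf_counter()
        await r.ping()
        print(f"ping latency: {(time.perf_counter() - start) * 1000:.1f} ms")
        memory = await r.info("memory")
        clients = await r.info("clients")
        print("used_memory_human:", memory["used_memory_human"])
        print("connected_clients:", clients["connected_clients"])
        await r.aclose()  # redis-py >= 5; use close() on older versions

    asyncio.run(redis_health())
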
Testing Distributed Rate Limits
-------------------------------

Testing distributed behavior is tricky. Here's an approach:

.. code-block:: python

    import asyncio

    import httpx

    async def test_distributed_limit():
        """Simulate requests from multiple 'instances'."""
        async with httpx.AsyncClient() as client:
            # Fire 150 requests concurrently
            tasks = [
                client.get("http://localhost:8000/api/data")
                for _ in range(150)
            ]
            responses = await asyncio.gather(*tasks)

            # Count successes and rate limits
            successes = sum(1 for r in responses if r.status_code == 200)
            limited = sum(1 for r in responses if r.status_code == 429)

            print(f"Successes: {successes}, Rate limited: {limited}")
            # With a limit of 100, expect ~100 successes and ~50 limited

    asyncio.run(test_distributed_limit())