Distributed Systems
===================

Running rate limiting across multiple application instances requires careful
consideration. This guide covers the patterns and pitfalls.

The Challenge
-------------

In a distributed system, you might have:

- Multiple application instances behind a load balancer
- Kubernetes pods that scale up and down
- Serverless functions that run independently

Each instance needs to share rate limit state. Otherwise, a client could make
100 requests to instance A and another 100 to instance B, effectively
bypassing a 100-request limit.

Redis: The Standard Solution
----------------------------

Redis is the go-to choice for distributed rate limiting:

.. code-block:: python

    from fastapi import FastAPI

    from fastapi_traffic import RateLimiter
    from fastapi_traffic.backends.redis import RedisBackend
    from fastapi_traffic.core.limiter import get_limiter, set_limiter

    app = FastAPI()

    @app.on_event("startup")
    async def startup():
        backend = await RedisBackend.from_url(
            "redis://redis-server:6379/0",
            key_prefix="myapp:ratelimit",
        )
        limiter = RateLimiter(backend)
        set_limiter(limiter)
        await limiter.initialize()

    @app.on_event("shutdown")
    async def shutdown():
        limiter = get_limiter()
        await limiter.close()

All instances connect to the same Redis server and share state.

High Availability Redis
-----------------------

For production, you'll want Redis with high availability:

**Redis Sentinel:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://sentinel1:26379,sentinel2:26379,sentinel3:26379/0",
        sentinel_master="mymaster",
    )

**Redis Cluster:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://node1:6379,node2:6379,node3:6379/0",
    )

Atomic Operations
-----------------

Race conditions are a real concern in distributed systems. Consider this
scenario:

1. Instance A reads: 99 requests made
2. Instance B reads: 99 requests made
3. Instance A writes: 100 requests (allows request)
4. Instance B writes: 100 requests (allows request)

Now you've allowed 101 requests when the limit was 100.
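This lost-update race is easy to reproduce locally without Redis. The sketch below uses a shared counter and a deliberate yield between read and write to play the role of two instances checking concurrently; all names here are illustrative, not part of the library:

```python
import asyncio

LIMIT = 100

class NaiveCounter:
    """Read-then-write with a yield in between, like two instances racing."""

    def __init__(self, start: int):
        self.value = start

    async def try_acquire(self) -> bool:
        current = self.value      # step 1/2: read
        await asyncio.sleep(0)    # yield, letting the other "instance" read too
        if current >= LIMIT:
            return False
        self.value = current + 1  # step 3/4: write back a stale count
        return True

async def main() -> tuple[int, int]:
    counter = NaiveCounter(start=99)  # 99 requests already made
    # Instances A and B check concurrently
    results = await asyncio.gather(counter.try_acquire(), counter.try_acquire())
    return sum(results), counter.value

allowed, final = asyncio.run(main())
print(f"allowed={allowed}, counter={final}")  # allowed=2, counter=100
```

Both calls read 99, so both allow a request, yet the counter ends at 100: two requests got through on the strength of one slot. This is exactly the gap the atomic check-and-increment below closes.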
FastAPI Traffic's Redis backend uses Lua scripts to make operations atomic:

.. code-block:: lua

    -- Simplified example of atomic check-and-increment
    local limit = tonumber(ARGV[1])
    local current = redis.call('GET', KEYS[1])
    if current and tonumber(current) >= limit then
        return 0  -- Reject
    end
    redis.call('INCR', KEYS[1])
    return 1  -- Allow

The entire check-and-update happens in a single Redis operation.

Network Latency
---------------

Redis adds network latency to every request. Some strategies to minimize the
impact:

**1. Connection pooling (automatic):**

The Redis backend maintains a connection pool, so you're not creating new
connections for each request.

**2. Local caching:**

For very high-traffic endpoints, consider a two-tier approach:

.. code-block:: python

    from fastapi_traffic import MemoryBackend, RateLimiter

    # Local memory backend for the fast path
    local_backend = MemoryBackend()
    local_limiter = RateLimiter(local_backend)

    # Redis backend for distributed state
    redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")
    distributed_limiter = RateLimiter(redis_backend)

    async def check_rate_limit(request: Request, config: RateLimitConfig):
        # Quick local check (may allow some extra requests)
        local_result = await local_limiter.check(request, config)
        if not local_result.allowed:
            return local_result

        # Authoritative distributed check
        return await distributed_limiter.check(request, config)

**3. Skip on error:**

If Redis latency is causing issues, you might prefer to allow requests through
rather than block:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)
    async def endpoint(request: Request):
        return {"status": "ok"}

Handling Redis Failures
-----------------------

What happens when Redis goes down?

**Fail closed (default):** Requests fail. This is safer but impacts
availability.

**Fail open:** Allow requests through:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)
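The same fail-open behavior can also be written by hand around any backend call. This is a sketch, not the library's implementation: `FlakyBackend` and its `incr` coroutine are stand-ins invented for illustration.

```python
import asyncio

class FlakyBackend:
    """Stand-in for a Redis backend that is currently down."""

    async def incr(self, key: str) -> int:
        raise ConnectionError("redis unreachable")

async def check_fail_open(backend, key: str, limit: int) -> bool:
    """Allow the request when the backend errors, instead of failing it."""
    try:
        count = await backend.incr(key)
        return count <= limit
    except ConnectionError:
        # Fail open: choose availability over strict enforcement
        return True

allowed = asyncio.run(check_fail_open(FlakyBackend(), "client:1", limit=100))
print(allowed)  # True: the request goes through despite the outage
```

The trade-off is the same as with ``skip_on_error=True``: during an outage, clients are effectively unlimited, so fail open only where abuse during a short window is acceptable.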
**Circuit breaker pattern:** Implement a circuit breaker to avoid hammering a
failing Redis:

.. code-block:: python

    import time

    class CircuitBreaker:
        def __init__(self, failure_threshold=5, reset_timeout=60):
            self.failures = 0
            self.threshold = failure_threshold
            self.reset_timeout = reset_timeout
            self.last_failure = 0
            self.open = False

        def record_failure(self):
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.open = True

        def record_success(self):
            self.failures = 0
            self.open = False

        def should_allow(self) -> bool:
            if not self.open:
                return True
            # Check if we should try again
            if time.time() - self.last_failure > self.reset_timeout:
                return True
            return False

Kubernetes Deployment
---------------------

Here's a typical Kubernetes setup:

.. code-block:: yaml

    # redis-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
          - name: redis
            image: redis:7-alpine
            ports:
            - containerPort: 6379
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: redis
    spec:
      selector:
        app: redis
      ports:
      - port: 6379

.. code-block:: yaml

    # app-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
          - name: api
            image: myapp:latest
            env:
            - name: REDIS_URL
              value: "redis://redis:6379/0"

Your app connects to Redis via the service name:

.. code-block:: python

    import os

    redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
    backend = await RedisBackend.from_url(redis_url)

Monitoring
----------

Keep an eye on:

1. **Redis latency:** High latency means slow requests
2. **Redis memory:** Rate limit data shouldn't use much, but monitor it
3. **Connection count:** Make sure you're not exhausting connections
4. **Rate limit hits:** Track how often clients are being limited
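Before wiring up a full metrics stack, rate-limit hits can be tallied with a small in-process counter. This is a sketch with illustrative names, not part of the library:

```python
from collections import Counter

class LimitStats:
    """Track allow/block decisions per path and report a block ratio."""

    def __init__(self):
        self.allowed = Counter()
        self.blocked = Counter()

    def record(self, path: str, allowed: bool) -> None:
        (self.allowed if allowed else self.blocked)[path] += 1

    def block_ratio(self, path: str) -> float:
        total = self.allowed[path] + self.blocked[path]
        return self.blocked[path] / total if total else 0.0

stats = LimitStats()
for _ in range(90):
    stats.record("/api/data", allowed=True)
for _ in range(10):
    stats.record("/api/data", allowed=False)

print(stats.block_ratio("/api/data"))  # 0.1
```

A block ratio that climbs suddenly usually means either an abusive client or a limit set too low for legitimate traffic.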
For example, log whenever a client is limited:

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)

    def on_rate_limited(request: Request, result):
        logger.info(
            "Rate limited: client=%s path=%s remaining=%d",
            request.client.host,
            request.url.path,
            result.info.remaining,
        )

    @rate_limit(100, 60, on_blocked=on_rate_limited)
    async def endpoint(request: Request):
        return {"status": "ok"}

Testing Distributed Rate Limits
-------------------------------

Testing distributed behavior is tricky. Here's an approach:

.. code-block:: python

    import asyncio

    import httpx

    async def test_distributed_limit():
        """Simulate requests from multiple 'instances'."""
        async with httpx.AsyncClient() as client:
            # Fire 150 requests concurrently
            tasks = [
                client.get("http://localhost:8000/api/data")
                for _ in range(150)
            ]
            responses = await asyncio.gather(*tasks)

        # Count successes and rate limits
        successes = sum(1 for r in responses if r.status_code == 200)
        limited = sum(1 for r in responses if r.status_code == 429)

        print(f"Successes: {successes}, Rate limited: {limited}")
        # With a limit of 100, expect ~100 successes and ~50 limited

    asyncio.run(test_distributed_limit())
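The same assertion can be checked deterministically without a running server by driving an in-memory stand-in for the shared limiter. The atomic counter below mimics the check-and-increment of the Lua script; it is a sketch, not the library's API:

```python
import asyncio

class AtomicWindowCounter:
    """One shared counter, checked and incremented atomically under a lock."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = asyncio.Lock()

    async def hit(self) -> bool:
        async with self._lock:
            if self.count >= self.limit:
                return False  # reject
            self.count += 1
            return True       # allow

async def main() -> tuple[int, int]:
    limiter = AtomicWindowCounter(limit=100)
    # Fire 150 concurrent "requests" against the shared state
    results = await asyncio.gather(*(limiter.hit() for _ in range(150)))
    successes = sum(results)
    return successes, len(results) - successes

successes, limited = asyncio.run(main())
print(f"Successes: {successes}, Rate limited: {limited}")  # Successes: 100, Rate limited: 50
```

Because the counter is atomic, the split is exactly 100/50 no matter how the tasks interleave, which is the property the HTTP-level test above can only check approximately.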