Distributed Systems
===================

Running rate limiting across multiple application instances requires careful
consideration. This guide covers the patterns and pitfalls.

The Challenge
-------------

In a distributed system, you might have:

- Multiple application instances behind a load balancer
- Kubernetes pods that scale up and down
- Serverless functions that run independently

Each instance needs to share rate limit state. Otherwise, a client could make
100 requests to instance A and another 100 to instance B, effectively bypassing
a 100-request limit.

Redis: The Standard Solution
----------------------------

Redis is the go-to choice for distributed rate limiting:

.. code-block:: python

    from fastapi import FastAPI

    from fastapi_traffic import RateLimiter
    from fastapi_traffic.backends.redis import RedisBackend
    from fastapi_traffic.core.limiter import get_limiter, set_limiter

    app = FastAPI()

    @app.on_event("startup")
    async def startup():
        backend = await RedisBackend.from_url(
            "redis://redis-server:6379/0",
            key_prefix="myapp:ratelimit",
        )
        limiter = RateLimiter(backend)
        set_limiter(limiter)
        await limiter.initialize()

    @app.on_event("shutdown")
    async def shutdown():
        limiter = get_limiter()
        await limiter.close()

All instances connect to the same Redis server and share state.

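If you want to confirm that instances really are sharing state, you can peek
at the keys under your prefix with redis-py (a quick diagnostic sketch; the
exact key names are an internal detail of the backend, so treat them as
opaque):

.. code-block:: python

    import asyncio

    import redis.asyncio as redis

    async def show_rate_limit_keys():
        # Connect to the same Redis instance the backends use
        r = redis.from_url("redis://redis-server:6379/0")
        # List everything under the shared prefix from the example above
        async for key in r.scan_iter("myapp:ratelimit*"):
            print(key, await r.get(key))
        await r.aclose()  # redis-py >= 5; use close() on older versions

    asyncio.run(show_rate_limit_keys())
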
High Availability Redis
-----------------------

For production, you'll want Redis with high availability:

**Redis Sentinel:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://sentinel1:26379,sentinel2:26379,sentinel3:26379/0",
        sentinel_master="mymaster",
    )

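For reference, this is roughly what Sentinel discovery looks like with
redis-py directly, independent of this library; since the backend accepts the
sentinel URL form above, you shouldn't need to do this yourself:

.. code-block:: python

    from redis.asyncio.sentinel import Sentinel

    sentinel = Sentinel(
        [("sentinel1", 26379), ("sentinel2", 26379), ("sentinel3", 26379)]
    )
    # Resolves (and re-resolves after a failover) the current master
    # for the "mymaster" service
    master = sentinel.master_for("mymaster", db=0)
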
**Redis Cluster:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://node1:6379,node2:6379,node3:6379/0",
    )

Atomic Operations
-----------------

Race conditions are a real concern in distributed systems. Consider this
scenario with a limit of 100:

1. Instance A reads: 99 requests made
2. Instance B reads: 99 requests made
3. Instance A writes: 100 requests (allows request)
4. Instance B writes: 100 requests (allows request)

Now you've allowed 101 requests when the limit was 100.

FastAPI Traffic's Redis backend uses Lua scripts to make operations atomic:

.. code-block:: lua

    -- Simplified example of atomic check-and-increment
    local limit = tonumber(ARGV[1])
    local current = redis.call('GET', KEYS[1])
    if current and tonumber(current) >= limit then
        return 0 -- Reject
    end
    redis.call('INCR', KEYS[1])
    return 1 -- Allow

The entire check-and-update happens in a single Redis operation.

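If you're curious how such a script is wired up, here's a minimal sketch using
redis-py directly; the key name, limit, and window are illustrative, and the
library's actual scripts are more involved (this one also sets a TTL so the
counter resets):

.. code-block:: python

    import redis.asyncio as redis

    CHECK_AND_INCR = """
    local limit = tonumber(ARGV[1])
    local current = redis.call('GET', KEYS[1])
    if current and tonumber(current) >= limit then
        return 0
    end
    local count = redis.call('INCR', KEYS[1])
    if count == 1 then
        redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
    end
    return 1
    """

    async def allow(r: redis.Redis, key: str, limit: int, window: int) -> bool:
        # register_script caches the script by SHA and runs it via EVALSHA
        script = r.register_script(CHECK_AND_INCR)
        return bool(await script(keys=[key], args=[limit, window]))
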
Network Latency
---------------

Redis adds network latency to every request. Some strategies to minimize impact:

**1. Connection pooling (automatic):**

The Redis backend maintains a connection pool, so you're not creating new
connections for each request.

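For illustration, this is roughly what that amounts to in redis-py terms; the
``max_connections`` value is an example, not the backend's actual default:

.. code-block:: python

    import redis.asyncio as redis

    # One pool per process, created at startup and reused for every request
    pool = redis.ConnectionPool.from_url(
        "redis://redis-server:6379/0",
        max_connections=50,  # example value; tune for your workload
    )
    r = redis.Redis(connection_pool=pool)
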
**2. Local caching:**

For very high-traffic endpoints, consider a two-tier approach:

.. code-block:: python

    from fastapi_traffic import MemoryBackend, RateLimiter

    # Local memory backend for fast path
    local_backend = MemoryBackend()
    local_limiter = RateLimiter(local_backend)

    # Redis backend for distributed state
    redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")
    distributed_limiter = RateLimiter(redis_backend)

    async def check_rate_limit(request: Request, config: RateLimitConfig):
        # Quick local check (may allow some extra requests)
        local_result = await local_limiter.check(request, config)
        if not local_result.allowed:
            return local_result

        # Authoritative distributed check
        return await distributed_limiter.check(request, config)

**3. Skip on error:**

If Redis latency is causing issues, you might prefer to allow requests through
rather than block:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)
    async def endpoint(request: Request):
        return {"status": "ok"}

Handling Redis Failures
-----------------------

What happens when Redis goes down?

**Fail closed (default):**

Requests fail. This is safer but impacts availability.

**Fail open:**

Allow requests through:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)
    async def endpoint(request: Request):
        return {"status": "ok"}

**Circuit breaker pattern:**

Implement a circuit breaker to avoid hammering a failing Redis:

.. code-block:: python

    import time

    class CircuitBreaker:
        def __init__(self, failure_threshold=5, reset_timeout=60):
            self.failures = 0
            self.threshold = failure_threshold
            self.reset_timeout = reset_timeout
            self.last_failure = 0
            self.open = False

        def record_failure(self):
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.open = True

        def record_success(self):
            self.failures = 0
            self.open = False

        def should_allow(self) -> bool:
            if not self.open:
                return True
            # After the reset timeout, let one request through to probe Redis
            if time.time() - self.last_failure > self.reset_timeout:
                return True
            return False

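Wiring the breaker into the check path might look like this (a sketch; the
exception raised on Redis failure depends on the backend, so
``ConnectionError`` is an assumption):

.. code-block:: python

    breaker = CircuitBreaker()

    async def checked(request: Request, config: RateLimitConfig):
        if not breaker.should_allow():
            return None  # breaker is open: fail open, skip the check
        try:
            result = await distributed_limiter.check(request, config)
            breaker.record_success()
            return result
        except ConnectionError:  # assumption: match your backend's error type
            breaker.record_failure()
            return None  # fail open on error
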
Kubernetes Deployment
---------------------

Here's a typical Kubernetes setup:

.. code-block:: yaml

    # redis-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
            - name: redis
              image: redis:7-alpine
              ports:
                - containerPort: 6379
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: redis
    spec:
      selector:
        app: redis
      ports:
        - port: 6379

.. code-block:: yaml

    # app-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
            - name: api
              image: myapp:latest
              env:
                - name: REDIS_URL
                  value: "redis://redis:6379/0"

Your app connects to Redis via the service name:

.. code-block:: python

    import os

    redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
    backend = await RedisBackend.from_url(redis_url)

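On a fresh rollout, pods often race Redis. A small retry loop at startup
avoids crash-looping while Redis comes up (a sketch; the exception type is an
assumption):

.. code-block:: python

    import asyncio

    async def connect_with_retry(url: str, attempts: int = 5, delay: float = 2.0):
        for attempt in range(1, attempts + 1):
            try:
                return await RedisBackend.from_url(url)
            except ConnectionError:  # assumption: match your backend's error type
                if attempt == attempts:
                    raise
                await asyncio.sleep(delay)
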
Monitoring
----------

Keep an eye on:

1. **Redis latency:** High latency means slow requests
2. **Redis memory:** Rate limit data shouldn't use much, but monitor it
3. **Connection count:** Make sure you're not exhausting connections
4. **Rate limit hits:** Track how often clients are being limited

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)

    def on_rate_limited(request: Request, result):
        logger.info(
            "Rate limited: client=%s path=%s remaining=%d",
            request.client.host,
            request.url.path,
            result.info.remaining,
        )

    @rate_limit(100, 60, on_blocked=on_rate_limited)
    async def endpoint(request: Request):
        return {"status": "ok"}

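The first three items can be spot-checked from Python with redis-py's ``INFO``
command (a quick sketch):

.. code-block:: python

    import asyncio
    import time

    import redis.asyncio as redis

    async def redis_health(url: str = "redis://redis-server:6379/0"):
        r = redis.from_url(url)
        start = time.perf_counter()
        await r.ping()
        print(f"ping latency: {(time.perf_counter() - start) * 1000:.1f} ms")
        memory = await r.info("memory")
        clients = await r.info("clients")
        print("used_memory_human:", memory["used_memory_human"])
        print("connected_clients:", clients["connected_clients"])
        await r.aclose()  # redis-py >= 5; use close() on older versions

    asyncio.run(redis_health())
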
Testing Distributed Rate Limits
-------------------------------

Testing distributed behavior is tricky. Here's an approach:

.. code-block:: python

    import asyncio

    import httpx

    async def test_distributed_limit():
        """Simulate requests from multiple 'instances'."""
        async with httpx.AsyncClient() as client:
            # Fire 150 requests concurrently
            tasks = [
                client.get("http://localhost:8000/api/data")
                for _ in range(150)
            ]
            responses = await asyncio.gather(*tasks)

            # Count successes and rate limits
            successes = sum(1 for r in responses if r.status_code == 200)
            limited = sum(1 for r in responses if r.status_code == 429)

            print(f"Successes: {successes}, Rate limited: {limited}")
            # With a limit of 100, expect ~100 successes and ~50 limited

    asyncio.run(test_distributed_limit())