release: bump version to 0.3.0

- Refactor Redis backend connection handling and pool management
- Update algorithm implementations with improved type annotations
- Enhance config loader validation with stricter Pydantic schemas
- Improve decorator and middleware error handling
- Expand example scripts with better docstrings and usage patterns
- Add new 00_basic_usage.py example for quick start
- Reorganize examples directory structure
- Fix type annotation inconsistencies across core modules
- Update dependencies in pyproject.toml
2026-03-17 20:55:38 +00:00
parent 492410614f
commit f3453cb0fc
51 changed files with 6507 additions and 166 deletions

Rate Limiting Algorithms
========================
FastAPI Traffic ships with five rate limiting algorithms. Each has its own strengths,
and picking the right one depends on what you're trying to achieve.
This guide will help you understand the tradeoffs and choose wisely.
Overview
--------
Here's the quick comparison:
.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Algorithm
     - Best For
     - Tradeoffs
   * - **Token Bucket**
     - APIs that need burst handling
     - Allows temporary spikes above average rate
   * - **Sliding Window**
     - Precise rate limiting
     - Higher memory usage
   * - **Fixed Window**
     - Simple, low-overhead limiting
     - Boundary issues (2x burst at window edges)
   * - **Leaky Bucket**
     - Consistent throughput
     - No burst handling
   * - **Sliding Window Counter**
     - General purpose (default)
     - Good balance of precision and efficiency
Token Bucket
------------
Think of this as a bucket that holds tokens. Each request consumes a token, and
tokens refill at a steady rate. If the bucket is empty, requests are rejected.
.. code-block:: python
from fastapi_traffic import rate_limit, Algorithm
@app.get("/api/data")
@rate_limit(
100, # 100 tokens refill per minute
60,
algorithm=Algorithm.TOKEN_BUCKET,
burst_size=20, # bucket can hold up to 20 tokens
)
async def get_data(request: Request):
return {"data": "here"}
**How it works:**
1. The bucket starts full (at ``burst_size`` capacity)
2. Each request removes one token
3. Tokens refill at ``limit / window_size`` per second
4. If no tokens are available, the request is rejected
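Those four steps can be sketched in a few lines of plain Python (an illustration, not the library's actual implementation; ``limit``, ``window``, and ``burst_size`` mirror the decorator arguments above):

```python
class TokenBucket:
    """Toy token bucket: starts full, refills at limit/window tokens per second."""

    def __init__(self, limit: float, window: float, burst_size: float) -> None:
        self.rate = limit / window        # tokens added per second
        self.capacity = burst_size
        self.tokens = burst_size          # bucket starts full
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # refill for the time elapsed since the last check, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1              # each request consumes one token
            return True
        return False
```

With ``limit=60, window=60, burst_size=5`` the bucket absorbs a burst of five requests, then admits one more per second.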
**When to use it:**
- Your API has legitimate burst traffic (e.g., page loads that trigger multiple requests)
- You want to allow short spikes while maintaining an average rate
- Mobile apps that batch requests when coming online
**Example scenario:** A mobile app that syncs data when it reconnects. You want to
allow it to catch up quickly, but not overwhelm your servers.
Sliding Window
--------------
This algorithm tracks the exact timestamp of every request within the window. It's
the most accurate approach, but uses more memory.
.. code-block:: python
@app.get("/api/transactions")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
async def get_transactions(request: Request):
return {"transactions": []}
**How it works:**
1. Every request timestamp is stored
2. When checking, we count requests in the last ``window_size`` seconds
3. Old timestamps are cleaned up automatically
**When to use it:**
- You need precise rate limiting (financial APIs, compliance requirements)
- Memory isn't a major concern
- The rate limit is relatively low (not millions of requests)
**Tradeoffs:**
- Memory usage grows with request volume
- Slightly more CPU for timestamp management
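As a standalone sketch (again, not the library's code), the timestamp-log approach looks like this — note how memory is proportional to the number of stored timestamps:

```python
from collections import deque


class SlidingWindowLog:
    """Toy sliding-window log: keep every timestamp, count those still in the window."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.timestamps: deque[float] = deque()

    def allow(self, now: float) -> bool:
        # drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```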
Fixed Window
------------
The simplest algorithm. Divide time into fixed windows (e.g., every minute) and
count requests in each window.
.. code-block:: python
@app.get("/api/simple")
@rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
async def simple_endpoint(request: Request):
return {"status": "ok"}
**How it works:**
1. Time is divided into fixed windows (0:00-1:00, 1:00-2:00, etc.)
2. Each request increments the counter for the current window
3. When the window changes, the counter resets
**When to use it:**
- You want the simplest, most efficient option
- Slight inaccuracy at window boundaries is acceptable
- High-volume scenarios where memory matters
**The boundary problem:**
A client could make 100 requests at 0:59 and another 100 at 1:01, effectively
getting 200 requests in 2 seconds. If this matters for your use case, use
sliding window counter instead.
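A toy counter makes the boundary problem easy to reproduce (illustrative only — the real backends store this state per key):

```python
class FixedWindowCounter:
    """Toy fixed window: one counter per window, keyed by floor(now / window)."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.current: int | None = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.current:   # entered a new window: reset the counter
            self.current = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 100 per 60 seconds, 100 requests at 0:59 and 100 more at 1:01 are all admitted — 200 requests in two seconds.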
Leaky Bucket
------------
Imagine a bucket with a hole in the bottom. Requests fill the bucket, and it
"leaks" at a constant rate. If the bucket overflows, requests are rejected.
.. code-block:: python
@app.get("/api/steady")
@rate_limit(
100,
60,
algorithm=Algorithm.LEAKY_BUCKET,
burst_size=10, # bucket capacity
)
async def steady_endpoint(request: Request):
return {"status": "ok"}
**How it works:**
1. The bucket has a maximum capacity (``burst_size``)
2. Each request adds "water" to the bucket
3. Water leaks out at ``limit / window_size`` per second
4. If the bucket would overflow, the request is rejected
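The same steps as a standalone sketch (not the library's implementation) — the only moving part is the "water level", which drains continuously:

```python
class LeakyBucket:
    """Toy leaky bucket: the level drains at limit/window per second; overflow rejects."""

    def __init__(self, limit: float, window: float, burst_size: float) -> None:
        self.leak_rate = limit / window   # units of "water" drained per second
        self.capacity = burst_size
        self.level = 0.0
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # drain for the time elapsed since the last check
        self.level = max(0.0, self.level - (now - self.updated) * self.leak_rate)
        self.updated = now
        if self.level + 1 <= self.capacity:
            self.level += 1               # each request adds one unit of water
            return True
        return False
```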
**When to use it:**
- You need consistent, smooth throughput
- Downstream systems can't handle bursts
- Processing capacity is truly fixed (e.g., hardware limitations)
**Difference from token bucket:**
- Token bucket allows bursts up to the bucket size
- Leaky bucket smooths out traffic to a constant rate
Sliding Window Counter
----------------------
This is the default algorithm, and it's a good choice for most use cases. It
combines the efficiency of fixed windows with better accuracy.
.. code-block:: python
@app.get("/api/default")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
async def default_endpoint(request: Request):
return {"status": "ok"}
**How it works:**
1. Maintains counters for the current and previous windows
2. Calculates a weighted average based on how far into the current window we are
3. At 30 seconds into a 60-second window: ``count = prev_count * 0.5 + curr_count``
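The weighted average in step 3 amounts to this one-liner (names are illustrative):

```python
def weighted_count(prev_count: int, curr_count: int, elapsed: float, window: float) -> float:
    """Estimate requests in the last `window` seconds from two adjacent counters."""
    # the previous window's weight shrinks as the current window progresses
    weight = (window - elapsed) / window
    return prev_count * weight + curr_count
```

Thirty seconds into a 60-second window, a previous count of 100 and current count of 40 give an estimate of 90.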
**When to use it:**
- General purpose rate limiting
- You want better accuracy than fixed window without the memory cost of sliding window
- Most APIs fall into this category
**Why it's the default:**
It gives you most of the accuracy of sliding window with the memory efficiency of
fixed window. Unless you have specific requirements, this is probably what you want.
Choosing the Right Algorithm
----------------------------
Here's a decision tree:
1. **Do you need to allow bursts?**
- Yes → Token Bucket
- No, I need smooth traffic → Leaky Bucket
2. **Do you need exact precision?**
- Yes, compliance/financial → Sliding Window
- No, good enough is fine → Continue
3. **Is memory a concern?**
- Yes, high volume → Fixed Window
- No → Sliding Window Counter (default)
Performance Comparison
----------------------
All algorithms are O(1) for the check operation, but they differ in storage:
.. list-table::
   :header-rows: 1

   * - Algorithm
     - Storage per Key
     - Operations
   * - Token Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window
     - N timestamps
     - 1 read, 1 write, cleanup
   * - Fixed Window
     - 1 int, 1 float
     - 1 read, 1 write
   * - Leaky Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window Counter
     - 3 values
     - 1 read, 1 write
For most applications, the performance difference is negligible. Choose based on
behavior, not performance, unless you're handling millions of requests per second.
Code Examples
-------------
Here's a complete example showing all algorithms:
.. code-block:: python
from fastapi import FastAPI, Request
from fastapi_traffic import rate_limit, Algorithm
app = FastAPI()
# Burst-friendly endpoint
@app.get("/api/burst")
@rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=25)
async def burst_endpoint(request: Request):
return {"type": "token_bucket"}
# Precise limiting
@app.get("/api/precise")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
async def precise_endpoint(request: Request):
return {"type": "sliding_window"}
# Simple and efficient
@app.get("/api/simple")
@rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
async def simple_endpoint(request: Request):
return {"type": "fixed_window"}
# Smooth throughput
@app.get("/api/steady")
@rate_limit(100, 60, algorithm=Algorithm.LEAKY_BUCKET)
async def steady_endpoint(request: Request):
return {"type": "leaky_bucket"}
# Best of both worlds (default)
@app.get("/api/balanced")
@rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
async def balanced_endpoint(request: Request):
return {"type": "sliding_window_counter"}

Storage Backends
================
FastAPI Traffic needs somewhere to store rate limit state — how many requests each
client has made, when their window resets, and so on. That's what backends are for.
You have three options, each suited to different deployment scenarios.
Choosing a Backend
------------------
Here's the quick guide:
.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Backend
     - Use When
     - Limitations
   * - **Memory**
     - Development, single-process apps
     - Lost on restart, doesn't share across processes
   * - **SQLite**
     - Single-node production
     - Doesn't share across machines
   * - **Redis**
     - Distributed systems, multiple nodes
     - Requires Redis infrastructure
Memory Backend
--------------
The default backend. It stores everything in memory using a dictionary with LRU
eviction and automatic TTL cleanup.
.. code-block:: python
from fastapi_traffic import MemoryBackend, RateLimiter
from fastapi_traffic.core.limiter import set_limiter
# This is what happens by default, but you can configure it:
backend = MemoryBackend(
max_size=10000, # Maximum number of keys to store
cleanup_interval=60, # How often to clean expired entries (seconds)
)
limiter = RateLimiter(backend)
set_limiter(limiter)
**When to use it:**
- Local development
- Single-process applications
- Testing and CI/CD pipelines
- When you don't need persistence
**Limitations:**
- State is lost when the process restarts
- Doesn't work with multiple workers (each worker has its own memory)
- Not suitable for ``gunicorn`` with multiple workers or Kubernetes pods
**Memory management:**
The backend automatically evicts old entries when it hits ``max_size``. It uses
LRU (Least Recently Used) eviction, so inactive clients get cleaned up first.
SQLite Backend
--------------
For single-node production deployments where you need persistence. Rate limits
survive restarts and work across multiple processes on the same machine.
.. code-block:: python
from fastapi_traffic import SQLiteBackend, RateLimiter
from fastapi_traffic.core.limiter import set_limiter
backend = SQLiteBackend(
"rate_limits.db", # Database file path
cleanup_interval=300, # Clean expired entries every 5 minutes
)
limiter = RateLimiter(backend)
set_limiter(limiter)
@app.on_event("startup")
async def startup():
await limiter.initialize()
@app.on_event("shutdown")
async def shutdown():
await limiter.close()
**When to use it:**
- Single-server deployments
- When you need rate limits to survive restarts
- Multiple workers on the same machine (gunicorn, uvicorn with workers)
- When Redis is overkill for your use case
**Performance notes:**
- Uses WAL (Write-Ahead Logging) mode for better concurrent performance
- Connection pooling is handled automatically
- Writes are batched where possible
**File location:**
Put the database file somewhere persistent. For Docker deployments, mount a volume:
.. code-block:: yaml
# docker-compose.yml
services:
api:
volumes:
- ./data:/app/data
environment:
- RATE_LIMIT_DB=/app/data/rate_limits.db
Redis Backend
-------------
The go-to choice for distributed systems. All your application instances share
the same rate limit state.
.. code-block:: python
from fastapi_traffic import RateLimiter
from fastapi_traffic.backends.redis import RedisBackend
from fastapi_traffic.core.limiter import set_limiter
@app.on_event("startup")
async def startup():
backend = await RedisBackend.from_url(
"redis://localhost:6379/0",
key_prefix="myapp:ratelimit", # Optional prefix for all keys
)
limiter = RateLimiter(backend)
set_limiter(limiter)
await limiter.initialize()
@app.on_event("shutdown")
async def shutdown():
await limiter.close()
**When to use it:**
- Multiple application instances (Kubernetes, load-balanced servers)
- When you need rate limits shared across your entire infrastructure
- High-availability requirements
**Connection options:**
.. code-block:: python
# Simple connection
backend = await RedisBackend.from_url("redis://localhost:6379/0")
# With authentication
backend = await RedisBackend.from_url("redis://:password@localhost:6379/0")
# Redis Sentinel for HA
backend = await RedisBackend.from_url(
"redis://sentinel1:26379/0",
sentinel_master="mymaster",
)
# Redis Cluster
backend = await RedisBackend.from_url("redis://node1:6379,node2:6379,node3:6379/0")
**Atomic operations:**
The Redis backend uses Lua scripts to ensure atomic operations. This means rate
limit checks are accurate even under high concurrency — no race conditions.
**Key expiration:**
Keys automatically expire based on the rate limit window. You don't need to worry
about Redis filling up with stale data.
Switching Backends
------------------
You can switch backends without changing your rate limiting code. Just configure
a different backend at startup:
.. code-block:: python
import os
from fastapi_traffic import RateLimiter, MemoryBackend, SQLiteBackend
from fastapi_traffic.core.limiter import set_limiter
async def get_backend():
    """Choose backend based on environment."""
    env = os.getenv("ENVIRONMENT", "development")
    if env == "production":
        redis_url = os.getenv("REDIS_URL")
        if redis_url:
            from fastapi_traffic.backends.redis import RedisBackend
            # from_url is a coroutine and must be awaited
            return await RedisBackend.from_url(redis_url)
        return SQLiteBackend("/app/data/rate_limits.db")
    return MemoryBackend()
@app.on_event("startup")
async def startup():
backend = await get_backend()
limiter = RateLimiter(backend)
set_limiter(limiter)
await limiter.initialize()
Custom Backends
---------------
Need something different? Maybe you want to use PostgreSQL, DynamoDB, or some
other storage system. You can implement your own backend:
.. code-block:: python
from fastapi_traffic.backends.base import Backend
from typing import Any
class MyCustomBackend(Backend):
async def get(self, key: str) -> dict[str, Any] | None:
"""Retrieve state for a key."""
# Your implementation here
pass
async def set(self, key: str, value: dict[str, Any], *, ttl: float) -> None:
"""Store state with TTL."""
pass
async def delete(self, key: str) -> None:
"""Delete a key."""
pass
async def exists(self, key: str) -> bool:
"""Check if key exists."""
pass
async def increment(self, key: str, amount: int = 1) -> int:
"""Atomically increment a counter."""
pass
async def clear(self) -> None:
"""Clear all data."""
pass
async def close(self) -> None:
"""Clean up resources."""
pass
The key methods are ``get``, ``set``, and ``delete``. The state is stored as a
dictionary, and the backend is responsible for serialization.
Backend Comparison
------------------
.. list-table::
   :header-rows: 1

   * - Feature
     - Memory
     - SQLite
     - Redis
   * - Persistence
     - ❌
     - ✅
     - ✅
   * - Multi-process
     - ❌
     - ✅
     - ✅
   * - Multi-node
     - ❌
     - ❌
     - ✅
   * - Setup complexity
     - None
     - Low
     - Medium
   * - Latency
     - ~0.01ms
     - ~0.1ms
     - ~1ms
   * - Dependencies
     - None
     - None
     - redis package
Best Practices
--------------
1. **Start with Memory, upgrade when needed.** Don't over-engineer. Memory is
fine for development and many production scenarios.
2. **Use Redis for distributed systems.** If you have multiple application
instances, Redis is the only option that works correctly.
3. **Handle backend errors gracefully.** Set ``skip_on_error=True`` if you'd
rather allow requests through than fail when the backend is down:
.. code-block:: python
@rate_limit(100, 60, skip_on_error=True)
async def endpoint(request: Request):
return {"status": "ok"}
4. **Monitor your backend.** Keep an eye on memory usage (Memory backend),
disk space (SQLite), or Redis memory and connections.

Configuration
=============
FastAPI Traffic supports loading configuration from environment variables and files.
This makes it easy to manage settings across different environments without changing code.
Configuration Loader
--------------------
The ``ConfigLoader`` class handles loading configuration from various sources:
.. code-block:: python
from fastapi_traffic import ConfigLoader, RateLimitConfig
loader = ConfigLoader()
# Load from environment variables
config = loader.load_rate_limit_config_from_env()
# Load from a JSON file
config = loader.load_rate_limit_config_from_json("config/rate_limits.json")
# Load from a .env file
config = loader.load_rate_limit_config_from_env_file(".env")
Environment Variables
---------------------
Set rate limit configuration using environment variables with the ``FASTAPI_TRAFFIC_``
prefix:
.. code-block:: bash
# Basic settings
export FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
export FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
export FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window_counter
# Optional settings
export FASTAPI_TRAFFIC_RATE_LIMIT_KEY_PREFIX=myapp
export FASTAPI_TRAFFIC_RATE_LIMIT_BURST_SIZE=20
export FASTAPI_TRAFFIC_RATE_LIMIT_INCLUDE_HEADERS=true
export FASTAPI_TRAFFIC_RATE_LIMIT_ERROR_MESSAGE="Too many requests"
export FASTAPI_TRAFFIC_RATE_LIMIT_STATUS_CODE=429
export FASTAPI_TRAFFIC_RATE_LIMIT_SKIP_ON_ERROR=false
export FASTAPI_TRAFFIC_RATE_LIMIT_COST=1
Then load them in your app:
.. code-block:: python
from fastapi_traffic import load_rate_limit_config_from_env, rate_limit
# Load config from environment
config = load_rate_limit_config_from_env()
# Use it with the decorator
@app.get("/api/data")
@rate_limit(config.limit, config.window_size, algorithm=config.algorithm)
async def get_data(request: Request):
return {"data": "here"}
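For intuition, prefix-based loading boils down to a scan like this (a sketch only — the field names and casts are inferred from the variables above, not taken from the library's actual loader):

```python
import os

# assumed casts for the numeric/boolean fields shown above
_TRUTHY = {"true", "1", "yes", "on"}
FIELD_CASTS = {
    "LIMIT": int,
    "WINDOW_SIZE": float,
    "BURST_SIZE": int,
    "STATUS_CODE": int,
    "COST": int,
    "INCLUDE_HEADERS": lambda raw: raw.strip().lower() in _TRUTHY,
    "SKIP_ON_ERROR": lambda raw: raw.strip().lower() in _TRUTHY,
}


def load_env_config(prefix: str = "FASTAPI_TRAFFIC_RATE_LIMIT_") -> dict:
    """Collect FIELD=value pairs from every env var that starts with the prefix."""
    config = {}
    for key, raw in os.environ.items():
        if key.startswith(prefix):
            field = key[len(prefix):]
            config[field.lower()] = FIELD_CASTS.get(field, str)(raw)
    return config
```

Swapping the ``prefix`` argument is all the custom-prefix support described below requires.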
Custom Prefix
-------------
If ``FASTAPI_TRAFFIC_`` conflicts with something else, use a custom prefix:
.. code-block:: python
loader = ConfigLoader(prefix="MYAPP_RATELIMIT")
config = loader.load_rate_limit_config_from_env()
# Now reads from:
# MYAPP_RATELIMIT_RATE_LIMIT_LIMIT=100
# MYAPP_RATELIMIT_RATE_LIMIT_WINDOW_SIZE=60
# etc.
JSON Configuration
------------------
For more complex setups, use a JSON file:
.. code-block:: json
{
"limit": 100,
"window_size": 60,
"algorithm": "token_bucket",
"burst_size": 25,
"key_prefix": "api",
"include_headers": true,
"error_message": "Rate limit exceeded. Please slow down.",
"status_code": 429,
"skip_on_error": false,
"cost": 1
}
Load it:
.. code-block:: python
from fastapi_traffic import ConfigLoader
loader = ConfigLoader()
config = loader.load_rate_limit_config_from_json("config/rate_limits.json")
.env Files
----------
You can also use ``.env`` files, which is handy for local development:
.. code-block:: bash
# .env
FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window
Load it:
.. code-block:: python
loader = ConfigLoader()
config = loader.load_rate_limit_config_from_env_file(".env")
Global Configuration
--------------------
Besides per-endpoint configuration, you can set global defaults:
.. code-block:: bash
# Global settings
export FASTAPI_TRAFFIC_GLOBAL_ENABLED=true
export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_LIMIT=100
export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_WINDOW_SIZE=60
export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_ALGORITHM=sliding_window_counter
export FASTAPI_TRAFFIC_GLOBAL_KEY_PREFIX=fastapi_traffic
export FASTAPI_TRAFFIC_GLOBAL_INCLUDE_HEADERS=true
export FASTAPI_TRAFFIC_GLOBAL_ERROR_MESSAGE="Rate limit exceeded"
export FASTAPI_TRAFFIC_GLOBAL_STATUS_CODE=429
export FASTAPI_TRAFFIC_GLOBAL_SKIP_ON_ERROR=false
export FASTAPI_TRAFFIC_GLOBAL_HEADERS_PREFIX=X-RateLimit
Load global config:
.. code-block:: python
from fastapi_traffic import load_global_config_from_env, RateLimiter
from fastapi_traffic.core.limiter import set_limiter
global_config = load_global_config_from_env()
limiter = RateLimiter(config=global_config)
set_limiter(limiter)
Auto-Detection
--------------
The convenience functions automatically detect file format:
.. code-block:: python
from fastapi_traffic import load_rate_limit_config, load_global_config
# Detects JSON by extension
config = load_rate_limit_config("config/limits.json")
# Detects .env file
config = load_rate_limit_config("config/.env")
# Works for global config too
global_config = load_global_config("config/global.json")
Overriding Values
-----------------
You can override loaded values programmatically:
.. code-block:: python
loader = ConfigLoader()
# Load base config from file
config = loader.load_rate_limit_config_from_json(
"config/base.json",
limit=200, # Override the limit
key_prefix="custom", # Override the prefix
)
This is useful for environment-specific overrides:
.. code-block:: python
import os
base_config = loader.load_rate_limit_config_from_json("config/base.json")
# Apply environment-specific overrides
if os.getenv("ENVIRONMENT") == "production":
config = loader.load_rate_limit_config_from_json(
"config/base.json",
limit=base_config.limit * 2, # Double the limit in production
)
Validation
----------
Configuration is validated when loaded. Invalid values raise ``ConfigurationError``:
.. code-block:: python
from fastapi_traffic import ConfigLoader, ConfigurationError
loader = ConfigLoader()
try:
config = loader.load_rate_limit_config_from_env()
except ConfigurationError as e:
print(f"Invalid configuration: {e}")
# Handle the error appropriately
Common validation errors:
- ``limit`` must be a positive integer
- ``window_size`` must be a positive number
- ``algorithm`` must be one of the valid algorithm names
- ``status_code`` must be a valid HTTP status code
Algorithm Names
---------------
When specifying algorithms in configuration, use these names:
.. list-table::
   :header-rows: 1

   * - Config Value
     - Algorithm
   * - ``token_bucket``
     - Token Bucket
   * - ``sliding_window``
     - Sliding Window
   * - ``fixed_window``
     - Fixed Window
   * - ``leaky_bucket``
     - Leaky Bucket
   * - ``sliding_window_counter``
     - Sliding Window Counter (default)
Boolean Values
--------------
Boolean settings accept various formats:
- **True:** ``true``, ``1``, ``yes``, ``on``
- **False:** ``false``, ``0``, ``no``, ``off``
Case doesn't matter.
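The parsing rule amounts to this (a sketch; the library's validator may report errors differently):

```python
TRUTHY = {"true", "1", "yes", "on"}
FALSY = {"false", "0", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Parse a boolean setting, case-insensitively; reject anything else."""
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"not a boolean: {raw!r}")
```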
Complete Example
----------------
Here's a full example showing configuration loading in a real app:
.. code-block:: python
import os
from fastapi import FastAPI, Request
from fastapi_traffic import (
ConfigLoader,
ConfigurationError,
RateLimiter,
rate_limit,
)
from fastapi_traffic.core.limiter import set_limiter
app = FastAPI()
@app.on_event("startup")
async def startup():
loader = ConfigLoader()
try:
# Try to load from environment first
global_config = loader.load_global_config_from_env()
except ConfigurationError:
# Fall back to defaults
global_config = None
limiter = RateLimiter(config=global_config)
set_limiter(limiter)
await limiter.initialize()
@app.get("/api/data")
@rate_limit(100, 60)
async def get_data(request: Request):
return {"data": "here"}
# Or load endpoint-specific config
loader = ConfigLoader()
try:
api_config = loader.load_rate_limit_config_from_json("config/api_limits.json")
except (FileNotFoundError, ConfigurationError):
api_config = None
if api_config:
@app.get("/api/special")
@rate_limit(
api_config.limit,
api_config.window_size,
algorithm=api_config.algorithm,
)
async def special_endpoint(request: Request):
return {"special": "data"}

Exception Handling
==================
When a client exceeds their rate limit, FastAPI Traffic raises a ``RateLimitExceeded``
exception. This guide covers how to handle it gracefully.
Default Behavior
----------------
By default, when a rate limit is exceeded, the library raises ``RateLimitExceeded``.
FastAPI will convert this to a 500 error unless you handle it.
The exception contains useful information:
.. code-block:: python
from fastapi_traffic import RateLimitExceeded
try:
# Rate limited operation
pass
except RateLimitExceeded as exc:
print(exc.message) # "Rate limit exceeded"
print(exc.retry_after) # Seconds until they can retry (e.g., 45.2)
print(exc.limit_info) # RateLimitInfo object with full details
Custom Exception Handler
------------------------
The most common approach is to register a custom exception handler:
.. code-block:: python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from fastapi_traffic import RateLimitExceeded
app = FastAPI()
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
return JSONResponse(
status_code=429,
content={
"error": "rate_limit_exceeded",
"message": "You're making too many requests. Please slow down.",
"retry_after": exc.retry_after,
},
headers={
"Retry-After": str(int(exc.retry_after or 60)),
},
)
Now clients get a clean JSON response instead of a generic error.
Including Rate Limit Headers
----------------------------
The ``limit_info`` object can generate standard rate limit headers:
.. code-block:: python
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
headers = {}
if exc.limit_info:
headers = exc.limit_info.to_headers()
return JSONResponse(
status_code=429,
content={
"error": "rate_limit_exceeded",
"retry_after": exc.retry_after,
},
headers=headers,
)
This adds headers like:
.. code-block:: text
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709834400
Retry-After: 45
Different Responses for Different Endpoints
-------------------------------------------
You might want different error messages for different parts of your API:
.. code-block:: python
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
path = request.url.path
if path.startswith("/api/v1/"):
# API clients get JSON
return JSONResponse(
status_code=429,
content={"error": "rate_limit_exceeded", "retry_after": exc.retry_after},
)
elif path.startswith("/web/"):
# Web users get a friendly HTML page
return HTMLResponse(
status_code=429,
content="<h1>Slow down!</h1><p>Please wait a moment before trying again.</p>",
)
else:
# Default response
return JSONResponse(
status_code=429,
content={"detail": exc.message},
)
Using the on_blocked Callback
-----------------------------
Instead of (or in addition to) exception handling, you can use the ``on_blocked``
callback to run code when a request is blocked:
.. code-block:: python
import logging
logger = logging.getLogger(__name__)
def log_blocked_request(request: Request, result):
"""Log when a request is rate limited."""
client_ip = request.client.host if request.client else "unknown"
logger.warning(
"Rate limit exceeded for %s on %s %s",
client_ip,
request.method,
request.url.path,
)
@app.get("/api/data")
@rate_limit(100, 60, on_blocked=log_blocked_request)
async def get_data(request: Request):
return {"data": "here"}
The callback receives the request and the rate limit result. It runs before the
exception is raised.
Exempting Certain Requests
--------------------------
Use ``exempt_when`` to skip rate limiting for certain requests:
.. code-block:: python
def is_admin(request: Request) -> bool:
"""Check if request is from an admin."""
user = getattr(request.state, "user", None)
return user is not None and user.is_admin
@app.get("/api/data")
@rate_limit(100, 60, exempt_when=is_admin)
async def get_data(request: Request):
return {"data": "here"}
Admin requests bypass rate limiting entirely.
Graceful Degradation
--------------------
Sometimes you'd rather serve a degraded response than reject the request entirely:
.. code-block:: python
from fastapi_traffic import RateLimiter, RateLimitConfig
from fastapi_traffic.core.limiter import get_limiter
@app.get("/api/search")
async def search(request: Request, q: str):
limiter = get_limiter()
config = RateLimitConfig(limit=100, window_size=60)
result = await limiter.check(request, config)
if not result.allowed:
# Return cached/simplified results instead of blocking
return {
"results": get_cached_results(q),
"note": "Results may be stale. Please try again later.",
"retry_after": result.info.retry_after,
}
# Full search
return {"results": perform_full_search(q)}
Backend Errors
--------------
If the rate limit backend fails (Redis down, SQLite locked, etc.), you have options:
**Option 1: Fail closed (default)**
Requests fail when the backend is unavailable. Safer, but impacts availability.
**Option 2: Fail open**
Allow requests through when the backend fails:
.. code-block:: python
@app.get("/api/data")
@rate_limit(100, 60, skip_on_error=True)
async def get_data(request: Request):
return {"data": "here"}
**Option 3: Handle the error explicitly**
.. code-block:: python
from fastapi_traffic import BackendError
@app.exception_handler(BackendError)
async def backend_error_handler(request: Request, exc: BackendError):
# Log the error
logger.error("Rate limit backend error: %s", exc.original_error)
    # An exception handler must return a response; it cannot let the original
    # request continue. To fail open instead, use skip_on_error=True as above.
    return JSONResponse(
        status_code=503,
        content={"error": "service_unavailable"},
    )
Other Exceptions
----------------
FastAPI Traffic defines a few exception types:
.. code-block:: python
from fastapi_traffic import (
RateLimitExceeded, # Rate limit was exceeded
BackendError, # Storage backend failed
ConfigurationError, # Invalid configuration
)
All inherit from ``FastAPITrafficError``:
.. code-block:: python
from fastapi_traffic.exceptions import FastAPITrafficError
@app.exception_handler(FastAPITrafficError)
async def traffic_error_handler(request: Request, exc: FastAPITrafficError):
"""Catch-all for FastAPI Traffic errors."""
if isinstance(exc, RateLimitExceeded):
return JSONResponse(status_code=429, content={"error": "rate_limited"})
elif isinstance(exc, BackendError):
return JSONResponse(status_code=503, content={"error": "backend_error"})
else:
return JSONResponse(status_code=500, content={"error": "internal_error"})
Helper Function
---------------
FastAPI Traffic provides a helper to create rate limit responses:
.. code-block:: python
from fastapi_traffic.core.decorator import create_rate_limit_response
@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
return create_rate_limit_response(exc, include_headers=True)
This creates a standard 429 response with all the appropriate headers.

Key Extractors
==============
A key extractor is a function that identifies who's making a request. By default,
FastAPI Traffic uses the client's IP address, but you can customize this to fit
your authentication model.
How It Works
------------
Every rate limit needs a way to group requests. The key extractor returns a string
that identifies the client:
.. code-block:: python
def my_key_extractor(request: Request) -> str:
return "some-unique-identifier"
All requests that return the same identifier share the same rate limit bucket.
Default Behavior
----------------
The default extractor looks for the client IP in this order:
1. ``X-Forwarded-For`` header (first IP in the list)
2. ``X-Real-IP`` header
3. Direct connection IP (``request.client.host``)
4. Falls back to ``"unknown"``
This handles most reverse proxy setups automatically.
Rate Limiting by API Key
------------------------
For authenticated APIs, you probably want to limit by API key:
.. code-block:: python
from fastapi import Request
from fastapi_traffic import rate_limit
def api_key_extractor(request: Request) -> str:
"""Rate limit by API key."""
api_key = request.headers.get("X-API-Key")
if api_key:
return f"apikey:{api_key}"
# Fall back to IP for unauthenticated requests
return f"ip:{request.client.host}" if request.client else "ip:unknown"
@app.get("/api/data")
@rate_limit(1000, 3600, key_extractor=api_key_extractor)
async def get_data(request: Request):
return {"data": "here"}
Now each API key gets its own rate limit bucket.
Rate Limiting by User
---------------------
If you're using authentication middleware that sets the user:
.. code-block:: python
def user_extractor(request: Request) -> str:
"""Rate limit by authenticated user."""
# Assuming your auth middleware sets request.state.user
user = getattr(request.state, "user", None)
if user:
return f"user:{user.id}"
return f"ip:{request.client.host}" if request.client else "ip:unknown"
@app.get("/api/profile")
@rate_limit(100, 60, key_extractor=user_extractor)
async def get_profile(request: Request):
return {"profile": "data"}
Rate Limiting by Tenant
-----------------------
For multi-tenant applications:
.. code-block:: python
def tenant_extractor(request: Request) -> str:
"""Rate limit by tenant."""
# From subdomain
host = request.headers.get("host", "")
if "." in host:
tenant = host.split(".")[0]
return f"tenant:{tenant}"
# Or from header
tenant = request.headers.get("X-Tenant-ID")
if tenant:
return f"tenant:{tenant}"
return "tenant:default"
Combining Identifiers
---------------------
Sometimes you want to combine multiple factors:
.. code-block:: python
def combined_extractor(request: Request) -> str:
"""Rate limit by user AND endpoint."""
user = getattr(request.state, "user", None)
user_id = user.id if user else "anonymous"
endpoint = request.url.path
return f"{user_id}:{endpoint}"
This gives each user a separate limit for each endpoint.
Tiered Rate Limits
------------------
Different users might have different limits. Handle this with a custom extractor
that includes the tier:
.. code-block:: python
def tiered_extractor(request: Request) -> str:
"""Include tier in the key for different limits."""
user = getattr(request.state, "user", None)
if user:
# Premium users get a different bucket
tier = "premium" if user.is_premium else "free"
return f"{tier}:{user.id}"
return f"anonymous:{request.client.host if request.client else 'unknown'}"
Then apply different limits based on tier:
.. code-block:: python
# You'd typically do this with middleware or dependency injection
# to check the tier and apply the appropriate limit
@app.get("/api/data")
async def get_data(request: Request):
user = getattr(request.state, "user", None)
if user and user.is_premium:
# Premium: 10000 req/hour
limit, window = 10000, 3600
else:
# Free: 100 req/hour
limit, window = 100, 3600
# Apply rate limit manually
limiter = get_limiter()
config = RateLimitConfig(limit=limit, window_size=window)
await limiter.hit(request, config)
return {"data": "here"}
Geographic Rate Limiting
------------------------
Limit by country or region:
.. code-block:: python
def geo_extractor(request: Request) -> str:
"""Rate limit by country."""
# Assuming you have a GeoIP lookup
country = request.headers.get("CF-IPCountry", "XX") # Cloudflare header
ip = request.client.host if request.client else "unknown"
return f"{country}:{ip}"
This lets you apply different limits to different regions if needed.
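One way to act on that key is a small lookup from country code to limits. The codes and numbers below are hypothetical; substitute your own policy:

```python
# Hypothetical per-country limits: (requests, window_seconds).
COUNTRY_LIMITS = {
    "US": (1000, 3600),
    "DE": (1000, 3600),
    "XX": (100, 3600),  # unknown origin gets a conservative limit
}

def limits_for_country(country: str) -> tuple[int, int]:
    """Look up (limit, window) for a country code, defaulting to 'XX'."""
    return COUNTRY_LIMITS.get(country.upper(), COUNTRY_LIMITS["XX"])
```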
Endpoint-Specific Keys
----------------------
Rate limit the same user differently per endpoint:
.. code-block:: python
def endpoint_user_extractor(request: Request) -> str:
"""Separate limits per endpoint per user."""
user = getattr(request.state, "user", None)
user_id = user.id if user else (request.client.host if request.client else "unknown")
method = request.method
path = request.url.path
return f"{user_id}:{method}:{path}"
Best Practices
--------------
1. **Always have a fallback.** If your primary identifier isn't available, fall
back to IP:
.. code-block:: python
def safe_extractor(request: Request) -> str:
api_key = request.headers.get("X-API-Key")
if api_key:
return f"key:{api_key}"
return f"ip:{request.client.host if request.client else 'unknown'}"
2. **Use prefixes.** When mixing identifier types, prefix them to avoid collisions:
.. code-block:: python
# Good - clear what each key represents
return f"user:{user_id}"
return f"ip:{ip_address}"
return f"key:{api_key}"
# Bad - could collide
return user_id
return ip_address
3. **Keep it fast.** The extractor runs on every request. Avoid database lookups
or expensive operations:
.. code-block:: python
# Bad - database lookup on every request
def slow_extractor(request: Request) -> str:
user = db.get_user(request.headers.get("Authorization"))
return f"user:{user.id}"
# Good - use data already in the request
def fast_extractor(request: Request) -> str:
return f"user:{request.state.user.id}"  # set by auth middleware
4. **Be consistent.** The same client should always get the same key. Watch out
for things like:
- IP addresses changing (mobile users)
- Case sensitivity (normalize to lowercase)
- Whitespace (strip it)
.. code-block:: python
def normalized_extractor(request: Request) -> str:
api_key = request.headers.get("X-API-Key", "").strip().lower()
if api_key:
return f"key:{api_key}"
return f"ip:{request.client.host if request.client else 'unknown'}"
Using with Middleware
---------------------
Key extractors work the same way with middleware:
.. code-block:: python
from fastapi_traffic.middleware import RateLimitMiddleware
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
key_extractor=api_key_extractor,
)
Middleware
==========
Sometimes you want rate limiting applied to your entire API, not just individual
endpoints. That's where middleware comes in.
Middleware sits between the client and your application, checking every request
before it reaches your endpoints.
Basic Usage
-----------
Add the middleware to your FastAPI app:
.. code-block:: python
from fastapi import FastAPI
from fastapi_traffic.middleware import RateLimitMiddleware
app = FastAPI()
app.add_middleware(
RateLimitMiddleware,
limit=1000, # 1000 requests
window_size=60, # per minute
)
@app.get("/api/users")
async def get_users():
return {"users": []}
@app.get("/api/posts")
async def get_posts():
return {"posts": []}
Now every endpoint shares the same rate limit pool. A client who makes 500 requests
to ``/api/users`` only has 500 left for ``/api/posts``.
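The shared pool can be pictured as one counter per client that ignores the path entirely. This is a conceptual sketch, not the middleware's actual bookkeeping:

```python
from collections import defaultdict

LIMIT = 1000
_hits: dict[str, int] = defaultdict(int)

def allow(client: str, path: str) -> bool:
    """One counter per client; the path plays no role in the count."""
    _hits[client] += 1
    return _hits[client] <= LIMIT
```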
Exempting Paths
---------------
You probably don't want to rate limit your health checks or documentation:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
exempt_paths={
"/health",
"/ready",
"/docs",
"/redoc",
"/openapi.json",
},
)
These paths bypass rate limiting entirely.
Exempting IPs
-------------
Internal services, monitoring systems, or your own infrastructure might need
unrestricted access:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
exempt_ips={
"127.0.0.1",
"10.0.0.0/8", # Internal network
"192.168.1.100", # Monitoring server
},
)
.. note::
IP exemptions are checked against the client IP extracted by the key extractor.
Make sure your proxy headers are configured correctly if you're behind a load
balancer.
Custom Key Extraction
---------------------
By default, clients are identified by IP address. You can change this:
.. code-block:: python
from starlette.requests import Request
def get_client_id(request: Request) -> str:
"""Identify clients by API key, fall back to IP."""
api_key = request.headers.get("X-API-Key")
if api_key:
return f"api:{api_key}"
return request.client.host if request.client else "unknown"
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
key_extractor=get_client_id,
)
Choosing an Algorithm
---------------------
The middleware supports all five algorithms:
.. code-block:: python
from fastapi_traffic.core.algorithms import Algorithm
# Token bucket for burst-friendly limiting
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
algorithm=Algorithm.TOKEN_BUCKET,
)
# Sliding window for precise limiting
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
algorithm=Algorithm.SLIDING_WINDOW,
)
Using a Custom Backend
----------------------
By default, middleware uses the memory backend. For production, you'll want
something persistent:
.. code-block:: python
from fastapi_traffic import SQLiteBackend
from fastapi_traffic.middleware import RateLimitMiddleware
backend = SQLiteBackend("rate_limits.db")
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
backend=backend,
)
@app.on_event("shutdown")
async def shutdown():
await backend.close()
For Redis:
.. code-block:: python
from fastapi_traffic.backends.redis import RedisBackend
# Create backend at startup
redis_backend = None
@app.on_event("startup")
async def startup():
global redis_backend
redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")
# Note: You'll need to configure middleware after startup
# or use a factory pattern
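One shape that factory pattern can take is a wrapper that builds the backend lazily on first use, so the wrapper can be constructed before the event loop starts. ``LazyBackend`` and its ``backend()`` method are hypothetical names for illustration; the real middleware may not accept such a wrapper directly:

```python
class LazyBackend:
    """Defer async backend construction until first use (sketch)."""

    def __init__(self, factory):
        self._factory = factory  # async callable returning the backend
        self._backend = None

    async def backend(self):
        # Create the backend on first call, then reuse it.
        if self._backend is None:
            self._backend = await self._factory()
        return self._backend
```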
Convenience Middleware Classes
------------------------------
For common use cases, we provide pre-configured middleware:
.. code-block:: python
from fastapi_traffic.middleware import (
SlidingWindowMiddleware,
TokenBucketMiddleware,
)
# Sliding window algorithm
app.add_middleware(
SlidingWindowMiddleware,
limit=1000,
window_size=60,
)
# Token bucket algorithm
app.add_middleware(
TokenBucketMiddleware,
limit=1000,
window_size=60,
)
Combining with Decorator
------------------------
You can use both middleware and decorators. The middleware provides a baseline
limit, and decorators can add stricter limits to specific endpoints:
.. code-block:: python
from fastapi_traffic import rate_limit
from fastapi_traffic.middleware import RateLimitMiddleware
# Global limit: 1000 req/min
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
)
# This endpoint has an additional, stricter limit
@app.post("/api/expensive-operation")
@rate_limit(10, 60) # Only 10 req/min for this endpoint
async def expensive_operation(request: Request):
return {"result": "done"}
# This endpoint uses only the global limit
@app.get("/api/cheap-operation")
async def cheap_operation():
return {"result": "done"}
Both limits are checked. A request must pass both the middleware limit AND the
decorator limit.
Error Responses
---------------
When a client exceeds the rate limit, they get a 429 response:
.. code-block:: json
{
"detail": "Rate limit exceeded. Please try again later.",
"retry_after": 45.2
}
You can customize the message:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
error_message="Whoa there! You're making requests too fast.",
status_code=429,
)
Response Headers
----------------
By default, rate limit headers are included in every response:
.. code-block:: http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1709834400
When rate limited:
.. code-block:: http
Retry-After: 45
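A well-behaved client can honor ``Retry-After`` before retrying. This sketch parses only the numeric (delta-seconds) form; the header may also carry an HTTP date, which is not handled here:

```python
def retry_after_seconds(headers) -> "float | None":
    """Seconds to wait before retrying, or None if absent or unparseable."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return float(value)  # numeric form: seconds to wait
    except ValueError:
        return None  # HTTP-date form not handled in this sketch
```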
Disable headers if you don't want to expose this information:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
include_headers=False,
)
Handling Backend Errors
-----------------------
What happens if your Redis server goes down? By default, the middleware will
raise an exception. You can change this behavior:
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000,
window_size=60,
skip_on_error=True, # Allow requests through if backend fails
)
With ``skip_on_error=True``, requests are allowed through when the backend is
unavailable. This is a tradeoff between availability and protection.
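The fail-open versus fail-closed choice reduces to a single decision, sketched conceptually here:

```python
def should_allow(backend_ok: bool, under_limit: bool, skip_on_error: bool) -> bool:
    """Fail open (allow) or fail closed (reject) when the backend is down."""
    if not backend_ok:
        return skip_on_error  # backend unreachable: the flag decides
    return under_limit        # backend healthy: the limit decides
```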
Full Configuration Reference
----------------------------
.. code-block:: python
app.add_middleware(
RateLimitMiddleware,
limit=1000, # Max requests per window
window_size=60.0, # Window size in seconds
algorithm=Algorithm.SLIDING_WINDOW_COUNTER, # Algorithm to use
backend=None, # Storage backend (default: MemoryBackend)
key_prefix="middleware", # Prefix for rate limit keys
include_headers=True, # Add rate limit headers to responses
error_message="Rate limit exceeded. Please try again later.",
status_code=429, # HTTP status when limited
skip_on_error=False, # Allow requests if backend fails
exempt_paths=None, # Set of paths to exempt
exempt_ips=None, # Set of IPs to exempt
key_extractor=default_key_extractor, # Function to identify clients
)