release: bump version to 0.3.0
- Refactor Redis backend connection handling and pool management
- Update algorithm implementations with improved type annotations
- Enhance config loader validation with stricter Pydantic schemas
- Improve decorator and middleware error handling
- Expand example scripts with better docstrings and usage patterns
- Add new 00_basic_usage.py example for quick start
- Reorganize examples directory structure
- Fix type annotation inconsistencies across core modules
- Update dependencies in pyproject.toml
docs/Makefile (new file, 14 lines)
@@ -0,0 +1,14 @@
# Minimal makefile for Sphinx documentation

SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
docs/advanced/distributed-systems.rst (new file, 319 lines)
@@ -0,0 +1,319 @@
Distributed Systems
===================

Running rate limiting across multiple application instances requires careful
consideration. This guide covers the patterns and pitfalls.

The Challenge
-------------

In a distributed system, you might have:

- Multiple application instances behind a load balancer
- Kubernetes pods that scale up and down
- Serverless functions that run independently

Each instance needs to share rate limit state. Otherwise, a client could make
100 requests to instance A and another 100 to instance B, effectively bypassing
a 100-request limit.

Redis: The Standard Solution
----------------------------

Redis is the go-to choice for distributed rate limiting:

.. code-block:: python

    from fastapi import FastAPI
    from fastapi_traffic import RateLimiter
    from fastapi_traffic.backends.redis import RedisBackend
    from fastapi_traffic.core.limiter import get_limiter, set_limiter

    app = FastAPI()

    @app.on_event("startup")
    async def startup():
        backend = await RedisBackend.from_url(
            "redis://redis-server:6379/0",
            key_prefix="myapp:ratelimit",
        )
        limiter = RateLimiter(backend)
        set_limiter(limiter)
        await limiter.initialize()

    @app.on_event("shutdown")
    async def shutdown():
        limiter = get_limiter()
        await limiter.close()

All instances connect to the same Redis server and share state.
High Availability Redis
-----------------------

For production, you'll want Redis with high availability:

**Redis Sentinel:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://sentinel1:26379,sentinel2:26379,sentinel3:26379/0",
        sentinel_master="mymaster",
    )

**Redis Cluster:**

.. code-block:: python

    backend = await RedisBackend.from_url(
        "redis://node1:6379,node2:6379,node3:6379/0",
    )

Atomic Operations
-----------------

Race conditions are a real concern in distributed systems. Consider this scenario:

1. Instance A reads: 99 requests made
2. Instance B reads: 99 requests made
3. Instance A writes: 100 requests (allows request)
4. Instance B writes: 100 requests (allows request)

Now you've allowed 101 requests when the limit was 100.

FastAPI Traffic's Redis backend uses Lua scripts to make operations atomic:

.. code-block:: lua

    -- Simplified example of atomic check-and-increment
    local limit = tonumber(ARGV[1])
    local current = redis.call('GET', KEYS[1])
    if current and tonumber(current) >= limit then
        return 0  -- Reject
    end
    redis.call('INCR', KEYS[1])
    return 1  -- Allow

The entire check-and-update happens in a single Redis operation.
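
The interleaving in the numbered scenario can be reproduced in a few lines. This is a toy simulation, with a plain dict standing in for Redis; nothing here is library code:

```python
# Two "instances" both read the counter at 99 before either writes back.
store = {"client": 99}  # 99 requests already made; the limit is 100

read_a = store["client"]  # instance A reads 99
read_b = store["client"]  # instance B reads 99 (before A's write lands)

allowed_a = read_a < 100
store["client"] = read_a + 1  # A writes 100 and admits request #100

allowed_b = read_b < 100      # B's stale read also passes the check
store["client"] = read_b + 1  # B also writes 100 and admits request #101

# Both requests were admitted, yet the counter only reflects one of them.
print(allowed_a, allowed_b, store["client"])  # True True 100
```

The atomic Lua script avoids this because no other client can run between the ``GET`` and the ``INCR``.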
Network Latency
---------------

Redis adds network latency to every request. Some strategies to minimize the impact:

**1. Connection pooling (automatic):**

The Redis backend maintains a connection pool, so you're not creating new
connections for each request.

**2. Local caching:**

For very high-traffic endpoints, consider a two-tier approach:

.. code-block:: python

    from fastapi import Request
    from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig
    from fastapi_traffic.backends.redis import RedisBackend

    # Local memory backend for the fast path
    local_backend = MemoryBackend()
    local_limiter = RateLimiter(local_backend)

    # Redis backend for distributed state
    redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")
    distributed_limiter = RateLimiter(redis_backend)

    async def check_rate_limit(request: Request, config: RateLimitConfig):
        # Quick local check (may allow some extra requests)
        local_result = await local_limiter.check(request, config)
        if not local_result.allowed:
            return local_result

        # Authoritative distributed check
        return await distributed_limiter.check(request, config)

**3. Skip on error:**

If Redis latency is causing issues, you might prefer to allow requests through
rather than block:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)
    async def endpoint(request: Request):
        return {"status": "ok"}
Handling Redis Failures
-----------------------

What happens when Redis goes down?

**Fail closed (default):**

Requests fail. This is safer but impacts availability.

**Fail open:**

Allow requests through:

.. code-block:: python

    @rate_limit(100, 60, skip_on_error=True)

**Circuit breaker pattern:**

Implement a circuit breaker to avoid hammering a failing Redis:

.. code-block:: python

    import time

    class CircuitBreaker:
        def __init__(self, failure_threshold=5, reset_timeout=60):
            self.failures = 0
            self.threshold = failure_threshold
            self.reset_timeout = reset_timeout
            self.last_failure = 0
            self.open = False

        def record_failure(self):
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.open = True

        def record_success(self):
            self.failures = 0
            self.open = False

        def should_allow(self) -> bool:
            if not self.open:
                return True
            # Check if we should try again
            if time.time() - self.last_failure > self.reset_timeout:
                return True
            return False
Kubernetes Deployment
---------------------

Here's a typical Kubernetes setup:

.. code-block:: yaml

    # redis-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
            - name: redis
              image: redis:7-alpine
              ports:
                - containerPort: 6379
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: redis
    spec:
      selector:
        app: redis
      ports:
        - port: 6379

.. code-block:: yaml

    # app-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
            - name: api
              image: myapp:latest
              env:
                - name: REDIS_URL
                  value: "redis://redis:6379/0"

Your app connects to Redis via the service name:

.. code-block:: python

    import os

    redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
    backend = await RedisBackend.from_url(redis_url)
Monitoring
----------

Keep an eye on:

1. **Redis latency:** High latency means slow requests
2. **Redis memory:** Rate limit data shouldn't use much, but monitor it
3. **Connection count:** Make sure you're not exhausting connections
4. **Rate limit hits:** Track how often clients are being limited

.. code-block:: python

    import logging

    from fastapi import Request

    logger = logging.getLogger(__name__)

    def on_rate_limited(request: Request, result):
        logger.info(
            "Rate limited: client=%s path=%s remaining=%d",
            request.client.host,
            request.url.path,
            result.info.remaining,
        )

    @rate_limit(100, 60, on_blocked=on_rate_limited)
    async def endpoint(request: Request):
        return {"status": "ok"}
|
||||
Testing Distributed Rate Limits
|
||||
-------------------------------
|
||||
|
||||
Testing distributed behavior is tricky. Here's an approach:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import asyncio
|
||||
import httpx
|
||||
|
||||
async def test_distributed_limit():
|
||||
"""Simulate requests from multiple 'instances'."""
|
||||
async with httpx.AsyncClient() as client:
|
||||
# Fire 150 requests concurrently
|
||||
tasks = [
|
||||
client.get("http://localhost:8000/api/data")
|
||||
for _ in range(150)
|
||||
]
|
||||
responses = await asyncio.gather(*tasks)
|
||||
|
||||
# Count successes and rate limits
|
||||
successes = sum(1 for r in responses if r.status_code == 200)
|
||||
limited = sum(1 for r in responses if r.status_code == 429)
|
||||
|
||||
print(f"Successes: {successes}, Rate limited: {limited}")
|
||||
# With a limit of 100, expect ~100 successes and ~50 limited
|
||||
|
||||
asyncio.run(test_distributed_limit())
|
||||
docs/advanced/performance.rst (new file, 291 lines)
@@ -0,0 +1,291 @@
Performance
===========

FastAPI Traffic is designed to be fast. But when you're handling thousands of
requests per second, every microsecond counts. Here's how to squeeze out the
best performance.

Baseline Performance
--------------------

On typical hardware, you can expect:

- **Memory backend:** ~0.01ms per check
- **SQLite backend:** ~0.1ms per check
- **Redis backend:** ~1ms per check (network dependent)

For most applications, this overhead is negligible compared to your actual
business logic.

Choosing the Right Algorithm
----------------------------

Algorithms have different performance characteristics:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Time Complexity
     - Space Complexity
     - Notes
   * - Token Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Fixed Window
     - O(1)
     - O(1)
     - One int + one float per key
   * - Sliding Window Counter
     - O(1)
     - O(1)
     - Three values per key
   * - Leaky Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Sliding Window
     - O(n)
     - O(n)
     - Stores every timestamp

**Recommendation:** Use Sliding Window Counter (the default) unless you have
specific requirements. It's O(1) and provides good accuracy.

**Avoid Sliding Window for high-volume endpoints.** If you're allowing 10,000
requests per minute, that's 10,000 timestamps to store and filter per key.
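
The counter variant stays O(1) because it approximates the sliding window from just two fixed-window counters: the previous window's count is weighted by how much of it still overlaps the sliding window. A minimal sketch of the estimate (the function name here is ours, not the library's):

```python
def sliding_window_estimate(prev_count, curr_count, window_size, elapsed_in_window):
    """Approximate requests in the last `window_size` seconds from two counters.

    `elapsed_in_window` is how far we are into the current fixed window.
    """
    # Fraction of the previous fixed window still covered by the sliding window
    overlap = (window_size - elapsed_in_window) / window_size
    return prev_count * overlap + curr_count

# 15s into the current minute: 75% of the previous window still counts.
estimate = sliding_window_estimate(prev_count=100, curr_count=20,
                                   window_size=60, elapsed_in_window=15)
# estimate == 95.0
```

The estimate assumes requests were evenly spread across the previous window, which is where the small accuracy loss comes from.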
Memory Backend Optimization
---------------------------

The memory backend is already fast, but you can tune it:

.. code-block:: python

    from fastapi_traffic import MemoryBackend

    backend = MemoryBackend(
        max_size=10000,       # Limit memory usage
        cleanup_interval=60,  # Less frequent cleanup = less overhead
    )

**max_size:** Limits the number of keys stored. When exceeded, LRU eviction kicks
in. Set this based on your expected number of unique clients.

**cleanup_interval:** How often to scan for expired entries. Higher values mean
less CPU overhead but more memory usage from expired entries.
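
The LRU eviction triggered by ``max_size`` can be sketched with an ``OrderedDict`` (illustrative only, not the actual ``MemoryBackend`` internals):

```python
from collections import OrderedDict

class LRUStore:
    """Toy key-value store with max_size LRU eviction."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used key

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

store = LRUStore(max_size=2)
store.set("a", 1)
store.set("b", 2)
store.set("c", 3)  # evicts "a", the least recently used key
```

An evicted client simply starts a fresh window on its next request, which is why ``max_size`` trades a little accuracy for bounded memory.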
SQLite Backend Optimization
---------------------------

SQLite is surprisingly fast for rate limiting:

.. code-block:: python

    from fastapi_traffic import SQLiteBackend

    backend = SQLiteBackend(
        "rate_limits.db",
        cleanup_interval=300,  # Clean every 5 minutes
    )

**Tips:**

1. **Use an SSD.** SQLite performance depends heavily on disk I/O.

2. **Put the database on a local disk.** Network-attached storage adds latency.

3. **WAL mode is enabled by default.** This allows concurrent reads and writes.

4. **Increase cleanup_interval** if you have many keys. Cleanup scans the entire
   table.
Redis Backend Optimization
--------------------------

Redis is the bottleneck in most distributed setups:

**1. Use connection pooling (automatic):**

The backend maintains a pool of connections. You don't need to do anything.

**2. Use pipelining for batch operations:**

If you're checking multiple rate limits, batch them:

.. code-block:: python

    # Instead of multiple round trips
    result1 = await limiter.check(request, config1)
    result2 = await limiter.check(request, config2)

    # Consider combining into one check with a higher cost
    combined_config = RateLimitConfig(limit=100, window_size=60, cost=2)
    result = await limiter.check(request, combined_config)

**3. Run Redis close to your application:**

Network latency is usually the biggest factor. Run Redis in the same datacenter,
or better yet, the same availability zone.

**4. Consider Redis Cluster for high throughput:**

Distributes load across multiple Redis nodes.
Reducing Overhead
-----------------

**1. Exempt paths that don't need limiting:**

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        exempt_paths={"/health", "/metrics", "/ready"},
    )

**2. Use coarse-grained limits when possible:**

Instead of limiting every endpoint separately, use middleware for a global limit:

.. code-block:: python

    # One check per request
    app.add_middleware(RateLimitMiddleware, limit=1000, window_size=60)

    # vs. multiple checks per request
    @rate_limit(100, 60)   # Check 1
    @another_decorator     # Check 2
    async def endpoint():
        pass

**3. Prefer longer windows:**

The number of state updates always equals the number of requests, regardless of
window size. Longer windows still help, though:

- Fewer unique window boundaries
- Better cache efficiency
- More stable rate limiting

.. code-block:: python

    # One window boundary per minute per client
    @rate_limit(100, 60)

    # A new window boundary every second
    @rate_limit(2, 1)  # roughly the same rate, expressed per-second

**4. Skip headers when not needed:**

.. code-block:: python

    @rate_limit(100, 60, include_headers=False)

Saves a tiny bit of response processing.
Benchmarking
------------

Here's a simple benchmark script:

.. code-block:: python

    import asyncio
    import time
    from unittest.mock import MagicMock

    from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig

    async def benchmark():
        backend = MemoryBackend()
        limiter = RateLimiter(backend)
        await limiter.initialize()

        config = RateLimitConfig(limit=10000, window_size=60)

        # Mock request
        request = MagicMock()
        request.client.host = "127.0.0.1"
        request.url.path = "/test"
        request.method = "GET"
        request.headers = {}

        # Warm up
        for _ in range(100):
            await limiter.check(request, config)

        # Benchmark
        iterations = 10000
        start = time.perf_counter()

        for _ in range(iterations):
            await limiter.check(request, config)

        elapsed = time.perf_counter() - start

        print(f"Total time: {elapsed:.3f}s")
        print(f"Per check: {elapsed/iterations*1000:.3f}ms")
        print(f"Checks/sec: {iterations/elapsed:.0f}")

        await limiter.close()

    asyncio.run(benchmark())

Typical output:

.. code-block:: text

    Total time: 0.150s
    Per check: 0.015ms
    Checks/sec: 66666
Profiling
---------

If you suspect rate limiting is a bottleneck, profile it:

.. code-block:: python

    import asyncio
    import cProfile
    import pstats

    async def profile_rate_limiting():
        # Your rate limiting code here
        pass

    cProfile.run('asyncio.run(profile_rate_limiting())', 'rate_limit.prof')

    stats = pstats.Stats('rate_limit.prof')
    stats.sort_stats('cumulative')
    stats.print_stats(20)

Look for:

- Time spent in backend operations
- Time spent in algorithm calculations
- Unexpected hotspots
When Performance Really Matters
-------------------------------

If you're handling millions of requests per second and rate limiting overhead
is significant:

1. **Consider sampling:** Only check rate limits for a percentage of requests
   and extrapolate.

2. **Use probabilistic data structures:** Bloom filters or Count-Min Sketch can
   approximate rate limiting with less overhead.

3. **Push to the edge:** Use CDN-level rate limiting (Cloudflare, AWS WAF) to
   handle the bulk of traffic.

4. **Accept some inaccuracy:** Fixed window with ``skip_on_error=True`` is very
   fast and "good enough" for many use cases.
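
Sampling (point 1 above) can be sketched as: count only a fraction of requests and weight each counted hit by the inverse of the sampling rate. Names and parameters here are illustrative, not library API:

```python
import random

def sampled_check(counter, key, limit, sample_rate, rng=random):
    """Count ~sample_rate of requests; each counted hit stands for 1/sample_rate requests."""
    if rng.random() >= sample_rate:
        return True  # unsampled requests pass through uncounted
    counter[key] = counter.get(key, 0) + 1
    estimated_total = counter[key] / sample_rate
    return estimated_total <= limit

# With sample_rate=1.0 this degenerates to an exact fixed-window check:
counter = {}
results = [sampled_check(counter, "c", limit=3, sample_rate=1.0) for _ in range(4)]
```

With, say, ``sample_rate=0.1``, only ~10% of requests touch the counter, at the cost of noisier enforcement near the limit.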

For most applications, though, the default configuration is plenty fast.
docs/advanced/testing.rst (new file, 367 lines)
@@ -0,0 +1,367 @@
Testing
=======

Testing rate-limited endpoints requires some care. You don't want your tests to
be flaky because of timing issues, and you need to verify that limits actually work.

Basic Testing Setup
-------------------

Use pytest with pytest-asyncio for async tests:

.. code-block:: python

    # conftest.py
    import pytest
    from fastapi.testclient import TestClient

    from fastapi_traffic import MemoryBackend, RateLimiter
    from fastapi_traffic.core.limiter import set_limiter

    @pytest.fixture
    def app():
        """Create a fresh app for each test."""
        from myapp import create_app
        return create_app()

    @pytest.fixture
    def client(app):
        """Test client with a fresh rate limiter."""
        backend = MemoryBackend()
        limiter = RateLimiter(backend)
        set_limiter(limiter)

        with TestClient(app) as client:
            yield client
Testing Rate Limit Enforcement
------------------------------

Verify that the limit is actually enforced:

.. code-block:: python

    def test_rate_limit_enforced(client):
        """Test that requests are blocked after the limit is reached."""
        # Make requests up to the limit
        for i in range(10):
            response = client.get("/api/data")
            assert response.status_code == 200, f"Request {i+1} should succeed"

        # The next request should be rate limited
        response = client.get("/api/data")
        assert response.status_code == 429
        assert "retry_after" in response.json()

Testing Rate Limit Headers
--------------------------

Check that headers are included correctly:

.. code-block:: python

    def test_rate_limit_headers(client):
        """Test that rate limit headers are present."""
        response = client.get("/api/data")

        assert "X-RateLimit-Limit" in response.headers
        assert "X-RateLimit-Remaining" in response.headers
        assert "X-RateLimit-Reset" in response.headers

        # Verify the values make sense
        limit = int(response.headers["X-RateLimit-Limit"])
        remaining = int(response.headers["X-RateLimit-Remaining"])

        assert limit == 100     # Your configured limit
        assert remaining == 99  # One request made

Testing Different Clients
-------------------------

Verify that different clients have separate limits:

.. code-block:: python

    def test_separate_limits_per_client(client):
        """Test that different IPs have separate limits."""
        # Client A makes requests
        for _ in range(10):
            response = client.get(
                "/api/data",
                headers={"X-Forwarded-For": "1.1.1.1"},
            )
            assert response.status_code == 200

        # Client A is now limited
        response = client.get(
            "/api/data",
            headers={"X-Forwarded-For": "1.1.1.1"},
        )
        assert response.status_code == 429

        # Client B should still have its full quota
        response = client.get(
            "/api/data",
            headers={"X-Forwarded-For": "2.2.2.2"},
        )
        assert response.status_code == 200
Testing Window Reset
--------------------

Test that limits reset after the window expires:

.. code-block:: python

    import time
    from unittest.mock import patch

    def test_limit_resets_after_window(client):
        """Test that limits reset after the window expires."""
        # Exhaust the limit
        for _ in range(10):
            client.get("/api/data")

        # Should be limited
        response = client.get("/api/data")
        assert response.status_code == 429

        # Fast-forward time (mock time.time); capture the real time first,
        # since calling time.time() inside the patch would hit the mock
        real_now = time.time()
        with patch('time.time') as mock_time:
            # Move 61 seconds into the future
            mock_time.return_value = real_now + 61

            # Should be allowed again
            response = client.get("/api/data")
            assert response.status_code == 200
Testing Exemptions
------------------

Verify that exemptions work:

.. code-block:: python

    def test_exempt_paths(client):
        """Test that exempt paths bypass rate limiting."""
        # Exhaust the limit on a regular endpoint
        for _ in range(100):
            client.get("/api/data")

        # The regular endpoint should be limited
        response = client.get("/api/data")
        assert response.status_code == 429

        # The health check should still work
        response = client.get("/health")
        assert response.status_code == 200

    def test_exempt_ips(client):
        """Test that exempt IPs bypass rate limiting."""
        # Make many requests from an exempt IP
        for _ in range(1000):
            response = client.get(
                "/api/data",
                headers={"X-Forwarded-For": "127.0.0.1"},
            )
            assert response.status_code == 200  # Never limited
Testing with Async Client
-------------------------

For async endpoints, use httpx:

.. code-block:: python

    import asyncio

    import httpx
    import pytest

    @pytest.mark.asyncio
    async def test_async_rate_limiting():
        """Test rate limiting with an async client."""
        transport = httpx.ASGITransport(app=app)
        async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
            # Make concurrent requests
            responses = await asyncio.gather(*[
                client.get("/api/data")
                for _ in range(15)
            ])

        successes = sum(1 for r in responses if r.status_code == 200)
        limited = sum(1 for r in responses if r.status_code == 429)

        assert successes == 10  # Limit
        assert limited == 5     # Over the limit
Testing Backend Failures
------------------------

Test behavior when the backend fails:

.. code-block:: python

    from unittest.mock import patch

    from fastapi_traffic import BackendError, MemoryBackend

    def test_skip_on_error(client):
        """Requests are allowed when the backend fails and skip_on_error=True."""
        with patch.object(
            MemoryBackend, 'get',
            side_effect=BackendError("Connection failed"),
        ):
            # With skip_on_error=True, should still work
            response = client.get("/api/data")
            assert response.status_code == 200

    def test_fail_on_error(client):
        """Requests fail when the backend fails and skip_on_error=False."""
        with patch.object(
            MemoryBackend, 'get',
            side_effect=BackendError("Connection failed"),
        ):
            # With skip_on_error=False (the default), should fail
            response = client.get("/api/strict-data")
            assert response.status_code == 500
Mocking the Rate Limiter
------------------------

For unit tests, you might want to mock the rate limiter entirely:

.. code-block:: python

    import time
    from unittest.mock import AsyncMock, MagicMock

    from fastapi_traffic.core.limiter import set_limiter
    from fastapi_traffic.core.models import RateLimitInfo, RateLimitResult

    def test_with_mocked_limiter(client):
        """Test endpoint logic without actual rate limiting."""
        mock_limiter = MagicMock()
        mock_limiter.hit = AsyncMock(return_value=RateLimitResult(
            allowed=True,
            info=RateLimitInfo(
                limit=100,
                remaining=99,
                reset_at=time.time() + 60,
                window_size=60,
            ),
            key="test",
        ))

        set_limiter(mock_limiter)

        response = client.get("/api/data")
        assert response.status_code == 200
        mock_limiter.hit.assert_called_once()
Integration Testing with Redis
------------------------------

For integration tests with Redis:

.. code-block:: python

    import pytest

    from fastapi_traffic import RateLimiter, RateLimitConfig
    from fastapi_traffic.backends.redis import RedisBackend

    @pytest.fixture
    async def redis_backend():
        """Create a Redis backend for testing."""
        backend = await RedisBackend.from_url(
            "redis://localhost:6379/15",  # Use a test database
            key_prefix="test:",
        )
        yield backend
        await backend.clear()  # Clean up after the test
        await backend.close()

    @pytest.mark.asyncio
    async def test_redis_rate_limiting(redis_backend):
        """Test rate limiting with real Redis."""
        limiter = RateLimiter(redis_backend)
        await limiter.initialize()

        config = RateLimitConfig(limit=5, window_size=60)
        request = create_mock_request("1.1.1.1")

        # Make requests up to the limit
        for _ in range(5):
            result = await limiter.check(request, config)
            assert result.allowed

        # The next should be blocked
        result = await limiter.check(request, config)
        assert not result.allowed

        await limiter.close()
Fixtures for Common Scenarios
-----------------------------

.. code-block:: python

    # conftest.py
    from unittest.mock import MagicMock

    import pytest

    from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig
    from fastapi_traffic.core.limiter import set_limiter

    @pytest.fixture
    def fresh_limiter():
        """Fresh rate limiter for each test."""
        backend = MemoryBackend()
        limiter = RateLimiter(backend)
        set_limiter(limiter)
        return limiter

    @pytest.fixture
    def rate_limit_config():
        """Standard rate limit config for tests."""
        return RateLimitConfig(
            limit=10,
            window_size=60,
        )

    @pytest.fixture
    def mock_request():
        """Create a mock request."""
        def _create(ip="127.0.0.1", path="/test"):
            request = MagicMock()
            request.client.host = ip
            request.url.path = path
            request.method = "GET"
            request.headers = {}
            return request
        return _create
Avoiding Flaky Tests
--------------------

Rate limiting tests can be flaky due to timing. Tips:

1. **Use short windows for tests:**

   .. code-block:: python

       @rate_limit(10, 1)  # 10 per second, not 10 per minute

2. **Mock time instead of sleeping:**

   .. code-block:: python

       with patch('time.time', return_value=future_time):
           ...  # Test window reset

3. **Reset state between tests:**

   .. code-block:: python

       @pytest.fixture(autouse=True)
       async def reset_limiter():
           yield
           limiter = get_limiter()
           await limiter.backend.clear()

4. **Use unique keys per test:**

   .. code-block:: python

       import uuid

       def test_something(mock_request):
           request = mock_request(ip=f"test-{uuid.uuid4()}")
docs/api/algorithms.rst (new file, 211 lines)
@@ -0,0 +1,211 @@
Algorithms API
==============

Rate limiting algorithms and the factory function to create them.

Algorithm Enum
--------------

.. py:class:: Algorithm

   Enumeration of available rate limiting algorithms.

   .. py:attribute:: TOKEN_BUCKET
      :value: "token_bucket"

      Token bucket algorithm. Allows bursts up to bucket capacity, then refills
      at a steady rate.

   .. py:attribute:: SLIDING_WINDOW
      :value: "sliding_window"

      Sliding window log algorithm. Tracks exact timestamps for precise limiting.
      Higher memory usage.

   .. py:attribute:: FIXED_WINDOW
      :value: "fixed_window"

      Fixed window algorithm. Simple time-based windows. Efficient but has
      boundary issues.

   .. py:attribute:: LEAKY_BUCKET
      :value: "leaky_bucket"

      Leaky bucket algorithm. Smooths out the request rate for consistent throughput.

   .. py:attribute:: SLIDING_WINDOW_COUNTER
      :value: "sliding_window_counter"

      Sliding window counter algorithm. Balances precision and efficiency.
      This is the default.

**Usage:**

.. code-block:: python

    from fastapi_traffic import Algorithm, rate_limit

    @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET)
    async def endpoint(request: Request):
        return {"status": "ok"}
|
||||
BaseAlgorithm
-------------

.. py:class:: BaseAlgorithm(limit, window_size, backend, *, burst_size=None)

   Abstract base class for rate limiting algorithms.

   :param limit: Maximum requests allowed in the window.
   :type limit: int
   :param window_size: Time window in seconds.
   :type window_size: float
   :param backend: Storage backend for rate limit state.
   :type backend: Backend
   :param burst_size: Maximum burst size. Defaults to ``limit``.
   :type burst_size: int | None

   .. py:method:: check(key)
      :async:

      Check if a request is allowed and update state.

      :param key: The rate limit key.
      :type key: str
      :returns: Tuple of (allowed, RateLimitInfo).
      :rtype: tuple[bool, RateLimitInfo]

   .. py:method:: reset(key)
      :async:

      Reset the rate limit state for a key.

      :param key: The rate limit key.
      :type key: str

   .. py:method:: get_state(key)
      :async:

      Get the current state without consuming a token.

      :param key: The rate limit key.
      :type key: str
      :returns: Current rate limit info, or None.
      :rtype: RateLimitInfo | None

TokenBucketAlgorithm
--------------------

.. py:class:: TokenBucketAlgorithm(limit, window_size, backend, *, burst_size=None)

   Token bucket algorithm implementation.

   Tokens are added to the bucket at a rate of ``limit / window_size`` per
   second. Each request consumes one token. If no tokens are available, the
   request is rejected.

   The ``burst_size`` parameter controls the maximum bucket capacity, allowing
   short bursts of traffic.

   **State stored:**

   - ``tokens``: Current number of tokens in the bucket
   - ``last_update``: Timestamp of the last update

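The refill arithmetic described above can be sketched as a plain, synchronous function. This is an illustrative model only, not the library's implementation: the real algorithm stores its state in the backend, and the function and variable names here are hypothetical.

```python
def token_bucket_check(state: dict, limit: int, window_size: float,
                       burst_size: int, now: float) -> bool:
    """Refill tokens at limit/window_size per second, then try to spend one."""
    rate = limit / window_size
    elapsed = now - state["last_update"]
    state["tokens"] = min(burst_size, state["tokens"] + elapsed * rate)
    state["last_update"] = now
    if state["tokens"] >= 1:
        state["tokens"] -= 1
        return True
    return False

# 10 requests/second, bucket capacity 5: a burst of 5 passes, the 6th is rejected.
state = {"tokens": 5.0, "last_update": 0.0}
results = [token_bucket_check(state, 10, 1.0, 5, now=0.0) for _ in range(6)]
```

Half a second later the bucket has refilled five tokens (capped at the capacity of 5), so requests are allowed again.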
SlidingWindowAlgorithm
----------------------

.. py:class:: SlidingWindowAlgorithm(limit, window_size, backend, *, burst_size=None)

   Sliding window log algorithm implementation.

   Stores the timestamp of every request within the window. Provides the most
   accurate rate limiting, but uses more memory.

   **State stored:**

   - ``timestamps``: List of request timestamps within the window

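The log-pruning idea can be shown in a few lines: drop timestamps that have aged out of the window, then count what remains. A simplified, synchronous sketch; the names are illustrative, not the library's API.

```python
def sliding_window_check(timestamps: list[float], limit: int,
                         window_size: float, now: float) -> bool:
    """Drop timestamps older than the window, then count what remains."""
    timestamps[:] = [t for t in timestamps if t > now - window_size]
    if len(timestamps) < limit:
        timestamps.append(now)
        return True
    return False

# Limit 3 per 10 seconds: three requests pass, the fourth is rejected
# until an old timestamp ages out of the window.
log: list[float] = []
early = [sliding_window_check(log, 3, 10.0, t) for t in (0.0, 1.0, 2.0, 3.0)]
later = sliding_window_check(log, 3, 10.0, 10.5)  # t=0.0 has aged out
```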
FixedWindowAlgorithm
--------------------

.. py:class:: FixedWindowAlgorithm(limit, window_size, backend, *, burst_size=None)

   Fixed window algorithm implementation.

   Divides time into fixed windows and counts requests in each window. Simple
   and efficient, but allows up to 2x the limit at window boundaries.

   **State stored:**

   - ``count``: Number of requests in the current window
   - ``window_start``: Start timestamp of the current window

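The boundary issue is easiest to see with numbers. The toy model below (illustrative names, not the library's implementation) allows 5 requests just before a window boundary and 5 just after it, so 10 requests land within about two seconds of wall-clock time:

```python
def fixed_window_check(state: dict, limit: int, window_size: float,
                       now: float) -> bool:
    """Reset the counter whenever a new fixed window starts."""
    window_start = now - (now % window_size)
    if state.get("window_start") != window_start:
        state["window_start"] = window_start
        state["count"] = 0
    if state["count"] < limit:
        state["count"] += 1
        return True
    return False

# Limit 5 per 60s window: 5 requests at t=59s and 5 at t=61s all pass.
state: dict = {}
before = [fixed_window_check(state, 5, 60.0, 59.0) for _ in range(5)]
after = [fixed_window_check(state, 5, 60.0, 61.0) for _ in range(5)]
```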
LeakyBucketAlgorithm
--------------------

.. py:class:: LeakyBucketAlgorithm(limit, window_size, backend, *, burst_size=None)

   Leaky bucket algorithm implementation.

   Requests fill a bucket that "leaks" at a constant rate. Smooths out traffic
   for consistent throughput.

   **State stored:**

   - ``water_level``: Current water level in the bucket
   - ``last_update``: Timestamp of the last update

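The leak can be sketched the same way as the token bucket, but inverted: each request adds a unit of "water", and the level drains at a constant rate. A simplified model with hypothetical names, not the library's implementation.

```python
def leaky_bucket_check(state: dict, limit: int, window_size: float,
                       capacity: int, now: float) -> bool:
    """Leak water at limit/window_size per second; each request adds one unit."""
    leak_rate = limit / window_size
    elapsed = now - state["last_update"]
    state["water_level"] = max(0.0, state["water_level"] - elapsed * leak_rate)
    state["last_update"] = now
    if state["water_level"] + 1 <= capacity:
        state["water_level"] += 1
        return True
    return False

# Capacity 2, leaking 1 unit/second: two immediate requests fit, the third
# overflows, and after a 1-second pause there is room again.
state = {"water_level": 0.0, "last_update": 0.0}
burst = [leaky_bucket_check(state, 1, 1.0, 2, now=0.0) for _ in range(3)]
paused = leaky_bucket_check(state, 1, 1.0, 2, now=1.0)
```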
SlidingWindowCounterAlgorithm
-----------------------------

.. py:class:: SlidingWindowCounterAlgorithm(limit, window_size, backend, *, burst_size=None)

   Sliding window counter algorithm implementation.

   Maintains counters for the current and previous windows, calculating a
   weighted average based on window progress. Balances precision and memory
   efficiency.

   **State stored:**

   - ``prev_count``: Count from the previous window
   - ``curr_count``: Count in the current window
   - ``current_window``: Start timestamp of the current window

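The weighted average works like this: the further you are into the current window, the less the previous window's count matters. A sketch of the standard formula (illustrative names; the library's exact arithmetic may differ):

```python
def weighted_count(prev_count: int, curr_count: int, window_size: float,
                   current_window: float, now: float) -> float:
    """Estimate requests in the last window_size seconds from two counters."""
    elapsed = now - current_window            # progress into the current window
    prev_weight = 1.0 - elapsed / window_size  # previous window's remaining share
    return prev_count * prev_weight + curr_count

# 40 requests in the previous 60s window, 30 so far in the current one.
# Halfway through the current window, the previous window counts at 50%:
# 40 * 0.5 + 30 = 50.
estimate = weighted_count(40, 30, 60.0, current_window=60.0, now=90.0)
```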
get_algorithm
-------------

.. py:function:: get_algorithm(algorithm, limit, window_size, backend, *, burst_size=None)

   Factory function to create algorithm instances.

   :param algorithm: The algorithm type to create.
   :type algorithm: Algorithm
   :param limit: Maximum requests allowed.
   :type limit: int
   :param window_size: Time window in seconds.
   :type window_size: float
   :param backend: Storage backend.
   :type backend: Backend
   :param burst_size: Maximum burst size.
   :type burst_size: int | None
   :returns: An algorithm instance.
   :rtype: BaseAlgorithm

**Usage:**

.. code-block:: python

   from fastapi_traffic.core.algorithms import get_algorithm, Algorithm
   from fastapi_traffic import MemoryBackend

   backend = MemoryBackend()
   algorithm = get_algorithm(
       Algorithm.TOKEN_BUCKET,
       limit=100,
       window_size=60,
       backend=backend,
       burst_size=20,
   )

   allowed, info = await algorithm.check("user:123")
266
docs/api/backends.rst
Normal file
@@ -0,0 +1,266 @@
Backends API
============

Storage backends for rate limit state.

Backend (Base Class)
--------------------

.. py:class:: Backend

   Abstract base class for rate limit storage backends.

   All backends must implement these methods:

   .. py:method:: get(key)
      :async:

      Get the current state for a key.

      :param key: The rate limit key.
      :type key: str
      :returns: The stored state dictionary, or None if not found.
      :rtype: dict[str, Any] | None

   .. py:method:: set(key, value, *, ttl)
      :async:

      Set the state for a key with a TTL.

      :param key: The rate limit key.
      :type key: str
      :param value: The state dictionary to store.
      :type value: dict[str, Any]
      :param ttl: Time-to-live in seconds.
      :type ttl: float

   .. py:method:: delete(key)
      :async:

      Delete the state for a key.

      :param key: The rate limit key.
      :type key: str

   .. py:method:: exists(key)
      :async:

      Check if a key exists.

      :param key: The rate limit key.
      :type key: str
      :returns: True if the key exists.
      :rtype: bool

   .. py:method:: increment(key, amount=1)
      :async:

      Atomically increment a counter.

      :param key: The rate limit key.
      :type key: str
      :param amount: The amount to increment by.
      :type amount: int
      :returns: The new value after incrementing.
      :rtype: int

   .. py:method:: clear()
      :async:

      Clear all rate limit data.

   .. py:method:: close()
      :async:

      Close the backend connection.

Backends support the async context manager protocol:

.. code-block:: python

   async with MemoryBackend() as backend:
       await backend.set("key", {"count": 1}, ttl=60)

MemoryBackend
-------------

.. py:class:: MemoryBackend(max_size=10000, cleanup_interval=60)

   In-memory storage backend with LRU eviction and TTL cleanup.

   :param max_size: Maximum number of keys to store.
   :type max_size: int
   :param cleanup_interval: How often to clean expired entries (seconds).
   :type cleanup_interval: float

   **Usage:**

   .. code-block:: python

      from fastapi_traffic import MemoryBackend, RateLimiter

      backend = MemoryBackend(max_size=10000)
      limiter = RateLimiter(backend)

   .. py:method:: get_stats()

      Get statistics about the backend.

      :returns: Dictionary with stats such as key count and memory usage.
      :rtype: dict[str, Any]

   .. py:method:: start_cleanup()
      :async:

      Start the background cleanup task.

   .. py:method:: stop_cleanup()
      :async:

      Stop the background cleanup task.

SQLiteBackend
-------------

.. py:class:: SQLiteBackend(db_path, cleanup_interval=300)

   SQLite storage backend for persistent rate limiting.

   :param db_path: Path to the SQLite database file.
   :type db_path: str | Path
   :param cleanup_interval: How often to clean expired entries (seconds).
   :type cleanup_interval: float

   **Usage:**

   .. code-block:: python

      from fastapi_traffic import SQLiteBackend, RateLimiter

      backend = SQLiteBackend("rate_limits.db")
      limiter = RateLimiter(backend)

      @app.on_event("startup")
      async def startup():
          await limiter.initialize()

      @app.on_event("shutdown")
      async def shutdown():
          await limiter.close()

   .. py:method:: initialize()
      :async:

      Initialize the database schema.

   Features:

   - WAL mode for better concurrent performance
   - Automatic schema creation
   - Connection pooling
   - Background cleanup of expired entries

RedisBackend
------------

.. py:class:: RedisBackend

   Redis storage backend for distributed rate limiting.

   .. py:method:: from_url(url, *, key_prefix="", **kwargs)
      :classmethod:

      Create a RedisBackend from a Redis URL. This is an async classmethod.

      :param url: Redis connection URL.
      :type url: str
      :param key_prefix: Prefix for all keys.
      :type key_prefix: str
      :returns: Configured RedisBackend instance.
      :rtype: RedisBackend

   **Usage:**

   .. code-block:: python

      from fastapi_traffic.backends.redis import RedisBackend
      from fastapi_traffic import RateLimiter

      @app.on_event("startup")
      async def startup():
          backend = await RedisBackend.from_url("redis://localhost:6379/0")
          limiter = RateLimiter(backend)

   **Connection examples:**

   .. code-block:: python

      # Simple connection
      backend = await RedisBackend.from_url("redis://localhost:6379/0")

      # With password
      backend = await RedisBackend.from_url("redis://:password@localhost:6379/0")

      # With key prefix
      backend = await RedisBackend.from_url(
          "redis://localhost:6379/0",
          key_prefix="myapp:ratelimit:",
      )

   .. py:method:: get_stats()
      :async:

      Get statistics about the Redis backend.

      :returns: Dictionary with stats such as key count and memory usage.
      :rtype: dict[str, Any]

   Features:

   - Atomic operations via Lua scripts
   - Automatic key expiration
   - Connection pooling
   - Support for Redis Sentinel and Cluster

Implementing Custom Backends
----------------------------

To create a custom backend, inherit from ``Backend`` and implement all abstract
methods:

.. code-block:: python

   from typing import Any

   from fastapi_traffic.backends.base import Backend

   class MyBackend(Backend):
       async def get(self, key: str) -> dict[str, Any] | None:
           # Retrieve state from your storage
           pass

       async def set(self, key: str, value: dict[str, Any], *, ttl: float) -> None:
           # Store state with expiration
           pass

       async def delete(self, key: str) -> None:
           # Remove a key
           pass

       async def exists(self, key: str) -> bool:
           # Check if the key exists
           pass

       async def increment(self, key: str, amount: int = 1) -> int:
           # Atomically increment (important for accuracy)
           pass

       async def clear(self) -> None:
           # Clear all data
           pass

       async def close(self) -> None:
           # Clean up connections
           pass

The ``value`` dictionary contains algorithm-specific state. Your backend should
serialize it appropriately (JSON works well for most cases).
245
docs/api/config.rst
Normal file
@@ -0,0 +1,245 @@
Configuration API
=================

Configuration classes and loaders for rate limiting.

RateLimitConfig
---------------

.. py:class:: RateLimitConfig(limit, window_size=60.0, algorithm=Algorithm.SLIDING_WINDOW_COUNTER, key_prefix="ratelimit", key_extractor=default_key_extractor, burst_size=None, include_headers=True, error_message="Rate limit exceeded", status_code=429, skip_on_error=False, cost=1, exempt_when=None, on_blocked=None)

   Configuration for a rate limit rule.

   :param limit: Maximum requests allowed in the window. Must be positive.
   :type limit: int
   :param window_size: Time window in seconds. Must be positive.
   :type window_size: float
   :param algorithm: Rate limiting algorithm to use.
   :type algorithm: Algorithm
   :param key_prefix: Prefix for the rate limit key.
   :type key_prefix: str
   :param key_extractor: Function to extract the client identifier from the request.
   :type key_extractor: Callable[[Request], str]
   :param burst_size: Maximum burst size for the token/leaky bucket algorithms.
   :type burst_size: int | None
   :param include_headers: Whether to include rate limit headers.
   :type include_headers: bool
   :param error_message: Error message when rate limited.
   :type error_message: str
   :param status_code: HTTP status code when rate limited.
   :type status_code: int
   :param skip_on_error: Skip rate limiting on backend errors.
   :type skip_on_error: bool
   :param cost: Cost per request.
   :type cost: int
   :param exempt_when: Function to check whether a request is exempt.
   :type exempt_when: Callable[[Request], bool] | None
   :param on_blocked: Callback invoked when a request is blocked.
   :type on_blocked: Callable[[Request, Any], Any] | None

   **Usage:**

   .. code-block:: python

      from fastapi_traffic import RateLimitConfig, Algorithm

      config = RateLimitConfig(
          limit=100,
          window_size=60,
          algorithm=Algorithm.TOKEN_BUCKET,
          burst_size=20,
      )

GlobalConfig
------------

.. py:class:: GlobalConfig(backend=None, enabled=True, default_limit=100, default_window_size=60.0, default_algorithm=Algorithm.SLIDING_WINDOW_COUNTER, key_prefix="fastapi_traffic", include_headers=True, error_message="Rate limit exceeded. Please try again later.", status_code=429, skip_on_error=False, exempt_ips=set(), exempt_paths=set(), headers_prefix="X-RateLimit")

   Global configuration for the rate limiter.

   :param backend: Storage backend for rate limit data.
   :type backend: Backend | None
   :param enabled: Whether rate limiting is enabled.
   :type enabled: bool
   :param default_limit: Default maximum requests per window.
   :type default_limit: int
   :param default_window_size: Default time window in seconds.
   :type default_window_size: float
   :param default_algorithm: Default rate limiting algorithm.
   :type default_algorithm: Algorithm
   :param key_prefix: Global prefix for all rate limit keys.
   :type key_prefix: str
   :param include_headers: Include rate limit headers by default.
   :type include_headers: bool
   :param error_message: Default error message.
   :type error_message: str
   :param status_code: Default HTTP status code.
   :type status_code: int
   :param skip_on_error: Skip rate limiting on backend errors.
   :type skip_on_error: bool
   :param exempt_ips: IP addresses exempt from rate limiting.
   :type exempt_ips: set[str]
   :param exempt_paths: URL paths exempt from rate limiting.
   :type exempt_paths: set[str]
   :param headers_prefix: Prefix for rate limit headers.
   :type headers_prefix: str

   **Usage:**

   .. code-block:: python

      from fastapi_traffic import GlobalConfig, RateLimiter

      config = GlobalConfig(
          enabled=True,
          default_limit=100,
          exempt_paths={"/health", "/docs"},
          exempt_ips={"127.0.0.1"},
      )

      limiter = RateLimiter(config=config)

ConfigLoader
------------

.. py:class:: ConfigLoader(prefix="FASTAPI_TRAFFIC")

   Load rate limit configuration from various sources.

   :param prefix: Environment variable prefix.
   :type prefix: str

   .. py:method:: load_rate_limit_config_from_env(env_vars=None, **overrides)

      Load a RateLimitConfig from environment variables.

      :param env_vars: Dictionary of environment variables. Uses ``os.environ`` if None.
      :type env_vars: dict[str, str] | None
      :param overrides: Values to override after loading.
      :returns: Loaded configuration.
      :rtype: RateLimitConfig

   .. py:method:: load_rate_limit_config_from_json(file_path, **overrides)

      Load a RateLimitConfig from a JSON file.

      :param file_path: Path to the JSON file.
      :type file_path: str | Path
      :param overrides: Values to override after loading.
      :returns: Loaded configuration.
      :rtype: RateLimitConfig

   .. py:method:: load_rate_limit_config_from_env_file(file_path, **overrides)

      Load a RateLimitConfig from a .env file.

      :param file_path: Path to the .env file.
      :type file_path: str | Path
      :param overrides: Values to override after loading.
      :returns: Loaded configuration.
      :rtype: RateLimitConfig

   .. py:method:: load_global_config_from_env(env_vars=None, **overrides)

      Load a GlobalConfig from environment variables.

   .. py:method:: load_global_config_from_json(file_path, **overrides)

      Load a GlobalConfig from a JSON file.

   .. py:method:: load_global_config_from_env_file(file_path, **overrides)

      Load a GlobalConfig from a .env file.

   **Usage:**

   .. code-block:: python

      from fastapi_traffic import ConfigLoader

      loader = ConfigLoader()

      # From environment
      config = loader.load_rate_limit_config_from_env()

      # From JSON file
      config = loader.load_rate_limit_config_from_json("config.json")

      # From .env file
      config = loader.load_rate_limit_config_from_env_file(".env")

      # With overrides
      config = loader.load_rate_limit_config_from_json(
          "config.json",
          limit=200,  # Override the limit
      )

Convenience Functions
---------------------

.. py:function:: load_rate_limit_config(file_path, **overrides)

   Load a RateLimitConfig with automatic format detection.

   :param file_path: Path to a config file (.json or .env).
   :type file_path: str | Path
   :returns: Loaded configuration.
   :rtype: RateLimitConfig

.. py:function:: load_rate_limit_config_from_env(**overrides)

   Load a RateLimitConfig from environment variables.

   :returns: Loaded configuration.
   :rtype: RateLimitConfig

.. py:function:: load_global_config(file_path, **overrides)

   Load a GlobalConfig with automatic format detection.

   :param file_path: Path to a config file (.json or .env).
   :type file_path: str | Path
   :returns: Loaded configuration.
   :rtype: GlobalConfig

.. py:function:: load_global_config_from_env(**overrides)

   Load a GlobalConfig from environment variables.

   :returns: Loaded configuration.
   :rtype: GlobalConfig

**Usage:**

.. code-block:: python

   from fastapi_traffic import (
       load_rate_limit_config,
       load_rate_limit_config_from_env,
   )

   # Auto-detect format
   config = load_rate_limit_config("config.json")
   config = load_rate_limit_config(".env")

   # From environment
   config = load_rate_limit_config_from_env()

default_key_extractor
---------------------

.. py:function:: default_key_extractor(request)

   Extract the client IP as the default rate limit key.

   Checks, in order:

   1. ``X-Forwarded-For`` header (first IP)
   2. ``X-Real-IP`` header
   3. Direct connection IP
   4. Falls back to "unknown"

   :param request: The incoming request.
   :type request: Request
   :returns: Client identifier string.
   :rtype: str
154
docs/api/decorator.rst
Normal file
@@ -0,0 +1,154 @@
Decorator API
=============

The ``@rate_limit`` decorator is the primary way to add rate limiting to your
FastAPI endpoints.

rate_limit
----------

.. py:function:: rate_limit(limit, window_size=60.0, *, algorithm=Algorithm.SLIDING_WINDOW_COUNTER, key_prefix="ratelimit", key_extractor=default_key_extractor, burst_size=None, include_headers=True, error_message="Rate limit exceeded", status_code=429, skip_on_error=False, cost=1, exempt_when=None, on_blocked=None)

   Apply rate limiting to a FastAPI endpoint.

   :param limit: Maximum number of requests allowed in the window.
   :type limit: int
   :param window_size: Time window in seconds. Defaults to 60.
   :type window_size: float
   :param algorithm: Rate limiting algorithm to use.
   :type algorithm: Algorithm
   :param key_prefix: Prefix for the rate limit key.
   :type key_prefix: str
   :param key_extractor: Function to extract the client identifier from the request.
   :type key_extractor: Callable[[Request], str]
   :param burst_size: Maximum burst size for the token bucket/leaky bucket algorithms.
   :type burst_size: int | None
   :param include_headers: Whether to include rate limit headers in the response.
   :type include_headers: bool
   :param error_message: Error message when the rate limit is exceeded.
   :type error_message: str
   :param status_code: HTTP status code when the rate limit is exceeded.
   :type status_code: int
   :param skip_on_error: Skip rate limiting if backend errors occur.
   :type skip_on_error: bool
   :param cost: Cost of each request (default 1).
   :type cost: int
   :param exempt_when: Function to determine whether a request should be exempt.
   :type exempt_when: Callable[[Request], bool] | None
   :param on_blocked: Callback invoked when a request is blocked.
   :type on_blocked: Callable[[Request, Any], Any] | None
   :returns: Decorated function with rate limiting applied.
   :rtype: Callable

**Basic usage:**

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import rate_limit

   app = FastAPI()

   @app.get("/api/data")
   @rate_limit(100, 60)  # 100 requests per minute
   async def get_data(request: Request):
       return {"data": "here"}

**With an algorithm:**

.. code-block:: python

   from fastapi_traffic import rate_limit, Algorithm

   @app.get("/api/burst")
   @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=20)
   async def burst_endpoint(request: Request):
       return {"status": "ok"}

**With a custom key extractor:**

.. code-block:: python

   def get_api_key(request: Request) -> str:
       return request.headers.get("X-API-Key", "anonymous")

   @app.get("/api/data")
   @rate_limit(1000, 3600, key_extractor=get_api_key)
   async def api_endpoint(request: Request):
       return {"data": "here"}

**With an exemption:**

.. code-block:: python

   def is_admin(request: Request) -> bool:
       return getattr(request.state, "is_admin", False)

   @app.get("/api/admin")
   @rate_limit(100, 60, exempt_when=is_admin)
   async def admin_endpoint(request: Request):
       return {"admin": "data"}

RateLimitDependency
-------------------

.. py:class:: RateLimitDependency(limit, window_size=60.0, *, algorithm=Algorithm.SLIDING_WINDOW_COUNTER, key_prefix="ratelimit", key_extractor=default_key_extractor, burst_size=None, error_message="Rate limit exceeded", status_code=429, skip_on_error=False, cost=1, exempt_when=None)
   :no-index:

   FastAPI dependency for rate limiting. Returns rate limit info that can be
   used in your endpoint. See :doc:`dependency` for full documentation.

   :param limit: Maximum number of requests allowed in the window.
   :type limit: int
   :param window_size: Time window in seconds.
   :type window_size: float

   **Usage:**

   .. code-block:: python

      from fastapi import FastAPI, Depends, Request
      from fastapi_traffic.core.decorator import RateLimitDependency

      app = FastAPI()
      rate_dep = RateLimitDependency(limit=100, window_size=60)

      @app.get("/api/data")
      async def get_data(request: Request, rate_info=Depends(rate_dep)):
          return {
              "data": "here",
              "remaining_requests": rate_info.remaining,
              "reset_at": rate_info.reset_at,
          }

   The dependency returns a ``RateLimitInfo`` object with:

   - ``limit``: The configured limit
   - ``remaining``: Remaining requests in the current window
   - ``reset_at``: Unix timestamp when the window resets
   - ``retry_after``: Seconds until retry (if rate limited)

create_rate_limit_response
--------------------------

.. py:function:: create_rate_limit_response(exc, *, include_headers=True)

   Create a standard rate limit response from a RateLimitExceeded exception.

   :param exc: The RateLimitExceeded exception.
   :type exc: RateLimitExceeded
   :param include_headers: Whether to include rate limit headers.
   :type include_headers: bool
   :returns: A JSONResponse with rate limit information.
   :rtype: Response

   **Usage:**

   .. code-block:: python

      from fastapi_traffic import RateLimitExceeded
      from fastapi_traffic.core.decorator import create_rate_limit_response

      @app.exception_handler(RateLimitExceeded)
      async def handler(request: Request, exc: RateLimitExceeded):
          return create_rate_limit_response(exc)
473
docs/api/dependency.rst
Normal file
@@ -0,0 +1,473 @@
Dependency Injection API
========================

If you're already using FastAPI's dependency injection system, you'll feel right
at home with ``RateLimitDependency``. It plugs directly into ``Depends``, giving
you rate limiting that works just like any other dependency, plus access to
rate limit info right inside your endpoint.

RateLimitDependency
-------------------

.. py:class:: RateLimitDependency(limit, window_size=60.0, *, algorithm=Algorithm.SLIDING_WINDOW_COUNTER, key_prefix="ratelimit", key_extractor=default_key_extractor, burst_size=None, error_message="Rate limit exceeded", status_code=429, skip_on_error=False, cost=1, exempt_when=None)

   This is the main class you'll use for dependency-based rate limiting. Create
   an instance, pass it to ``Depends()``, and you're done.

   :param limit: Maximum number of requests allowed in the window.
   :type limit: int
   :param window_size: Time window in seconds. Defaults to 60.
   :type window_size: float
   :param algorithm: Rate limiting algorithm to use.
   :type algorithm: Algorithm
   :param key_prefix: Prefix for the rate limit key.
   :type key_prefix: str
   :param key_extractor: Function to extract the client identifier from the request.
   :type key_extractor: Callable[[Request], str]
   :param burst_size: Maximum burst size for the token bucket/leaky bucket algorithms.
   :type burst_size: int | None
   :param error_message: Error message when the rate limit is exceeded.
   :type error_message: str
   :param status_code: HTTP status code when the rate limit is exceeded.
   :type status_code: int
   :param skip_on_error: Skip rate limiting if backend errors occur.
   :type skip_on_error: bool
   :param cost: Cost of each request (default 1).
   :type cost: int
   :param exempt_when: Function to determine whether a request should be exempt.
   :type exempt_when: Callable[[Request], bool] | None

   **Returns:** A ``RateLimitInfo`` object with details about the current rate limit state.

RateLimitInfo
-------------

When the dependency runs, it hands you back a ``RateLimitInfo`` object. Here's
what's inside:

.. py:class:: RateLimitInfo

   :param limit: The configured request limit.
   :type limit: int
   :param remaining: Remaining requests in the current window.
   :type remaining: int
   :param reset_at: Unix timestamp when the window resets.
   :type reset_at: float
   :param retry_after: Seconds until retry is allowed (if rate limited).
   :type retry_after: float | None
   :param window_size: The configured window size in seconds.
   :type window_size: float

   .. py:method:: to_headers() -> dict[str, str]

      Converts the rate limit info into standard HTTP headers. Handy if you
      want to add these headers to your response manually.

      :returns: A dictionary with ``X-RateLimit-Limit``, ``X-RateLimit-Remaining``,
         ``X-RateLimit-Reset``, and ``Retry-After`` (when applicable).

Setup
|
||||
-----
|
||||
|
||||
Before you can use the dependency, you need to set up the rate limiter. The
|
||||
cleanest way is with FastAPI's lifespan context manager:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from contextlib import asynccontextmanager
|
||||
from fastapi import FastAPI
|
||||
from fastapi_traffic import MemoryBackend, RateLimiter
|
||||
from fastapi_traffic.core.limiter import set_limiter
|
||||
|
||||
backend = MemoryBackend()
|
||||
limiter = RateLimiter(backend)
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
await limiter.initialize()
|
||||
set_limiter(limiter)
|
||||
yield
|
||||
await limiter.close()
|
||||
|
||||
app = FastAPI(lifespan=lifespan)
|
||||
|
||||
Basic Usage
-----------

Here's the simplest way to get started. Create a dependency instance and inject
it with ``Depends``:

.. code-block:: python

   from fastapi import Depends, FastAPI, Request
   from fastapi_traffic.core.decorator import RateLimitDependency

   app = FastAPI()

   # Create the rate limit dependency
   rate_limit_dep = RateLimitDependency(limit=100, window_size=60)

   @app.get("/api/data")
   async def get_data(
       request: Request,
       rate_info=Depends(rate_limit_dep),
   ):
       return {
           "data": "here",
           "remaining_requests": rate_info.remaining,
           "reset_at": rate_info.reset_at,
       }

Using Type Aliases
------------------

If you're using the same rate limit across multiple endpoints, type aliases
with ``Annotated`` make your code much cleaner:

.. code-block:: python

   from typing import Annotated, TypeAlias
   from fastapi import Depends, FastAPI, Request
   from fastapi_traffic.core.decorator import RateLimitDependency
   from fastapi_traffic.core.models import RateLimitInfo

   app = FastAPI()

   rate_limit_dep = RateLimitDependency(limit=100, window_size=60)

   # Create a type alias for cleaner signatures
   RateLimit: TypeAlias = Annotated[RateLimitInfo, Depends(rate_limit_dep)]

   @app.get("/api/data")
   async def get_data(request: Request, rate_info: RateLimit):
       return {
           "data": "here",
           "remaining": rate_info.remaining,
       }

Tiered Rate Limits
------------------

This is where dependency injection really shines. You can apply different rate
limits based on who's making the request: free users get 10 requests per minute,
pro users get 100, and enterprise gets 1000:

.. code-block:: python

   from typing import Annotated, TypeAlias
   from fastapi import Depends, FastAPI, Request
   from fastapi_traffic.core.decorator import RateLimitDependency
   from fastapi_traffic.core.models import RateLimitInfo

   app = FastAPI()

   # Define tier-specific limits
   free_tier_limit = RateLimitDependency(
       limit=10,
       window_size=60,
       key_prefix="free",
   )

   pro_tier_limit = RateLimitDependency(
       limit=100,
       window_size=60,
       key_prefix="pro",
   )

   enterprise_tier_limit = RateLimitDependency(
       limit=1000,
       window_size=60,
       key_prefix="enterprise",
   )

   def get_user_tier(request: Request) -> str:
       """Get user tier from header (in real app, from JWT/database)."""
       return request.headers.get("X-User-Tier", "free")

   TierDep: TypeAlias = Annotated[str, Depends(get_user_tier)]

   async def tiered_rate_limit(
       request: Request,
       tier: TierDep,
   ) -> RateLimitInfo:
       """Apply different rate limits based on user tier."""
       if tier == "enterprise":
           return await enterprise_tier_limit(request)
       elif tier == "pro":
           return await pro_tier_limit(request)
       else:
           return await free_tier_limit(request)

   TieredRateLimit: TypeAlias = Annotated[RateLimitInfo, Depends(tiered_rate_limit)]

   @app.get("/api/resource")
   async def get_resource(request: Request, rate_info: TieredRateLimit):
       tier = get_user_tier(request)
       return {
           "tier": tier,
           "remaining": rate_info.remaining,
           "limit": rate_info.limit,
       }

Custom Key Extraction
---------------------

By default, rate limits are tracked by IP address. But what if you want to rate
limit by API key instead? Just pass a custom ``key_extractor``:

.. code-block:: python

   from fastapi import Depends, FastAPI, Request
   from fastapi_traffic.core.decorator import RateLimitDependency

   app = FastAPI()

   def api_key_extractor(request: Request) -> str:
       """Extract API key for rate limiting."""
       api_key = request.headers.get("X-API-Key", "anonymous")
       return f"api:{api_key}"

   api_rate_limit = RateLimitDependency(
       limit=100,
       window_size=3600,  # 100 requests per hour
       key_extractor=api_key_extractor,
   )

   @app.get("/api/resource")
   async def api_resource(
       request: Request,
       rate_info=Depends(api_rate_limit),
   ):
       return {
           "data": "Resource data",
           "requests_remaining": rate_info.remaining,
       }

Multiple Rate Limits
--------------------

Sometimes you need layered protection, say 10 requests per minute *and* 100
requests per hour. Dependencies make this easy to compose:

.. code-block:: python

   from typing import Annotated, Any, TypeAlias
   from fastapi import Depends, FastAPI, Request
   from fastapi_traffic.core.decorator import RateLimitDependency
   from fastapi_traffic.core.models import RateLimitInfo

   app = FastAPI()

   per_minute_limit = RateLimitDependency(
       limit=10,
       window_size=60,
       key_prefix="minute",
   )

   per_hour_limit = RateLimitDependency(
       limit=100,
       window_size=3600,
       key_prefix="hour",
   )

   PerMinuteLimit: TypeAlias = Annotated[RateLimitInfo, Depends(per_minute_limit)]
   PerHourLimit: TypeAlias = Annotated[RateLimitInfo, Depends(per_hour_limit)]

   async def combined_rate_limit(
       request: Request,
       minute_info: PerMinuteLimit,
       hour_info: PerHourLimit,
   ) -> dict[str, Any]:
       """Apply both per-minute and per-hour limits."""
       return {
           "minute": {
               "limit": minute_info.limit,
               "remaining": minute_info.remaining,
           },
           "hour": {
               "limit": hour_info.limit,
               "remaining": hour_info.remaining,
           },
       }

   CombinedRateLimit: TypeAlias = Annotated[dict[str, Any], Depends(combined_rate_limit)]

   @app.get("/api/combined")
   async def combined_endpoint(
       request: Request,
       rate_info: CombinedRateLimit,
   ):
       return {
           "message": "Success",
           "rate_limits": rate_info,
       }

Exemption Logic
---------------

Need to let certain requests bypass rate limiting entirely? Maybe internal
services or admin users? Use the ``exempt_when`` parameter:

.. code-block:: python

   from fastapi import Depends, FastAPI, Request
   from fastapi_traffic.core.decorator import RateLimitDependency

   app = FastAPI()

   def is_internal_request(request: Request) -> bool:
       """Check if request is from internal service."""
       internal_token = request.headers.get("X-Internal-Token")
       return internal_token == "internal-secret-token"

   internal_exempt_limit = RateLimitDependency(
       limit=10,
       window_size=60,
       exempt_when=is_internal_request,
   )

   @app.get("/api/internal")
   async def internal_endpoint(
       request: Request,
       rate_info=Depends(internal_exempt_limit),
   ):
       is_internal = is_internal_request(request)
       return {
           "message": "Success",
           "is_internal": is_internal,
           "rate_limit": None if is_internal else {
               "remaining": rate_info.remaining,
           },
       }

Exception Handling
------------------

When a request exceeds the rate limit, a ``RateLimitExceeded`` exception is
raised. You'll want to catch this and return a proper response:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi.responses import JSONResponse
   from fastapi_traffic import RateLimitExceeded

   app = FastAPI()

   @app.exception_handler(RateLimitExceeded)
   async def rate_limit_handler(
       request: Request,
       exc: RateLimitExceeded,
   ) -> JSONResponse:
       return JSONResponse(
           status_code=429,
           content={
               "error": "rate_limit_exceeded",
               "message": exc.message,
               "retry_after": exc.retry_after,
           },
       )

Or if you prefer, there's a built-in helper that does the work for you:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import RateLimitExceeded
   from fastapi_traffic.core.decorator import create_rate_limit_response

   app = FastAPI()

   @app.exception_handler(RateLimitExceeded)
   async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
       return create_rate_limit_response(exc, include_headers=True)

Complete Example
----------------

Here's everything put together in a working example you can copy and run:

.. code-block:: python

   from contextlib import asynccontextmanager
   from typing import Annotated, TypeAlias

   from fastapi import Depends, FastAPI, Request
   from fastapi.responses import JSONResponse

   from fastapi_traffic import (
       MemoryBackend,
       RateLimiter,
       RateLimitExceeded,
   )
   from fastapi_traffic.core.decorator import RateLimitDependency
   from fastapi_traffic.core.limiter import set_limiter
   from fastapi_traffic.core.models import RateLimitInfo

   # Initialize backend and limiter
   backend = MemoryBackend()
   limiter = RateLimiter(backend)

   @asynccontextmanager
   async def lifespan(app: FastAPI):
       await limiter.initialize()
       set_limiter(limiter)
       yield
       await limiter.close()

   app = FastAPI(lifespan=lifespan)

   # Exception handler
   @app.exception_handler(RateLimitExceeded)
   async def rate_limit_handler(
       request: Request,
       exc: RateLimitExceeded,
   ) -> JSONResponse:
       return JSONResponse(
           status_code=429,
           content={
               "error": "rate_limit_exceeded",
               "retry_after": exc.retry_after,
           },
       )

   # Create dependency
   api_rate_limit = RateLimitDependency(limit=100, window_size=60)
   ApiRateLimit: TypeAlias = Annotated[RateLimitInfo, Depends(api_rate_limit)]

   @app.get("/api/data")
   async def get_data(request: Request, rate_info: ApiRateLimit):
       return {
           "data": "Your data here",
           "rate_limit": {
               "limit": rate_info.limit,
               "remaining": rate_info.remaining,
               "reset_at": rate_info.reset_at,
           },
       }

Decorator vs Dependency
-----------------------

Not sure which approach to use? Here's a quick guide:

Go with the ``@rate_limit`` decorator if:

- You just want to slap a rate limit on an endpoint and move on
- You don't care about the remaining request count inside your endpoint
- You're applying the same limit to a bunch of endpoints

Go with ``RateLimitDependency`` if:

- You want to show users how many requests they have left
- You need different limits for different user tiers
- You're stacking multiple rate limits (per-minute + per-hour)
- You're already using FastAPI's dependency system and want consistency

See Also
--------

- :doc:`decorator` - Decorator-based rate limiting
- :doc:`middleware` - Global middleware rate limiting
- :doc:`config` - Configuration options
- :doc:`exceptions` - Exception handling
165
docs/api/exceptions.rst
Normal file
@@ -0,0 +1,165 @@
Exceptions API
==============

Custom exceptions raised by FastAPI Traffic.

FastAPITrafficError
-------------------

.. py:exception:: FastAPITrafficError

   Base exception for all FastAPI Traffic errors.

   All other exceptions in this library inherit from this class, so you can
   catch all FastAPI Traffic errors with a single handler:

   .. code-block:: python

      from fastapi import Request
      from fastapi.responses import JSONResponse
      from fastapi_traffic.exceptions import FastAPITrafficError

      @app.exception_handler(FastAPITrafficError)
      async def handle_traffic_error(request: Request, exc: FastAPITrafficError):
          return JSONResponse(
              status_code=500,
              content={"error": str(exc)},
          )

RateLimitExceeded
-----------------

.. py:exception:: RateLimitExceeded(message="Rate limit exceeded", *, retry_after=None, limit_info=None)

   Raised when a rate limit has been exceeded.

   :param message: Error message.
   :type message: str
   :param retry_after: Seconds until the client can retry.
   :type retry_after: float | None
   :param limit_info: Detailed rate limit information.
   :type limit_info: RateLimitInfo | None

   .. py:attribute:: message
      :type: str

      The error message.

   .. py:attribute:: retry_after
      :type: float | None

      Seconds until the client can retry. May be None if not calculable.

   .. py:attribute:: limit_info
      :type: RateLimitInfo | None

      Detailed information about the rate limit state.

   **Usage:**

   .. code-block:: python

      from fastapi import Request
      from fastapi.responses import JSONResponse
      from fastapi_traffic import RateLimitExceeded

      @app.exception_handler(RateLimitExceeded)
      async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
          headers = {}
          if exc.limit_info:
              headers = exc.limit_info.to_headers()

          return JSONResponse(
              status_code=429,
              content={
                  "error": "rate_limit_exceeded",
                  "message": exc.message,
                  "retry_after": exc.retry_after,
              },
              headers=headers,
          )

BackendError
------------

.. py:exception:: BackendError(message="Backend operation failed", *, original_error=None)

   Raised when a backend operation fails.

   :param message: Error message.
   :type message: str
   :param original_error: The original exception that caused this error.
   :type original_error: Exception | None

   .. py:attribute:: message
      :type: str

      The error message.

   .. py:attribute:: original_error
      :type: Exception | None

      The underlying exception, if any.

   **Usage:**

   .. code-block:: python

      import logging

      from fastapi import Request
      from fastapi.responses import JSONResponse
      from fastapi_traffic import BackendError

      logger = logging.getLogger(__name__)

      @app.exception_handler(BackendError)
      async def backend_error_handler(request: Request, exc: BackendError):
          # Log the original error for debugging
          if exc.original_error:
              logger.error("Backend error: %s", exc.original_error)

          return JSONResponse(
              status_code=503,
              content={"error": "service_unavailable"},
          )

   This exception is raised when:

   - Redis connection fails
   - SQLite database is locked or corrupted
   - Any other backend storage operation fails

ConfigurationError
------------------

.. py:exception:: ConfigurationError

   Raised when there is a configuration error.

   This exception is raised when:

   - A configuration file contains invalid values
   - Required configuration is missing
   - A value cannot be converted to the expected type
   - A configuration file contains unknown fields

   **Usage:**

   .. code-block:: python

      from fastapi_traffic import ConfigLoader, ConfigurationError
      from fastapi_traffic.core.config import RateLimitConfig

      loader = ConfigLoader()

      try:
          config = loader.load_rate_limit_config_from_json("config.json")
      except ConfigurationError as e:
          print(f"Configuration error: {e}")
          # Use default configuration
          config = RateLimitConfig(limit=100, window_size=60)

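For reference, a minimal ``config.json`` accepted by ``load_rate_limit_config_from_json`` might look like this. The field names here are an assumption based on the ``limit`` and ``window_size`` parameters of ``RateLimitConfig`` shown above; check the configuration documentation for the authoritative schema.

```json
{
    "limit": 100,
    "window_size": 60
}
```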
Exception Hierarchy
-------------------

.. code-block:: text

   FastAPITrafficError
   ├── RateLimitExceeded
   ├── BackendError
   └── ConfigurationError

All exceptions inherit from ``FastAPITrafficError``, which inherits from
Python's built-in ``Exception``.
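The practical consequence of this hierarchy is that a single ``except`` clause catches every library error. This self-contained sketch uses stand-in classes mirroring the tree above (the real classes live in ``fastapi_traffic.exceptions``):

```python
# Stand-in classes mirroring the hierarchy above; illustrative only.
class FastAPITrafficError(Exception):
    """Base for all library errors."""


class RateLimitExceeded(FastAPITrafficError):
    pass


class BackendError(FastAPITrafficError):
    pass


class ConfigurationError(FastAPITrafficError):
    pass


# One except clause is enough to catch any library error:
try:
    raise BackendError("redis down")
except FastAPITrafficError as exc:
    print(type(exc).__name__)
```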
118
docs/api/middleware.rst
Normal file
@@ -0,0 +1,118 @@
Middleware API
==============

Middleware for applying rate limiting globally across your application.

RateLimitMiddleware
-------------------

.. py:class:: RateLimitMiddleware(app, *, limit=100, window_size=60.0, algorithm=Algorithm.SLIDING_WINDOW_COUNTER, backend=None, key_prefix="middleware", include_headers=True, error_message="Rate limit exceeded. Please try again later.", status_code=429, skip_on_error=False, exempt_paths=None, exempt_ips=None, key_extractor=default_key_extractor)

   Middleware for global rate limiting across all endpoints.

   :param app: The ASGI application.
   :type app: ASGIApp
   :param limit: Maximum requests per window.
   :type limit: int
   :param window_size: Time window in seconds.
   :type window_size: float
   :param algorithm: Rate limiting algorithm.
   :type algorithm: Algorithm
   :param backend: Storage backend. Defaults to ``MemoryBackend``.
   :type backend: Backend | None
   :param key_prefix: Prefix for rate limit keys.
   :type key_prefix: str
   :param include_headers: Include rate limit headers in the response.
   :type include_headers: bool
   :param error_message: Error message when rate limited.
   :type error_message: str
   :param status_code: HTTP status code when rate limited.
   :type status_code: int
   :param skip_on_error: Skip rate limiting on backend errors.
   :type skip_on_error: bool
   :param exempt_paths: Paths to exempt from rate limiting.
   :type exempt_paths: set[str] | None
   :param exempt_ips: IP addresses to exempt from rate limiting.
   :type exempt_ips: set[str] | None
   :param key_extractor: Function to extract the client identifier.
   :type key_extractor: Callable[[Request], str]

   **Basic usage:**

   .. code-block:: python

      from fastapi import FastAPI
      from fastapi_traffic.middleware import RateLimitMiddleware

      app = FastAPI()

      app.add_middleware(
          RateLimitMiddleware,
          limit=1000,
          window_size=60,
      )

   **With exemptions:**

   .. code-block:: python

      app.add_middleware(
          RateLimitMiddleware,
          limit=1000,
          window_size=60,
          exempt_paths={"/health", "/docs"},
          exempt_ips={"127.0.0.1"},
      )

   **With custom backend:**

   .. code-block:: python

      from fastapi_traffic import SQLiteBackend

      backend = SQLiteBackend("rate_limits.db")

      app.add_middleware(
          RateLimitMiddleware,
          limit=1000,
          window_size=60,
          backend=backend,
      )

SlidingWindowMiddleware
-----------------------

.. py:class:: SlidingWindowMiddleware(app, *, limit=100, window_size=60.0, **kwargs)

   Convenience middleware using the sliding window algorithm.

   Accepts all the same parameters as ``RateLimitMiddleware``.

   .. code-block:: python

      from fastapi_traffic.middleware import SlidingWindowMiddleware

      app.add_middleware(
          SlidingWindowMiddleware,
          limit=1000,
          window_size=60,
      )

TokenBucketMiddleware
---------------------

.. py:class:: TokenBucketMiddleware(app, *, limit=100, window_size=60.0, **kwargs)

   Convenience middleware using the token bucket algorithm.

   Accepts all the same parameters as ``RateLimitMiddleware``.

   .. code-block:: python

      from fastapi_traffic.middleware import TokenBucketMiddleware

      app.add_middleware(
          TokenBucketMiddleware,
          limit=1000,
          window_size=60,
      )
91
docs/changelog.rst
Normal file
@@ -0,0 +1,91 @@
Changelog
=========

All notable changes to FastAPI Traffic are documented here.

The format is based on `Keep a Changelog <https://keepachangelog.com/en/1.1.0/>`_,
and this project adheres to `Semantic Versioning <https://semver.org/spec/v2.0.0.html>`_.

[0.2.1] - 2026-03-07
--------------------

Changed
^^^^^^^

- Improved config loader validation using Pydantic schemas
- Added pydantic>=2.0 as a core dependency
- Fixed sync wrapper in decorator to properly handle rate limiting
- Updated pyright settings for stricter type checking
- Fixed repository URL in pyproject.toml

Removed
^^^^^^^

- Removed unused main.py

[0.2.0] - 2026-02-04
--------------------

Added
^^^^^

- **Configuration Loader** — Load rate limiting configuration from external files:

  - ``ConfigLoader`` class for loading ``RateLimitConfig`` and ``GlobalConfig``
  - Support for ``.env`` files with ``FASTAPI_TRAFFIC_*`` prefixed variables
  - Support for JSON configuration files
  - Environment variable loading with ``load_rate_limit_config_from_env()`` and ``load_global_config_from_env()``
  - Auto-detection of file format with ``load_rate_limit_config()`` and ``load_global_config()``
  - Custom environment variable prefix support
  - Type validation and comprehensive error handling
  - 47 new tests for configuration loading

- Example ``11_config_loader.py`` demonstrating all configuration loading patterns
- ``get_stats()`` method to ``MemoryBackend`` for consistency with ``RedisBackend``
- Comprehensive test suite with 134 tests covering:

  - All five rate limiting algorithms with timing and concurrency tests
  - Backend tests for Memory and SQLite with edge cases
  - Decorator and middleware integration tests
  - Exception handling and configuration validation
  - End-to-end integration tests with FastAPI apps

- ``httpx`` and ``pytest-asyncio`` as dev dependencies for testing

Changed
^^^^^^^

- Improved documentation in README.md and DEVELOPMENT.md
- Added ``asyncio_default_fixture_loop_scope`` config for pytest-asyncio compatibility

[0.1.0] - 2025-01-09
--------------------

Initial release.

Added
^^^^^

- Core rate limiting with ``@rate_limit`` decorator
- Five algorithms:

  - Token Bucket
  - Sliding Window
  - Fixed Window
  - Leaky Bucket
  - Sliding Window Counter

- Three storage backends:

  - Memory (default) — In-memory with LRU eviction
  - SQLite — Persistent storage with WAL mode
  - Redis — Distributed storage with Lua scripts

- Middleware support for global rate limiting via ``RateLimitMiddleware``
- Dependency injection support with ``RateLimitDependency``
- Custom key extractors for flexible rate limit grouping (by IP, API key, user, etc.)
- Configurable exemptions with ``exempt_when`` callback
- Rate limit headers (``X-RateLimit-Limit``, ``X-RateLimit-Remaining``, ``X-RateLimit-Reset``)
- ``RateLimitExceeded`` exception with ``retry_after`` and ``limit_info``
- Full async support throughout
- Strict type hints (pyright/mypy compatible)
103
docs/conf.py
Normal file
@@ -0,0 +1,103 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

import sys
from pathlib import Path

# Add the project root to the path so autodoc can find the modules
sys.path.insert(0, str(Path(__file__).parent.parent.resolve()))

# -- Project information -----------------------------------------------------
project = "fastapi-traffic"
copyright = "2026, zanewalker"
author = "zanewalker"
release = "0.2.1"
version = "0.2.1"

# -- General configuration ---------------------------------------------------
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx.ext.intersphinx",
    "sphinx.ext.autosummary",
    "sphinx_copybutton",
    "sphinx_design",
    "myst_parser",
]

templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

# The suffix(es) of source filenames.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# The master toctree document.
master_doc = "index"

# -- Options for HTML output -------------------------------------------------
html_theme = "furo"
html_title = "fastapi-traffic"
html_static_path = ["_static"]

html_theme_options = {
    "light_css_variables": {
        "color-brand-primary": "#009485",
        "color-brand-content": "#009485",
    },
    "dark_css_variables": {
        "color-brand-primary": "#00d4aa",
        "color-brand-content": "#00d4aa",
    },
    "sidebar_hide_name": False,
    "navigation_with_keys": True,
}

# -- Options for autodoc -----------------------------------------------------
autodoc_default_options = {
    "members": True,
    "member-order": "bysource",
    "special-members": "__init__",
    "undoc-members": True,
    "exclude-members": "__weakref__",
}

autodoc_typehints = "description"
autodoc_class_signature = "separated"

# -- Options for Napoleon (Google/NumPy docstrings) --------------------------
napoleon_google_docstring = True
napoleon_numpy_docstring = True
napoleon_include_init_with_doc = True
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = True
napoleon_use_admonition_for_notes = True
napoleon_use_admonition_for_references = True
napoleon_use_ivar = False
napoleon_use_param = True
napoleon_use_rtype = True
napoleon_preprocess_types = False
napoleon_type_aliases = None
napoleon_attr_annotations = True

# -- Options for intersphinx -------------------------------------------------
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "starlette": ("https://www.starlette.io", None),
    "fastapi": ("https://fastapi.tiangolo.com", None),
}

# -- MyST Parser options -----------------------------------------------------
myst_enable_extensions = [
    "colon_fence",
    "deflist",
    "fieldlist",
    "tasklist",
]
myst_heading_anchors = 3
204
docs/contributing.rst
Normal file
@@ -0,0 +1,204 @@
Contributing
============

Thanks for your interest in contributing to FastAPI Traffic! This guide will help
you get started.

Development Setup
-----------------

1. **Clone the repository:**

   .. code-block:: bash

      git clone https://gitlab.com/zanewalker/fastapi-traffic.git
      cd fastapi-traffic

2. **Install uv** (if you don't have it):

   .. code-block:: bash

      curl -LsSf https://astral.sh/uv/install.sh | sh

3. **Create a virtual environment and install dependencies:**

   .. code-block:: bash

      uv venv
      source .venv/bin/activate  # or .venv\Scripts\activate on Windows
      uv pip install -e ".[dev]"

4. **Verify everything works:**

   .. code-block:: bash

      pytest

Running Tests
-------------

Run the full test suite:

.. code-block:: bash

   pytest

Run with coverage:

.. code-block:: bash

   pytest --cov=fastapi_traffic --cov-report=html

Run specific tests:

.. code-block:: bash

   pytest tests/test_algorithms.py
   pytest -k "test_token_bucket"

Code Style
----------

We use ruff for linting and formatting:

.. code-block:: bash

   # Check for issues
   ruff check .

   # Auto-fix issues
   ruff check --fix .

   # Format code
   ruff format .

Type Checking
-------------

We use pyright for type checking:

.. code-block:: bash

   pyright

The codebase is strictly typed. All public APIs should have complete type hints.

Making Changes
--------------

1. **Create a branch:**

   .. code-block:: bash

      git checkout -b feature/my-feature

2. **Make your changes.** Follow the existing code style.

3. **Add tests.** All new features should have tests.

4. **Run the checks:**

   .. code-block:: bash

      ruff check .
      ruff format .
      pyright
      pytest

5. **Commit your changes:**

   .. code-block:: bash

      git commit -m "feat: add my feature"

   We follow `Conventional Commits <https://www.conventionalcommits.org/>`_:

   - ``feat:`` New features
   - ``fix:`` Bug fixes
   - ``docs:`` Documentation changes
   - ``style:`` Code style changes (formatting, etc.)
   - ``refactor:`` Code refactoring
   - ``test:`` Adding or updating tests
   - ``chore:`` Maintenance tasks

6. **Push and create a merge request:**

   .. code-block:: bash

      git push origin feature/my-feature

Project Structure
-----------------

.. code-block:: text

   fastapi-traffic/
   ├── fastapi_traffic/
   │   ├── __init__.py          # Public API exports
   │   ├── exceptions.py        # Custom exceptions
   │   ├── middleware.py        # Rate limit middleware
   │   ├── backends/
   │   │   ├── base.py          # Backend abstract class
   │   │   ├── memory.py        # In-memory backend
   │   │   ├── sqlite.py        # SQLite backend
   │   │   └── redis.py         # Redis backend
   │   └── core/
   │       ├── algorithms.py    # Rate limiting algorithms
   │       ├── config.py        # Configuration classes
   │       ├── config_loader.py # Configuration loading
   │       ├── decorator.py     # @rate_limit decorator
   │       ├── limiter.py       # Main RateLimiter class
   │       └── models.py        # Data models
   ├── tests/
   │   ├── test_algorithms.py
   │   ├── test_backends.py
   │   ├── test_decorator.py
   │   └── ...
   ├── examples/
   │   ├── 01_quickstart.py
   │   └── ...
   └── docs/
       └── ...

Guidelines
----------

**Code:**

- Keep functions focused and small
- Use descriptive variable names
- Add docstrings to public functions and classes
- Follow existing patterns in the codebase

**Tests:**

- Test both happy path and edge cases
- Use descriptive test names
- Mock external dependencies (Redis, etc.)
- Keep tests fast and isolated

**Documentation:**

- Update docs when adding features
- Include code examples
- Keep language clear and concise

Reporting Issues
----------------

Found a bug? Have a feature request? Please open an issue on GitLab:

https://gitlab.com/zanewalker/fastapi-traffic/issues

Include:

- What you expected to happen
- What actually happened
- Steps to reproduce
- Python version and OS
- FastAPI Traffic version

Questions?
----------

Feel free to open an issue for questions. We're happy to help!
105
docs/getting-started/installation.rst
Normal file
@@ -0,0 +1,105 @@
Installation
============

FastAPI Traffic supports Python 3.10 and above. You can install it using pip, uv, or
any other Python package manager.

Basic Installation
------------------

The basic installation includes the memory backend, which is perfect for development
and single-process applications:

.. tab-set::

   .. tab-item:: pip

      .. code-block:: bash

         pip install git+https://gitlab.com/zanewalker/fastapi-traffic.git

   .. tab-item:: uv

      .. code-block:: bash

         uv add git+https://gitlab.com/zanewalker/fastapi-traffic.git

   .. tab-item:: poetry

      .. code-block:: bash

         poetry add git+https://gitlab.com/zanewalker/fastapi-traffic.git

With Redis Support
------------------

If you're running a distributed system with multiple application instances, you'll
want the Redis backend:

.. tab-set::

   .. tab-item:: pip

      .. code-block:: bash

         pip install "git+https://gitlab.com/zanewalker/fastapi-traffic.git[redis]"

   .. tab-item:: uv

      .. code-block:: bash

         uv add "git+https://gitlab.com/zanewalker/fastapi-traffic.git[redis]"

Everything
----------

Want it all? Install with the ``all`` extra:

.. code-block:: bash

   pip install "git+https://gitlab.com/zanewalker/fastapi-traffic.git[all]"

This includes Redis support and ensures FastAPI is installed as well.

Dependencies
------------

FastAPI Traffic has minimal dependencies:

- **pydantic** (>=2.0) — For configuration validation
- **starlette** (>=0.27.0) — The ASGI framework that FastAPI is built on

Optional dependencies:

- **redis** (>=5.0.0) — Required for the Redis backend
- **fastapi** (>=0.100.0) — While not strictly required (we work with Starlette directly),
  you probably want this

Verifying the Installation
--------------------------

After installation, you can verify everything is working:
.. code-block:: python

   import fastapi_traffic
   print(fastapi_traffic.__version__)
   # Should print: 0.3.0
Or check which backends are available:

.. code-block:: python

   from fastapi_traffic import MemoryBackend, SQLiteBackend
   print("Memory and SQLite backends available!")

   try:
       from fastapi_traffic import RedisBackend
       print("Redis backend available!")
   except ImportError:
       print("Redis backend not installed (install with [redis] extra)")

What's Next?
------------

Head over to the :doc:`quickstart` guide to start rate limiting your endpoints.
220
docs/getting-started/quickstart.rst
Normal file
@@ -0,0 +1,220 @@
Quickstart
==========

Let's get rate limiting working in your FastAPI app. This guide covers the basics —
you'll have something running in under five minutes.

Your First Rate Limit
---------------------

The simplest way to add rate limiting is with the ``@rate_limit`` decorator:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import rate_limit

   app = FastAPI()

   @app.get("/api/hello")
   @rate_limit(10, 60)  # 10 requests per 60 seconds
   async def hello(request: Request):
       return {"message": "Hello, World!"}

That's the whole thing. Let's break down what's happening:

1. The decorator takes two arguments: ``limit`` (max requests) and ``window_size`` (in seconds)
2. Each client is identified by their IP address by default
3. When a client exceeds the limit, they get a 429 response with a ``Retry-After`` header

.. note::

   The ``request: Request`` parameter is required. FastAPI Traffic needs access to the
   request to identify the client and track their usage.

Testing It Out
--------------

Fire up your app and hit the endpoint a few times:

.. code-block:: bash

   # Start your app
   uvicorn main:app --reload

   # In another terminal, make some requests
   curl -i http://localhost:8000/api/hello

You'll see headers like these in the response:

.. code-block:: http

   HTTP/1.1 200 OK
   X-RateLimit-Limit: 10
   X-RateLimit-Remaining: 9
   X-RateLimit-Reset: 1709834400

After 10 requests, you'll get:

.. code-block:: http

   HTTP/1.1 429 Too Many Requests
   Retry-After: 45
   X-RateLimit-Limit: 10
   X-RateLimit-Remaining: 0

Choosing an Algorithm
---------------------

Different situations call for different rate limiting strategies. Here's a quick guide:

.. code-block:: python

   from fastapi_traffic import rate_limit, Algorithm

   # Token Bucket - great for APIs that need burst handling
   # Allows short bursts of traffic, then smooths out
   @app.get("/api/burst-friendly")
   @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=20)
   async def burst_endpoint(request: Request):
       return {"status": "ok"}

   # Sliding Window - most accurate, but uses more memory
   # Perfect when you need precise rate limiting
   @app.get("/api/precise")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def precise_endpoint(request: Request):
       return {"status": "ok"}

   # Fixed Window - simple and efficient
   # Good for most use cases, slight edge case at window boundaries
   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"status": "ok"}

See :doc:`/user-guide/algorithms` for a deep dive into each algorithm.

Rate Limiting by API Key
------------------------

IP-based limiting is fine for public endpoints, but for authenticated APIs you
probably want to limit by API key:

.. code-block:: python

   def get_api_key(request: Request) -> str:
       """Extract API key from header, fall back to IP."""
       api_key = request.headers.get("X-API-Key")
       if api_key:
           return f"key:{api_key}"
       # Fall back to IP for unauthenticated requests
       return request.client.host if request.client else "unknown"

   @app.get("/api/data")
   @rate_limit(1000, 3600, key_extractor=get_api_key)  # 1000/hour per API key
   async def get_data(request: Request):
       return {"data": "sensitive stuff"}

Global Rate Limiting with Middleware
------------------------------------

Sometimes you want a blanket rate limit across your entire API. That's what
middleware is for:

.. code-block:: python

   from fastapi_traffic.middleware import RateLimitMiddleware

   app = FastAPI()

   app.add_middleware(
       RateLimitMiddleware,
       limit=1000,
       window_size=60,
       exempt_paths={"/health", "/docs", "/openapi.json"},
   )

   # All endpoints now have a shared 1000 req/min limit
   @app.get("/api/users")
   async def get_users():
       return {"users": []}

   @app.get("/api/posts")
   async def get_posts():
       return {"posts": []}

Using a Persistent Backend
--------------------------

The default memory backend works great for development, but it doesn't survive
restarts and doesn't work across multiple processes. For production, use SQLite
or Redis:

**SQLite** — Good for single-node deployments:

.. code-block:: python

   from fastapi_traffic import RateLimiter, SQLiteBackend
   from fastapi_traffic.core.limiter import set_limiter

   # Set up persistent storage
   backend = SQLiteBackend("rate_limits.db")
   limiter = RateLimiter(backend)
   set_limiter(limiter)

   @app.on_event("startup")
   async def startup():
       await limiter.initialize()

   @app.on_event("shutdown")
   async def shutdown():
       await limiter.close()

**Redis** — Required for distributed systems:

.. code-block:: python

   from fastapi_traffic import RateLimiter
   from fastapi_traffic.backends.redis import RedisBackend
   from fastapi_traffic.core.limiter import set_limiter

   @app.on_event("startup")
   async def startup():
       backend = await RedisBackend.from_url("redis://localhost:6379/0")
       limiter = RateLimiter(backend)
       set_limiter(limiter)
       await limiter.initialize()

Handling Rate Limit Errors
--------------------------

By default, exceeding the rate limit raises a ``RateLimitExceeded`` exception that
returns a 429 response. You can customize this:

.. code-block:: python

   from fastapi import Request
   from fastapi.responses import JSONResponse
   from fastapi_traffic import RateLimitExceeded

   @app.exception_handler(RateLimitExceeded)
   async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
       return JSONResponse(
           status_code=429,
           content={
               "error": "slow_down",
               "message": "You're making too many requests. Take a breather.",
               "retry_after": exc.retry_after,
           },
       )

What's Next?
------------

You've got the basics down. Here's where to go from here:

- :doc:`/user-guide/algorithms` — Understand when to use each algorithm
- :doc:`/user-guide/backends` — Learn about storage options
- :doc:`/user-guide/key-extractors` — Advanced client identification
- :doc:`/user-guide/configuration` — Load settings from files and environment variables
148
docs/index.rst
Normal file
@@ -0,0 +1,148 @@
FastAPI Traffic
===============

**Production-grade rate limiting for FastAPI that just works.**

.. image:: https://img.shields.io/badge/python-3.10+-blue.svg
   :target: https://www.python.org/downloads/

.. image:: https://img.shields.io/badge/license-Apache%202.0-green.svg
   :target: https://www.apache.org/licenses/LICENSE-2.0

----

FastAPI Traffic is a rate limiting library designed for real-world FastAPI applications.
It gives you five battle-tested algorithms, three storage backends, and a clean API that
stays out of your way.

Whether you're building a public API that needs to handle thousands of requests per second
or a small internal service that just needs basic protection, this library has you covered.

Quick Example
-------------

Here's how simple it is to add rate limiting to your FastAPI app:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import rate_limit

   app = FastAPI()

   @app.get("/api/users")
   @rate_limit(100, 60)  # 100 requests per minute
   async def get_users(request: Request):
       return {"users": ["alice", "bob"]}

That's it. Your endpoint is now rate limited. Clients get helpful headers telling them
how many requests they have left, and when they can try again if they hit the limit.

Why FastAPI Traffic?
--------------------

Most rate limiting libraries fall into one of two camps: either they're too simple
(fixed window only, no persistence) or they're way too complicated (requires reading
a 50-page manual just to get started).

We tried to hit the sweet spot:

- **Five algorithms** — Pick the one that fits your use case. Token bucket for burst
  handling, sliding window for precision, fixed window for simplicity.

- **Three backends** — Memory for development, SQLite for single-node production,
  Redis for distributed systems.

- **Works how you'd expect** — Decorator for endpoints, middleware for global limits,
  dependency injection if that's your style.

- **Fully async** — Built from the ground up for async Python. No blocking calls,
  no thread pool hacks.

- **Type-checked** — Full type hints throughout. Works great with pyright and mypy.

What's in the Box
-----------------

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: 🚦 Rate Limiting
      :link: getting-started/quickstart
      :link-type: doc

      Decorator-based rate limiting with sensible defaults.

   .. grid-item-card:: 🔧 Algorithms
      :link: user-guide/algorithms
      :link-type: doc

      Token bucket, sliding window, fixed window, leaky bucket, and more.

   .. grid-item-card:: 💾 Backends
      :link: user-guide/backends
      :link-type: doc

      Memory, SQLite, and Redis storage options.

   .. grid-item-card:: ⚙️ Configuration
      :link: user-guide/configuration
      :link-type: doc

      Load settings from environment variables or config files.

.. toctree::
   :maxdepth: 2
   :caption: Getting Started
   :hidden:

   getting-started/installation
   getting-started/quickstart

.. toctree::
   :maxdepth: 2
   :caption: User Guide
   :hidden:

   user-guide/algorithms
   user-guide/backends
   user-guide/middleware
   user-guide/configuration
   user-guide/key-extractors
   user-guide/exception-handling

.. toctree::
   :maxdepth: 2
   :caption: Advanced Topics
   :hidden:

   advanced/distributed-systems
   advanced/performance
   advanced/testing

.. toctree::
   :maxdepth: 2
   :caption: API Reference
   :hidden:

   api/decorator
   api/middleware
   api/algorithms
   api/backends
   api/config
   api/exceptions

.. toctree::
   :maxdepth: 1
   :caption: Project
   :hidden:

   changelog
   contributing

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
5
docs/requirements.txt
Normal file
@@ -0,0 +1,5 @@
sphinx>=7.0.0
furo>=2024.0.0
sphinx-copybutton>=0.5.0
myst-parser>=2.0.0
sphinx-design>=0.5.0
290
docs/user-guide/algorithms.rst
Normal file
@@ -0,0 +1,290 @@
Rate Limiting Algorithms
========================

FastAPI Traffic ships with five rate limiting algorithms. Each has its own strengths,
and picking the right one depends on what you're trying to achieve.

This guide will help you understand the tradeoffs and choose wisely.

Overview
--------

Here's the quick comparison:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Algorithm
     - Best For
     - Tradeoffs
   * - **Token Bucket**
     - APIs that need burst handling
     - Allows temporary spikes above average rate
   * - **Sliding Window**
     - Precise rate limiting
     - Higher memory usage
   * - **Fixed Window**
     - Simple, low-overhead limiting
     - Boundary issues (2x burst at window edges)
   * - **Leaky Bucket**
     - Consistent throughput
     - No burst handling
   * - **Sliding Window Counter**
     - General purpose (default)
     - Good balance of precision and efficiency

Token Bucket
------------

Think of this as a bucket that holds tokens. Each request consumes a token, and
tokens refill at a steady rate. If the bucket is empty, requests are rejected.

.. code-block:: python

   from fastapi_traffic import rate_limit, Algorithm

   @app.get("/api/data")
   @rate_limit(
       100,  # 100 tokens refill per minute
       60,
       algorithm=Algorithm.TOKEN_BUCKET,
       burst_size=20,  # bucket can hold up to 20 tokens
   )
   async def get_data(request: Request):
       return {"data": "here"}

**How it works:**

1. The bucket starts full (at ``burst_size`` capacity)
2. Each request removes one token
3. Tokens refill at ``limit / window_size`` per second
4. If no tokens are available, the request is rejected

**When to use it:**

- Your API has legitimate burst traffic (e.g., page loads that trigger multiple requests)
- You want to allow short spikes while maintaining an average rate
- Mobile apps that batch requests when coming online

**Example scenario:** A mobile app that syncs data when it reconnects. You want to
allow it to catch up quickly, but not overwhelm your servers.
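The refill-and-consume mechanics above fit in a few lines of plain Python. This is an illustrative sketch, not the library's implementation; the ``TokenBucket`` class and its names are made up for this example.

```python
import time


class TokenBucket:
    """Sketch of a token bucket: `rate` tokens refill per second,
    up to `capacity` (the burst size)."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity            # the bucket starts full
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1              # this request consumes one token
            return True
        return False


# A 20-token burst, refilling at 100 tokens per 60 seconds
bucket = TokenBucket(capacity=20, rate=100 / 60)
print(sum(bucket.allow() for _ in range(25)))  # prints 20: the burst passes, then requests are rejected
```

Note how the bucket stores just two numbers (the token count and a timestamp), which is why the storage cost stays constant no matter how much traffic arrives.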
Sliding Window
--------------

This algorithm tracks the exact timestamp of every request within the window. It's
the most accurate approach, but uses more memory.

.. code-block:: python

   @app.get("/api/transactions")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def get_transactions(request: Request):
       return {"transactions": []}

**How it works:**

1. Every request timestamp is stored
2. When checking, we count requests in the last ``window_size`` seconds
3. Old timestamps are cleaned up automatically

**When to use it:**

- You need precise rate limiting (financial APIs, compliance requirements)
- Memory isn't a major concern
- The rate limit is relatively low (not millions of requests)

**Tradeoffs:**

- Memory usage grows with request volume
- Slightly more CPU for timestamp management
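The timestamp log described above can be sketched with a ``deque``. This is illustrative only (the ``SlidingWindowLog`` name is invented for the example), but it shows both the cleanup step and why memory grows with request volume:

```python
from collections import deque


class SlidingWindowLog:
    """Sketch of a sliding window: keep one timestamp per request."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits: deque = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window (automatic cleanup)
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False


limiter = SlidingWindowLog(limit=3, window=60.0)
print([limiter.allow(t) for t in (0.0, 1.0, 2.0, 3.0, 61.5)])
# [True, True, True, False, True]; the 4th request is over the limit,
# and by t=61.5 the t=0 and t=1 hits have expired
```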
Fixed Window
------------

The simplest algorithm. Divide time into fixed windows (e.g., every minute) and
count requests in each window.

.. code-block:: python

   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Time is divided into fixed windows (0:00-1:00, 1:00-2:00, etc.)
2. Each request increments the counter for the current window
3. When the window changes, the counter resets

**When to use it:**

- You want the simplest, most efficient option
- Slight inaccuracy at window boundaries is acceptable
- High-volume scenarios where memory matters

**The boundary problem:**

A client could make 100 requests at 0:59 and another 100 at 1:01, effectively
getting 200 requests in 2 seconds. If this matters for your use case, use
sliding window counter instead.
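The boundary problem is easy to demonstrate with a minimal fixed-window counter in plain Python (an illustrative sketch, not the library's code):

```python
class FixedWindow:
    """Sketch of a fixed window: one counter per time bucket."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.bucket = None   # index of the current window
        self.count = 0

    def allow(self, now: float) -> bool:
        bucket = int(now // self.window)
        if bucket != self.bucket:   # new window: reset the counter
            self.bucket = bucket
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindow(limit=100, window=60.0)
burst_before = sum(limiter.allow(59.9) for _ in range(150))  # 100 allowed at 0:59.9
burst_after = sum(limiter.allow(60.1) for _ in range(150))   # counter reset: 100 more at 1:00.1
print(burst_before + burst_after)  # 200 requests accepted in 0.2 seconds
```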
Leaky Bucket
------------

Imagine a bucket with a hole in the bottom. Requests fill the bucket, and it
"leaks" at a constant rate. If the bucket overflows, requests are rejected.

.. code-block:: python

   @app.get("/api/steady")
   @rate_limit(
       100,
       60,
       algorithm=Algorithm.LEAKY_BUCKET,
       burst_size=10,  # bucket capacity
   )
   async def steady_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. The bucket has a maximum capacity (``burst_size``)
2. Each request adds "water" to the bucket
3. Water leaks out at ``limit / window_size`` per second
4. If the bucket would overflow, the request is rejected

**When to use it:**

- You need consistent, smooth throughput
- Downstream systems can't handle bursts
- Processing capacity is truly fixed (e.g., hardware limitations)

**Difference from token bucket:**

- Token bucket allows bursts up to the bucket size
- Leaky bucket smooths out traffic to a constant rate
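The fill-and-drain behavior can be sketched in a few lines of plain Python (illustrative only; the ``LeakyBucket`` name is invented for this example):

```python
class LeakyBucket:
    """Sketch of a leaky bucket: the bucket drains at `rate` per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.level = 0.0      # how full the bucket is
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Water leaks out continuously at the fixed rate
        self.level = max(0.0, self.level - (now - self.updated) * self.rate)
        self.updated = now
        if self.level + 1 <= self.capacity:
            self.level += 1   # this request adds one unit of "water"
            return True
        return False


limiter = LeakyBucket(capacity=10, rate=100 / 60)  # drains ~1.67 req/s
print(sum(limiter.allow(0.0) for _ in range(15)))  # prints 10: only the first 10 fit
```

Compare this with the token bucket sketch: the state is the same (two floats), but a full leaky bucket forces callers down to the drain rate instead of letting them spend saved-up tokens.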
Sliding Window Counter
----------------------

This is the default algorithm, and it's a good choice for most use cases. It
combines the efficiency of fixed windows with better accuracy.

.. code-block:: python

   @app.get("/api/default")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def default_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Maintains counters for the current and previous windows
2. Calculates a weighted average based on how far into the current window we are
3. At 30 seconds into a 60-second window: ``count = prev_count * 0.5 + curr_count``
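The weighting step above can be written as a tiny helper. This is an illustrative function (``weighted_count`` is a made-up name) that matches the formula in step 3:

```python
def weighted_count(prev_count: int, curr_count: int,
                   elapsed: float, window: float) -> float:
    """Sliding-window-counter estimate: weight the previous window's
    count by how much of it still overlaps the sliding window."""
    weight = (window - elapsed) / window
    return prev_count * weight + curr_count


# 30 s into a 60 s window: half of the previous window still counts
print(weighted_count(prev_count=80, curr_count=50, elapsed=30, window=60))  # 90.0
```

At the very start of a window the previous count carries full weight, and at the end it carries none, which is how the estimate avoids the fixed-window reset cliff.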
**When to use it:**

- General purpose rate limiting
- You want better accuracy than fixed window without the memory cost of sliding window
- Most APIs fall into this category

**Why it's the default:**

It gives you 90% of the accuracy of sliding window with the memory efficiency of
fixed window. Unless you have specific requirements, this is probably what you want.

Choosing the Right Algorithm
----------------------------

Here's a decision tree:

1. **Do you need to allow bursts?**

   - Yes → Token Bucket
   - No, I need smooth traffic → Leaky Bucket

2. **Do you need exact precision?**

   - Yes, compliance/financial → Sliding Window
   - No, good enough is fine → Continue

3. **Is memory a concern?**

   - Yes, high volume → Fixed Window
   - No → Sliding Window Counter (default)

Performance Comparison
----------------------

All algorithms are O(1) for the check operation, but they differ in storage:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Storage per Key
     - Operations
   * - Token Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window
     - N timestamps
     - 1 read, 1 write, cleanup
   * - Fixed Window
     - 1 int, 1 float
     - 1 read, 1 write
   * - Leaky Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window Counter
     - 3 values
     - 1 read, 1 write

For most applications, the performance difference is negligible. Choose based on
behavior, not performance, unless you're handling millions of requests per second.

Code Examples
-------------

Here's a complete example showing all algorithms:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import rate_limit, Algorithm

   app = FastAPI()

   # Burst-friendly endpoint
   @app.get("/api/burst")
   @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=25)
   async def burst_endpoint(request: Request):
       return {"type": "token_bucket"}

   # Precise limiting
   @app.get("/api/precise")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def precise_endpoint(request: Request):
       return {"type": "sliding_window"}

   # Simple and efficient
   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"type": "fixed_window"}

   # Smooth throughput
   @app.get("/api/steady")
   @rate_limit(100, 60, algorithm=Algorithm.LEAKY_BUCKET)
   async def steady_endpoint(request: Request):
       return {"type": "leaky_bucket"}

   # Best of both worlds (default)
   @app.get("/api/balanced")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def balanced_endpoint(request: Request):
       return {"type": "sliding_window_counter"}
312
docs/user-guide/backends.rst
Normal file
@@ -0,0 +1,312 @@
Storage Backends
================

FastAPI Traffic needs somewhere to store rate limit state — how many requests each
client has made, when their window resets, and so on. That's what backends are for.

You have three options, each suited to different deployment scenarios.

Choosing a Backend
------------------

Here's the quick guide:

.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Backend
     - Use When
     - Limitations
   * - **Memory**
     - Development, single-process apps
     - Lost on restart, doesn't share across processes
   * - **SQLite**
     - Single-node production
     - Doesn't share across machines
   * - **Redis**
     - Distributed systems, multiple nodes
     - Requires Redis infrastructure

Memory Backend
--------------

The default backend. It stores everything in memory using a dictionary with LRU
eviction and automatic TTL cleanup.

.. code-block:: python

   from fastapi_traffic import MemoryBackend, RateLimiter
   from fastapi_traffic.core.limiter import set_limiter

   # This is what happens by default, but you can configure it:
   backend = MemoryBackend(
       max_size=10000,       # Maximum number of keys to store
       cleanup_interval=60,  # How often to clean expired entries (seconds)
   )
   limiter = RateLimiter(backend)
   set_limiter(limiter)

**When to use it:**

- Local development
- Single-process applications
- Testing and CI/CD pipelines
- When you don't need persistence

**Limitations:**

- State is lost when the process restarts
- Doesn't work with multiple workers (each worker has its own memory)
- Not suitable for ``gunicorn`` with multiple workers or Kubernetes pods

**Memory management:**

The backend automatically evicts old entries when it hits ``max_size``. It uses
LRU (Least Recently Used) eviction, so inactive clients get cleaned up first.
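A dictionary with TTL and LRU eviction can be sketched with ``collections.OrderedDict``. This is an illustrative sketch of the idea, not the backend's actual implementation; ``TTLStore`` is a made-up name:

```python
from collections import OrderedDict


class TTLStore:
    """Sketch of a TTL dict with LRU eviction."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self.data: OrderedDict = OrderedDict()

    def set(self, key: str, value, expires_at: float) -> None:
        self.data[key] = (value, expires_at)
        self.data.move_to_end(key)              # mark as most recently used
        while len(self.data) > self.max_size:   # evict least recently used
            self.data.popitem(last=False)

    def get(self, key: str, now: float):
        item = self.data.get(key)
        if item is None or item[1] <= now:      # missing or expired
            self.data.pop(key, None)
            return None
        self.data.move_to_end(key)
        return item[0]


store = TTLStore(max_size=2)
store.set("a", 1, expires_at=100.0)
store.set("b", 2, expires_at=100.0)
store.set("c", 3, expires_at=100.0)            # "a" is evicted (least recently used)
print(store.get("a", now=0.0), store.get("b", now=0.0))  # None 2
```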
|
||||
|
||||
SQLite Backend
|
||||
--------------
|
||||
|
||||
For single-node production deployments where you need persistence. Rate limits
|
||||
survive restarts and work across multiple processes on the same machine.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from fastapi_traffic import SQLiteBackend, RateLimiter
|
||||
from fastapi_traffic.core.limiter import set_limiter
|
||||
|
||||
backend = SQLiteBackend(
|
||||
"rate_limits.db", # Database file path
|
||||
cleanup_interval=300, # Clean expired entries every 5 minutes
|
||||
)
|
||||
limiter = RateLimiter(backend)
|
||||
set_limiter(limiter)
|
||||
|
||||
@app.on_event("startup")
|
||||
async def startup():
|
||||
await limiter.initialize()
|
||||
|
||||
@app.on_event("shutdown")
|
||||
async def shutdown():
|
||||
await limiter.close()
|
||||
|
||||
**When to use it:**
|
||||
|
||||
- Single-server deployments
|
||||
- When you need rate limits to survive restarts
|
||||
- Multiple workers on the same machine (gunicorn, uvicorn with workers)
|
||||
- When Redis is overkill for your use case
|
||||
|
||||
**Performance notes:**
|
||||
|
||||
- Uses WAL (Write-Ahead Logging) mode for better concurrent performance
|
||||
- Connection pooling is handled automatically
|
||||
- Writes are batched where possible
|
||||
|
||||
**File location:**
|
||||
|
||||
Put the database file somewhere persistent. For Docker deployments, mount a volume:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
# docker-compose.yml
|
||||
services:
|
||||
api:
|
||||
volumes:
|
||||
- ./data:/app/data
|
||||
environment:
|
||||
- RATE_LIMIT_DB=/app/data/rate_limits.db
|
||||
|
||||
Redis Backend
|
||||
-------------
|
||||
|
||||
The go-to choice for distributed systems. All your application instances share
|
||||
the same rate limit state.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from fastapi_traffic import RateLimiter
|
||||
from fastapi_traffic.backends.redis import RedisBackend
|
||||
from fastapi_traffic.core.limiter import set_limiter
|
||||
|
||||
@app.on_event("startup")
|
||||
async def startup():
|
||||
backend = await RedisBackend.from_url(
|
||||
"redis://localhost:6379/0",
|
||||
key_prefix="myapp:ratelimit", # Optional prefix for all keys
|
||||
)
|
||||
limiter = RateLimiter(backend)
|
||||
set_limiter(limiter)
|
||||
await limiter.initialize()
|
||||
|
||||
@app.on_event("shutdown")
|
||||
async def shutdown():
|
||||
await limiter.close()
|
||||
|
||||
**When to use it:**
|
||||
|
||||
- Multiple application instances (Kubernetes, load-balanced servers)
|
||||
- When you need rate limits shared across your entire infrastructure
|
||||
- High-availability requirements
|
||||
|
||||
**Connection options:**
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Simple connection
|
||||
backend = await RedisBackend.from_url("redis://localhost:6379/0")
|
||||
|
||||
# With authentication
|
||||
backend = await RedisBackend.from_url("redis://:password@localhost:6379/0")
|
||||
|
||||
# Redis Sentinel for HA
|
||||
backend = await RedisBackend.from_url(
|
||||
"redis://sentinel1:26379/0",
|
||||
sentinel_master="mymaster",
|
||||
)
|
||||
|
||||
# Redis Cluster
|
||||
backend = await RedisBackend.from_url("redis://node1:6379,node2:6379,node3:6379/0")
|
||||
|
||||
**Atomic operations:**

The Redis backend uses Lua scripts to ensure atomic operations. This means rate
limit checks are accurate even under high concurrency, with no race conditions.

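To illustrate the guarantee (this is a sketch, not the library's actual Lua script), here is what an atomic check-and-increment looks like in plain Python, with a lock standing in for the single-threaded execution Redis gives a Lua script:

```python
import threading

LIMIT = 100
counts: dict[str, int] = {}
lock = threading.Lock()


def allow(key: str) -> bool:
    """Check-and-increment as one indivisible step, like a Lua script in Redis."""
    # Without the lock, two clients could read the same count and both pass
    # the check, exceeding the limit by one.
    with lock:
        current = counts.get(key, 0)
        if current >= LIMIT:
            return False
        counts[key] = current + 1
        return True
```

Redis achieves the same effect server-side: a Lua script runs as a single operation, so no other command can interleave between the read and the write.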
**Key expiration:**

Keys automatically expire based on the rate limit window. You don't need to worry
about Redis filling up with stale data.

Switching Backends
------------------

You can switch backends without changing your rate limiting code. Just configure
a different backend at startup:

.. code-block:: python

    import os
    from fastapi_traffic import RateLimiter, MemoryBackend, SQLiteBackend
    from fastapi_traffic.core.limiter import set_limiter


    async def get_backend():
        """Choose backend based on environment."""
        env = os.getenv("ENVIRONMENT", "development")

        if env == "production":
            redis_url = os.getenv("REDIS_URL")
            if redis_url:
                from fastapi_traffic.backends.redis import RedisBackend
                return await RedisBackend.from_url(redis_url)
            return SQLiteBackend("/app/data/rate_limits.db")

        return MemoryBackend()


    @app.on_event("startup")
    async def startup():
        backend = await get_backend()
        limiter = RateLimiter(backend)
        set_limiter(limiter)
        await limiter.initialize()

Custom Backends
---------------

Need something different? Maybe you want to use PostgreSQL, DynamoDB, or some
other storage system. You can implement your own backend:

.. code-block:: python

    from fastapi_traffic.backends.base import Backend
    from typing import Any


    class MyCustomBackend(Backend):
        async def get(self, key: str) -> dict[str, Any] | None:
            """Retrieve state for a key."""
            # Your implementation here
            pass

        async def set(self, key: str, value: dict[str, Any], *, ttl: float) -> None:
            """Store state with TTL."""
            pass

        async def delete(self, key: str) -> None:
            """Delete a key."""
            pass

        async def exists(self, key: str) -> bool:
            """Check if key exists."""
            pass

        async def increment(self, key: str, amount: int = 1) -> int:
            """Atomically increment a counter."""
            pass

        async def clear(self) -> None:
            """Clear all data."""
            pass

        async def close(self) -> None:
            """Clean up resources."""
            pass

The key methods are ``get``, ``set``, and ``delete``. The state is stored as a
dictionary, and the backend is responsible for serialization.

Backend Comparison
------------------

.. list-table::
   :header-rows: 1

   * - Feature
     - Memory
     - SQLite
     - Redis
   * - Persistence
     - ❌
     - ✅
     - ✅
   * - Multi-process
     - ❌
     - ✅
     - ✅
   * - Multi-node
     - ❌
     - ❌
     - ✅
   * - Setup complexity
     - None
     - Low
     - Medium
   * - Latency
     - ~0.01ms
     - ~0.1ms
     - ~1ms
   * - Dependencies
     - None
     - None
     - redis package

Best Practices
--------------

1. **Start with Memory, upgrade when needed.** Don't over-engineer. Memory is
   fine for development and many production scenarios.

2. **Use Redis for distributed systems.** If you have multiple application
   instances, Redis is the only option that works correctly.

3. **Handle backend errors gracefully.** Set ``skip_on_error=True`` if you'd
   rather allow requests through than fail when the backend is down:

   .. code-block:: python

       @rate_limit(100, 60, skip_on_error=True)
       async def endpoint(request: Request):
           return {"status": "ok"}

4. **Monitor your backend.** Keep an eye on memory usage (Memory backend),
   disk space (SQLite), or Redis memory and connections.
315
docs/user-guide/configuration.rst
Normal file
@@ -0,0 +1,315 @@
Configuration
=============

FastAPI Traffic supports loading configuration from environment variables and files.
This makes it easy to manage settings across different environments without changing code.

Configuration Loader
--------------------

The ``ConfigLoader`` class handles loading configuration from various sources:

.. code-block:: python

    from fastapi_traffic import ConfigLoader, RateLimitConfig

    loader = ConfigLoader()

    # Load from environment variables
    config = loader.load_rate_limit_config_from_env()

    # Load from a JSON file
    config = loader.load_rate_limit_config_from_json("config/rate_limits.json")

    # Load from a .env file
    config = loader.load_rate_limit_config_from_env_file(".env")

Environment Variables
---------------------

Set rate limit configuration using environment variables with the ``FASTAPI_TRAFFIC_``
prefix:

.. code-block:: bash

    # Basic settings
    export FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
    export FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
    export FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window_counter

    # Optional settings
    export FASTAPI_TRAFFIC_RATE_LIMIT_KEY_PREFIX=myapp
    export FASTAPI_TRAFFIC_RATE_LIMIT_BURST_SIZE=20
    export FASTAPI_TRAFFIC_RATE_LIMIT_INCLUDE_HEADERS=true
    export FASTAPI_TRAFFIC_RATE_LIMIT_ERROR_MESSAGE="Too many requests"
    export FASTAPI_TRAFFIC_RATE_LIMIT_STATUS_CODE=429
    export FASTAPI_TRAFFIC_RATE_LIMIT_SKIP_ON_ERROR=false
    export FASTAPI_TRAFFIC_RATE_LIMIT_COST=1

Then load them in your app:

.. code-block:: python

    from fastapi_traffic import load_rate_limit_config_from_env, rate_limit

    # Load config from environment
    config = load_rate_limit_config_from_env()

    # Use it with the decorator
    @app.get("/api/data")
    @rate_limit(config.limit, config.window_size, algorithm=config.algorithm)
    async def get_data(request: Request):
        return {"data": "here"}

Custom Prefix
-------------

If ``FASTAPI_TRAFFIC_`` conflicts with something else, use a custom prefix:

.. code-block:: python

    loader = ConfigLoader(prefix="MYAPP_RATELIMIT")
    config = loader.load_rate_limit_config_from_env()

    # Now reads from:
    # MYAPP_RATELIMIT_RATE_LIMIT_LIMIT=100
    # MYAPP_RATELIMIT_RATE_LIMIT_WINDOW_SIZE=60
    # etc.

JSON Configuration
------------------

For more complex setups, use a JSON file:

.. code-block:: json

    {
      "limit": 100,
      "window_size": 60,
      "algorithm": "token_bucket",
      "burst_size": 25,
      "key_prefix": "api",
      "include_headers": true,
      "error_message": "Rate limit exceeded. Please slow down.",
      "status_code": 429,
      "skip_on_error": false,
      "cost": 1
    }

Load it:

.. code-block:: python

    from fastapi_traffic import ConfigLoader

    loader = ConfigLoader()
    config = loader.load_rate_limit_config_from_json("config/rate_limits.json")

.env Files
----------

You can also use ``.env`` files, which is handy for local development:

.. code-block:: bash

    # .env
    FASTAPI_TRAFFIC_RATE_LIMIT_LIMIT=100
    FASTAPI_TRAFFIC_RATE_LIMIT_WINDOW_SIZE=60
    FASTAPI_TRAFFIC_RATE_LIMIT_ALGORITHM=sliding_window

Load it:

.. code-block:: python

    loader = ConfigLoader()
    config = loader.load_rate_limit_config_from_env_file(".env")

Global Configuration
--------------------

Besides per-endpoint configuration, you can set global defaults:

.. code-block:: bash

    # Global settings
    export FASTAPI_TRAFFIC_GLOBAL_ENABLED=true
    export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_LIMIT=100
    export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_WINDOW_SIZE=60
    export FASTAPI_TRAFFIC_GLOBAL_DEFAULT_ALGORITHM=sliding_window_counter
    export FASTAPI_TRAFFIC_GLOBAL_KEY_PREFIX=fastapi_traffic
    export FASTAPI_TRAFFIC_GLOBAL_INCLUDE_HEADERS=true
    export FASTAPI_TRAFFIC_GLOBAL_ERROR_MESSAGE="Rate limit exceeded"
    export FASTAPI_TRAFFIC_GLOBAL_STATUS_CODE=429
    export FASTAPI_TRAFFIC_GLOBAL_SKIP_ON_ERROR=false
    export FASTAPI_TRAFFIC_GLOBAL_HEADERS_PREFIX=X-RateLimit

Load global config:

.. code-block:: python

    from fastapi_traffic import load_global_config_from_env, RateLimiter
    from fastapi_traffic.core.limiter import set_limiter

    global_config = load_global_config_from_env()
    limiter = RateLimiter(config=global_config)
    set_limiter(limiter)

Auto-Detection
--------------

The convenience functions automatically detect file format:

.. code-block:: python

    from fastapi_traffic import load_rate_limit_config, load_global_config

    # Detects JSON by extension
    config = load_rate_limit_config("config/limits.json")

    # Detects .env file
    config = load_rate_limit_config("config/.env")

    # Works for global config too
    global_config = load_global_config("config/global.json")

Overriding Values
-----------------

You can override loaded values programmatically:

.. code-block:: python

    loader = ConfigLoader()

    # Load base config from file
    config = loader.load_rate_limit_config_from_json(
        "config/base.json",
        limit=200,            # Override the limit
        key_prefix="custom",  # Override the prefix
    )

This is useful for environment-specific overrides:

.. code-block:: python

    import os

    base_config = loader.load_rate_limit_config_from_json("config/base.json")

    # Apply environment-specific overrides
    if os.getenv("ENVIRONMENT") == "production":
        config = loader.load_rate_limit_config_from_json(
            "config/base.json",
            limit=base_config.limit * 2,  # Double the limit in production
        )

Validation
----------

Configuration is validated when loaded. Invalid values raise ``ConfigurationError``:

.. code-block:: python

    from fastapi_traffic import ConfigLoader, ConfigurationError

    loader = ConfigLoader()

    try:
        config = loader.load_rate_limit_config_from_env()
    except ConfigurationError as e:
        print(f"Invalid configuration: {e}")
        # Handle the error appropriately

Common validation errors:

- ``limit`` must be a positive integer
- ``window_size`` must be a positive number
- ``algorithm`` must be one of the valid algorithm names
- ``status_code`` must be a valid HTTP status code

Algorithm Names
---------------

When specifying algorithms in configuration, use these names:

.. list-table::
   :header-rows: 1

   * - Config Value
     - Algorithm
   * - ``token_bucket``
     - Token Bucket
   * - ``sliding_window``
     - Sliding Window
   * - ``fixed_window``
     - Fixed Window
   * - ``leaky_bucket``
     - Leaky Bucket
   * - ``sliding_window_counter``
     - Sliding Window Counter (default)

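Conceptually, validating the configured name is a normalize-and-check-membership step. This sketch is illustrative only (``validate_algorithm`` is a hypothetical helper, not the library's API; the actual loader validates through its schemas):

```python
VALID_ALGORITHMS = {
    "token_bucket",
    "sliding_window",
    "fixed_window",
    "leaky_bucket",
    "sliding_window_counter",
}


def validate_algorithm(name: str) -> str:
    """Normalize the configured name and reject anything unknown."""
    normalized = name.strip().lower()
    if normalized not in VALID_ALGORITHMS:
        raise ValueError(f"unknown algorithm: {name!r}")
    return normalized
```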
Boolean Values
--------------

Boolean settings accept various formats:

- **True:** ``true``, ``1``, ``yes``, ``on``
- **False:** ``false``, ``0``, ``no``, ``off``

Case doesn't matter.

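A sketch of how such parsing typically works (the loader's actual implementation may differ; ``parse_bool`` is a hypothetical name):

```python
TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Normalize case and whitespace, then map the value to a boolean."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {raw!r}")
```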
Complete Example
----------------

Here's a full example showing configuration loading in a real app:

.. code-block:: python

    import os
    from fastapi import FastAPI, Request
    from fastapi_traffic import (
        ConfigLoader,
        ConfigurationError,
        RateLimiter,
        rate_limit,
    )
    from fastapi_traffic.core.limiter import set_limiter

    app = FastAPI()

    @app.on_event("startup")
    async def startup():
        loader = ConfigLoader()

        try:
            # Try to load from environment first
            global_config = loader.load_global_config_from_env()
        except ConfigurationError:
            # Fall back to defaults
            global_config = None

        limiter = RateLimiter(config=global_config)
        set_limiter(limiter)
        await limiter.initialize()

    @app.get("/api/data")
    @rate_limit(100, 60)
    async def get_data(request: Request):
        return {"data": "here"}

    # Or load endpoint-specific config
    loader = ConfigLoader()
    try:
        api_config = loader.load_rate_limit_config_from_json("config/api_limits.json")
    except (FileNotFoundError, ConfigurationError):
        api_config = None

    if api_config:
        @app.get("/api/special")
        @rate_limit(
            api_config.limit,
            api_config.window_size,
            algorithm=api_config.algorithm,
        )
        async def special_endpoint(request: Request):
            return {"special": "data"}
277
docs/user-guide/exception-handling.rst
Normal file
@@ -0,0 +1,277 @@
Exception Handling
==================

When a client exceeds their rate limit, FastAPI Traffic raises a ``RateLimitExceeded``
exception. This guide covers how to handle it gracefully.

Default Behavior
----------------

By default, when a rate limit is exceeded, the library raises ``RateLimitExceeded``.
FastAPI will convert this to a 500 error unless you handle it.

The exception contains useful information:

.. code-block:: python

    from fastapi_traffic import RateLimitExceeded

    try:
        # Rate limited operation
        pass
    except RateLimitExceeded as exc:
        print(exc.message)      # "Rate limit exceeded"
        print(exc.retry_after)  # Seconds until they can retry (e.g., 45.2)
        print(exc.limit_info)   # RateLimitInfo object with full details

Custom Exception Handler
------------------------

The most common approach is to register a custom exception handler:

.. code-block:: python

    from fastapi import FastAPI, Request
    from fastapi.responses import JSONResponse
    from fastapi_traffic import RateLimitExceeded

    app = FastAPI()

    @app.exception_handler(RateLimitExceeded)
    async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
        return JSONResponse(
            status_code=429,
            content={
                "error": "rate_limit_exceeded",
                "message": "You're making too many requests. Please slow down.",
                "retry_after": exc.retry_after,
            },
            headers={
                "Retry-After": str(int(exc.retry_after or 60)),
            },
        )

Now clients get a clean JSON response instead of a generic error.

Including Rate Limit Headers
----------------------------

The ``limit_info`` object can generate standard rate limit headers:

.. code-block:: python

    @app.exception_handler(RateLimitExceeded)
    async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
        headers = {}
        if exc.limit_info:
            headers = exc.limit_info.to_headers()

        return JSONResponse(
            status_code=429,
            content={
                "error": "rate_limit_exceeded",
                "retry_after": exc.retry_after,
            },
            headers=headers,
        )

This adds headers like:

.. code-block:: text

    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1709834400
    Retry-After: 45

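A sketch of the kind of mapping ``to_headers()`` performs. The field shapes here are assumptions inferred from the output above, and ``rate_limit_headers`` is a hypothetical stand-in, not the library's function:

```python
import math
import time


def rate_limit_headers(limit: int, remaining: int, reset_epoch: float) -> dict[str, str]:
    """Build standard X-RateLimit-* headers from the current limit state."""
    # Retry-After is the whole seconds left until the window resets, never negative
    retry_after = max(0, math.ceil(reset_epoch - time.time()))
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(int(reset_epoch)),
        "Retry-After": str(retry_after),
    }
```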
Different Responses for Different Endpoints
-------------------------------------------

You might want different error messages for different parts of your API:

.. code-block:: python

    from fastapi.responses import HTMLResponse, JSONResponse

    @app.exception_handler(RateLimitExceeded)
    async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
        path = request.url.path

        if path.startswith("/api/v1/"):
            # API clients get JSON
            return JSONResponse(
                status_code=429,
                content={"error": "rate_limit_exceeded", "retry_after": exc.retry_after},
            )
        elif path.startswith("/web/"):
            # Web users get a friendly HTML page
            return HTMLResponse(
                status_code=429,
                content="<h1>Slow down!</h1><p>Please wait a moment before trying again.</p>",
            )
        else:
            # Default response
            return JSONResponse(
                status_code=429,
                content={"detail": exc.message},
            )

Using the on_blocked Callback
-----------------------------

Instead of (or in addition to) exception handling, you can use the ``on_blocked``
callback to run code when a request is blocked:

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)

    def log_blocked_request(request: Request, result):
        """Log when a request is rate limited."""
        client_ip = request.client.host if request.client else "unknown"
        logger.warning(
            "Rate limit exceeded for %s on %s %s",
            client_ip,
            request.method,
            request.url.path,
        )

    @app.get("/api/data")
    @rate_limit(100, 60, on_blocked=log_blocked_request)
    async def get_data(request: Request):
        return {"data": "here"}

The callback receives the request and the rate limit result. It runs before the
exception is raised.

Exempting Certain Requests
--------------------------

Use ``exempt_when`` to skip rate limiting for certain requests:

.. code-block:: python

    def is_admin(request: Request) -> bool:
        """Check if request is from an admin."""
        user = getattr(request.state, "user", None)
        return user is not None and user.is_admin

    @app.get("/api/data")
    @rate_limit(100, 60, exempt_when=is_admin)
    async def get_data(request: Request):
        return {"data": "here"}

Admin requests bypass rate limiting entirely.

Graceful Degradation
--------------------

Sometimes you'd rather serve a degraded response than reject the request entirely:

.. code-block:: python

    from fastapi_traffic import RateLimitConfig
    from fastapi_traffic.core.limiter import get_limiter

    @app.get("/api/search")
    async def search(request: Request, q: str):
        limiter = get_limiter()
        config = RateLimitConfig(limit=100, window_size=60)

        result = await limiter.check(request, config)

        if not result.allowed:
            # Return cached/simplified results instead of blocking
            return {
                "results": get_cached_results(q),
                "note": "Results may be stale. Please try again later.",
                "retry_after": result.info.retry_after,
            }

        # Full search
        return {"results": perform_full_search(q)}

Backend Errors
--------------

If the rate limit backend fails (Redis down, SQLite locked, etc.), you have options:

**Option 1: Fail closed (default)**

Requests fail when the backend is unavailable. Safer, but impacts availability.

**Option 2: Fail open**

Allow requests through when the backend fails:

.. code-block:: python

    @app.get("/api/data")
    @rate_limit(100, 60, skip_on_error=True)
    async def get_data(request: Request):
        return {"data": "here"}

**Option 3: Handle the error explicitly**

.. code-block:: python

    from fastapi_traffic import BackendError

    @app.exception_handler(BackendError)
    async def backend_error_handler(request: Request, exc: BackendError):
        # Log the error
        logger.error("Rate limit backend error: %s", exc.original_error)

        # An exception handler must return a response. If you want to let
        # requests through instead, use skip_on_error=True on the decorator.
        return JSONResponse(
            status_code=503,
            content={"error": "service_unavailable"},
        )

Other Exceptions
----------------

FastAPI Traffic defines a few exception types:

.. code-block:: python

    from fastapi_traffic import (
        RateLimitExceeded,   # Rate limit was exceeded
        BackendError,        # Storage backend failed
        ConfigurationError,  # Invalid configuration
    )

All inherit from ``FastAPITrafficError``:

.. code-block:: python

    from fastapi_traffic.exceptions import FastAPITrafficError

    @app.exception_handler(FastAPITrafficError)
    async def traffic_error_handler(request: Request, exc: FastAPITrafficError):
        """Catch-all for FastAPI Traffic errors."""
        if isinstance(exc, RateLimitExceeded):
            return JSONResponse(status_code=429, content={"error": "rate_limited"})
        elif isinstance(exc, BackendError):
            return JSONResponse(status_code=503, content={"error": "backend_error"})
        else:
            return JSONResponse(status_code=500, content={"error": "internal_error"})

Helper Function
---------------

FastAPI Traffic provides a helper to create rate limit responses:

.. code-block:: python

    from fastapi_traffic.core.decorator import create_rate_limit_response

    @app.exception_handler(RateLimitExceeded)
    async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
        return create_rate_limit_response(exc, include_headers=True)

This creates a standard 429 response with all the appropriate headers.
258
docs/user-guide/key-extractors.rst
Normal file
@@ -0,0 +1,258 @@
Key Extractors
==============

A key extractor is a function that identifies who's making a request. By default,
FastAPI Traffic uses the client's IP address, but you can customize this to fit
your authentication model.

How It Works
------------

Every rate limit needs a way to group requests. The key extractor returns a string
that identifies the client:

.. code-block:: python

    def my_key_extractor(request: Request) -> str:
        return "some-unique-identifier"

All requests that return the same identifier share the same rate limit bucket.

Default Behavior
----------------

The default extractor looks for the client IP in this order:

1. ``X-Forwarded-For`` header (first IP in the list)
2. ``X-Real-IP`` header
3. Direct connection IP (``request.client.host``)
4. Falls back to ``"unknown"``

This handles most reverse proxy setups automatically.

Rate Limiting by API Key
------------------------

For authenticated APIs, you probably want to limit by API key:

.. code-block:: python

    from fastapi import Request
    from fastapi_traffic import rate_limit

    def api_key_extractor(request: Request) -> str:
        """Rate limit by API key."""
        api_key = request.headers.get("X-API-Key")
        if api_key:
            return f"apikey:{api_key}"
        # Fall back to IP for unauthenticated requests
        return f"ip:{request.client.host}" if request.client else "ip:unknown"

    @app.get("/api/data")
    @rate_limit(1000, 3600, key_extractor=api_key_extractor)
    async def get_data(request: Request):
        return {"data": "here"}

Now each API key gets its own rate limit bucket.

Rate Limiting by User
---------------------

If you're using authentication middleware that sets the user:

.. code-block:: python

    def user_extractor(request: Request) -> str:
        """Rate limit by authenticated user."""
        # Assuming your auth middleware sets request.state.user
        user = getattr(request.state, "user", None)
        if user:
            return f"user:{user.id}"
        return f"ip:{request.client.host}" if request.client else "ip:unknown"

    @app.get("/api/profile")
    @rate_limit(100, 60, key_extractor=user_extractor)
    async def get_profile(request: Request):
        return {"profile": "data"}

Rate Limiting by Tenant
-----------------------

For multi-tenant applications:

.. code-block:: python

    def tenant_extractor(request: Request) -> str:
        """Rate limit by tenant."""
        # From subdomain
        host = request.headers.get("host", "")
        if "." in host:
            tenant = host.split(".")[0]
            return f"tenant:{tenant}"

        # Or from header
        tenant = request.headers.get("X-Tenant-ID")
        if tenant:
            return f"tenant:{tenant}"

        return "tenant:default"

Combining Identifiers
---------------------

Sometimes you want to combine multiple factors:

.. code-block:: python

    def combined_extractor(request: Request) -> str:
        """Rate limit by user AND endpoint."""
        user = getattr(request.state, "user", None)
        user_id = user.id if user else "anonymous"
        endpoint = request.url.path
        return f"{user_id}:{endpoint}"

This gives each user a separate limit for each endpoint.

Tiered Rate Limits
------------------

Different users might have different limits. Handle this with a custom extractor
that includes the tier:

.. code-block:: python

    def tiered_extractor(request: Request) -> str:
        """Include tier in the key for different limits."""
        user = getattr(request.state, "user", None)
        if user:
            # Premium users get a different bucket
            tier = "premium" if user.is_premium else "free"
            return f"{tier}:{user.id}"
        return f"anonymous:{request.client.host if request.client else 'unknown'}"

Then apply different limits based on tier:

.. code-block:: python

    # You'd typically do this with middleware or dependency injection
    # to check the tier and apply the appropriate limit

    @app.get("/api/data")
    async def get_data(request: Request):
        user = getattr(request.state, "user", None)
        if user and user.is_premium:
            # Premium: 10000 req/hour
            limit, window = 10000, 3600
        else:
            # Free: 100 req/hour
            limit, window = 100, 3600

        # Apply rate limit manually
        limiter = get_limiter()
        config = RateLimitConfig(limit=limit, window_size=window)
        await limiter.hit(request, config)

        return {"data": "here"}

Geographic Rate Limiting
------------------------

Limit by country or region:

.. code-block:: python

    def geo_extractor(request: Request) -> str:
        """Rate limit by country."""
        # Assuming you have a GeoIP lookup
        country = request.headers.get("CF-IPCountry", "XX")  # Cloudflare header
        ip = request.client.host if request.client else "unknown"
        return f"{country}:{ip}"

This lets you apply different limits to different regions if needed.

Endpoint-Specific Keys
----------------------

Rate limit the same user differently per endpoint:

.. code-block:: python

    def endpoint_user_extractor(request: Request) -> str:
        """Separate limits per endpoint per user."""
        user = getattr(request.state, "user", None)
        user_id = user.id if user else (request.client.host if request.client else "unknown")
        method = request.method
        path = request.url.path
        return f"{user_id}:{method}:{path}"

Best Practices
--------------

1. **Always have a fallback.** If your primary identifier isn't available, fall
   back to IP:

   .. code-block:: python

       def safe_extractor(request: Request) -> str:
           api_key = request.headers.get("X-API-Key")
           if api_key:
               return f"key:{api_key}"
           return f"ip:{request.client.host if request.client else 'unknown'}"

2. **Use prefixes.** When mixing identifier types, prefix them to avoid collisions:

   .. code-block:: python

       # Good - clear what each key represents
       return f"user:{user_id}"
       return f"ip:{ip_address}"
       return f"key:{api_key}"

       # Bad - could collide
       return user_id
       return ip_address

3. **Keep it fast.** The extractor runs on every request. Avoid database lookups
   or expensive operations:

   .. code-block:: python

       # Bad - database lookup on every request
       def slow_extractor(request: Request) -> str:
           user = db.get_user(request.headers.get("Authorization"))
           return user.id

       # Good - use data already in the request
       def fast_extractor(request: Request) -> str:
           return request.state.user.id  # Set by auth middleware

4. **Be consistent.** The same client should always get the same key. Watch out
   for things like:

   - IP addresses changing (mobile users)
   - Case sensitivity (normalize to lowercase)
   - Whitespace (strip it)

   .. code-block:: python

       def normalized_extractor(request: Request) -> str:
           api_key = request.headers.get("X-API-Key", "").strip().lower()
           if api_key:
               return f"key:{api_key}"
           return f"ip:{request.client.host}"

Using with Middleware
---------------------

Key extractors work the same way with middleware:

.. code-block:: python

    from fastapi_traffic.middleware import RateLimitMiddleware

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        key_extractor=api_key_extractor,
    )
322
docs/user-guide/middleware.rst
Normal file
@@ -0,0 +1,322 @@
Middleware
==========

Sometimes you want rate limiting applied to your entire API, not just individual
endpoints. That's where middleware comes in.

Middleware sits between the client and your application, checking every request
before it reaches your endpoints.

Basic Usage
-----------

Add the middleware to your FastAPI app:

.. code-block:: python

    from fastapi import FastAPI
    from fastapi_traffic.middleware import RateLimitMiddleware

    app = FastAPI()

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,      # 1000 requests
        window_size=60,  # per minute
    )

    @app.get("/api/users")
    async def get_users():
        return {"users": []}

    @app.get("/api/posts")
    async def get_posts():
        return {"posts": []}

Now every endpoint shares the same rate limit pool. A client who makes 500 requests
to ``/api/users`` only has 500 left for ``/api/posts``.

Exempting Paths
|
||||
---------------
|
||||
|
||||
You probably don't want to rate limit your health checks or documentation:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
app.add_middleware(
|
||||
RateLimitMiddleware,
|
||||
limit=1000,
|
||||
window_size=60,
|
||||
exempt_paths={
|
||||
"/health",
|
||||
"/ready",
|
||||
"/docs",
|
||||
"/redoc",
|
||||
"/openapi.json",
|
||||
},
|
||||
)
|
||||
|
||||
These paths bypass rate limiting entirely.
|
||||
|
||||
Exempting IPs
-------------

Internal services, monitoring systems, or your own infrastructure might need
unrestricted access:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        exempt_ips={
            "127.0.0.1",
            "10.0.0.0/8",     # Internal network
            "192.168.1.100",  # Monitoring server
        },
    )

.. note::

   IP exemptions are checked against the client IP extracted by the key extractor.
   Make sure your proxy headers are configured correctly if you're behind a load
   balancer.
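The example above mixes plain addresses with a CIDR range. If you want to check how
such an exemption set behaves, the matching logic can be sketched with the standard
library's ``ipaddress`` module (this is an illustration, not fastapi_traffic's
internal implementation):

```python
import ipaddress

def is_exempt(client_ip: str, exempt: set[str]) -> bool:
    """Return True if client_ip matches any exempt entry (plain IP or CIDR)."""
    addr = ipaddress.ip_address(client_ip)
    for entry in exempt:
        if "/" in entry:
            # CIDR entry: membership test against the network
            if addr in ipaddress.ip_network(entry):
                return True
        elif addr == ipaddress.ip_address(entry):
            return True
    return False

exempt = {"127.0.0.1", "10.0.0.0/8", "192.168.1.100"}
print(is_exempt("10.42.0.7", exempt))    # → True (inside 10.0.0.0/8)
print(is_exempt("203.0.113.9", exempt))  # → False
```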
Custom Key Extraction
---------------------

By default, clients are identified by IP address. You can change this:

.. code-block:: python

    from starlette.requests import Request

    def get_client_id(request: Request) -> str:
        """Identify clients by API key, fall back to IP."""
        api_key = request.headers.get("X-API-Key")
        if api_key:
            return f"api:{api_key}"
        return request.client.host if request.client else "unknown"

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        key_extractor=get_client_id,
    )
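A key extractor is just a function of the request, so you can sanity-check it
without running a server. The stand-in objects below carry only the attributes the
extractor reads (``headers`` and ``client``); they are a testing convenience, not
part of the library:

```python
from types import SimpleNamespace

def get_client_id(request) -> str:
    """Same logic as above, duck-typed so a stand-in object works."""
    api_key = request.headers.get("X-API-Key")
    if api_key:
        return f"api:{api_key}"
    return request.client.host if request.client else "unknown"

with_key = SimpleNamespace(headers={"X-API-Key": "abc123"}, client=None)
no_key = SimpleNamespace(headers={}, client=SimpleNamespace(host="203.0.113.9"))

print(get_client_id(with_key))  # → api:abc123
print(get_client_id(no_key))    # → 203.0.113.9
```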
Choosing an Algorithm
---------------------

The middleware supports all five algorithms:

.. code-block:: python

    from fastapi_traffic.core.algorithms import Algorithm

    # Token bucket for burst-friendly limiting
    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        algorithm=Algorithm.TOKEN_BUCKET,
    )

    # Sliding window for precise limiting
    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        algorithm=Algorithm.SLIDING_WINDOW,
    )
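To see why the token bucket is described as burst-friendly, here is a minimal
standalone sketch of the algorithm (an illustration only; fastapi_traffic's actual
implementation lives in ``core.algorithms``). A full bucket absorbs a burst up to
its capacity, after which requests are throttled to the refill rate:

```python
class TokenBucket:
    """Minimal token bucket: `capacity` tokens, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, rate=1.0)  # burst of 5, 1 req/sec sustained
burst = [bucket.allow(now=i * 0.01) for i in range(6)]
print(burst)                  # → [True, True, True, True, True, False]
print(bucket.allow(now=2.0))  # → True (tokens refilled while waiting)
```

A fixed window of the same size would admit the whole burst and then block
everything until the window rolls over; the bucket instead recovers smoothly.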
Using a Custom Backend
----------------------

By default, middleware uses the memory backend. For production, you'll want
something persistent:

.. code-block:: python

    from fastapi_traffic import SQLiteBackend
    from fastapi_traffic.middleware import RateLimitMiddleware

    backend = SQLiteBackend("rate_limits.db")

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        backend=backend,
    )

    @app.on_event("shutdown")
    async def shutdown():
        await backend.close()

For Redis:

.. code-block:: python

    from fastapi_traffic.backends.redis import RedisBackend

    # Create backend at startup
    redis_backend = None

    @app.on_event("startup")
    async def startup():
        global redis_backend
        redis_backend = await RedisBackend.from_url("redis://localhost:6379/0")

    # Note: You'll need to configure middleware after startup
    # or use a factory pattern
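One way to realize the factory pattern mentioned above is a thin wrapper that
connects on first use, so the object can be handed to ``add_middleware`` at import
time while the real connection is only opened once requests start flowing.
Everything below is a sketch with illustrative names; ``check`` stands in for
whichever methods your backend actually exposes:

```python
import asyncio

class LazyBackend:
    """Defers backend creation until the first call."""

    def __init__(self, factory):
        self._factory = factory  # async callable that builds the real backend
        self._backend = None

    async def _connect(self):
        if self._backend is None:
            self._backend = await self._factory()
        return self._backend

    async def check(self, *args, **kwargs):
        backend = await self._connect()
        return await backend.check(*args, **kwargs)

# Demo with a stand-in backend instead of a live Redis connection
class FakeBackend:
    async def check(self, key):
        return f"checked {key}"

async def make_backend():
    return FakeBackend()  # e.g. await RedisBackend.from_url(...) in real code

lazy = LazyBackend(make_backend)
print(asyncio.run(lazy.check("client-1")))  # → checked client-1
```

In real code you would guard ``_connect`` with an ``asyncio.Lock`` so that
concurrent first requests don't open two connections.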
Convenience Middleware Classes
------------------------------

For common use cases, we provide pre-configured middleware:

.. code-block:: python

    from fastapi_traffic.middleware import (
        SlidingWindowMiddleware,
        TokenBucketMiddleware,
    )

    # Sliding window algorithm
    app.add_middleware(
        SlidingWindowMiddleware,
        limit=1000,
        window_size=60,
    )

    # Token bucket algorithm
    app.add_middleware(
        TokenBucketMiddleware,
        limit=1000,
        window_size=60,
    )
Combining with Decorator
------------------------

You can use both middleware and decorators. The middleware provides a baseline
limit, and decorators can add stricter limits to specific endpoints:

.. code-block:: python

    from fastapi import Request

    from fastapi_traffic import rate_limit
    from fastapi_traffic.middleware import RateLimitMiddleware

    # Global limit: 1000 req/min
    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
    )

    # This endpoint has an additional, stricter limit
    @app.post("/api/expensive-operation")
    @rate_limit(10, 60)  # Only 10 req/min for this endpoint
    async def expensive_operation(request: Request):
        return {"result": "done"}

    # This endpoint uses only the global limit
    @app.get("/api/cheap-operation")
    async def cheap_operation():
        return {"result": "done"}

Both limits are checked: a request must pass both the middleware limit AND the
decorator limit.
Error Responses
---------------

When a client exceeds the rate limit, they get a 429 response:

.. code-block:: json

    {
        "detail": "Rate limit exceeded. Please try again later.",
        "retry_after": 45.2
    }

You can customize the message:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        error_message="Whoa there! You're making requests too fast.",
        status_code=429,
    )
Response Headers
----------------

By default, rate limit headers are included in every response:

.. code-block:: http

    X-RateLimit-Limit: 1000
    X-RateLimit-Remaining: 847
    X-RateLimit-Reset: 1709834400

When rate limited:

.. code-block:: http

    Retry-After: 45

Disable headers if you don't want to expose this information:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        include_headers=False,
    )
Handling Backend Errors
-----------------------

What happens if your Redis server goes down? By default, the middleware will
raise an exception. You can change this behavior:

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        skip_on_error=True,  # Allow requests through if backend fails
    )

With ``skip_on_error=True``, requests are allowed through when the backend is
unavailable. This is a tradeoff between availability and protection.
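The tradeoff boils down to a policy decision around the backend call. A minimal
sketch of the two policies, with illustrative names rather than the middleware's
internals:

```python
import asyncio

async def check_with_policy(check, key, skip_on_error: bool):
    """Apply a fail-open / fail-closed policy around a backend check."""
    try:
        return await check(key)
    except ConnectionError:
        if skip_on_error:
            return True  # fail open: let the request through unprotected
        raise            # fail closed: surface the backend error

async def broken_check(key):
    raise ConnectionError("redis unreachable")

print(asyncio.run(check_with_policy(broken_check, "client-1", skip_on_error=True)))  # → True
```

Fail open keeps your API serving during a backend outage; fail closed keeps the
rate limit guarantee at the cost of rejecting every request.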
Full Configuration Reference
----------------------------

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,               # Max requests per window
        window_size=60.0,         # Window size in seconds
        algorithm=Algorithm.SLIDING_WINDOW_COUNTER,  # Algorithm to use
        backend=None,             # Storage backend (default: MemoryBackend)
        key_prefix="middleware",  # Prefix for rate limit keys
        include_headers=True,     # Add rate limit headers to responses
        error_message="Rate limit exceeded. Please try again later.",
        status_code=429,          # HTTP status when limited
        skip_on_error=False,      # Allow requests if backend fails
        exempt_paths=None,        # Set of paths to exempt
        exempt_ips=None,          # Set of IPs to exempt
        key_extractor=default_key_extractor,  # Function to identify clients
    )