Performance
===========

FastAPI Traffic is designed to be fast, but when you're handling thousands of requests per second, every microsecond counts. Here's how to squeeze out the best performance.

Baseline Performance
--------------------

On typical hardware, you can expect:

- **Memory backend:** ~0.01ms per check
- **SQLite backend:** ~0.1ms per check
- **Redis backend:** ~1ms per check (network dependent)

For most applications, this overhead is negligible compared to your actual business logic.

Choosing the Right Algorithm
----------------------------

Algorithms have different performance characteristics:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Time Complexity
     - Space Complexity
     - Notes
   * - Token Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Fixed Window
     - O(1)
     - O(1)
     - One int + one float per key
   * - Sliding Window Counter
     - O(1)
     - O(1)
     - Three values per key
   * - Leaky Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Sliding Window
     - O(n)
     - O(n)
     - Stores every timestamp

**Recommendation:** Use Sliding Window Counter (the default) unless you have specific requirements. It's O(1) and provides good accuracy.

**Avoid Sliding Window for high-volume endpoints.** If you're allowing 10,000 requests per minute, that's 10,000 timestamps to store and filter per key.

Memory Backend Optimization
---------------------------

The memory backend is already fast, but you can tune it:

.. code-block:: python

    from fastapi_traffic import MemoryBackend

    backend = MemoryBackend(
        max_size=10000,       # Limit memory usage
        cleanup_interval=60,  # Less frequent cleanup = less overhead
    )

**max_size:** Limits the number of keys stored. When exceeded, LRU eviction kicks in. Set this based on your expected number of unique clients.

**cleanup_interval:** How often (in seconds) to scan for expired entries. Higher values mean less CPU overhead, but expired entries linger in memory longer between scans.

SQLite Backend Optimization
---------------------------

SQLite is surprisingly fast for rate limiting:

.. code-block:: python

    from fastapi_traffic import SQLiteBackend

    backend = SQLiteBackend(
        "rate_limits.db",
        cleanup_interval=300,  # Clean every 5 minutes
    )

**Tips:**

1. **Use an SSD.** SQLite performance depends heavily on disk I/O.
2. **Put the database on a local disk.** Network-attached storage adds latency.
3. **WAL mode is enabled by default.** This allows concurrent reads and writes.
4. **Increase cleanup_interval** if you have many keys. Cleanup scans the entire table.

Redis Backend Optimization
--------------------------

Redis is the bottleneck in most distributed setups:

**1. Use connection pooling (automatic):**

The backend maintains a pool of connections. You don't need to do anything.

**2. Use pipelining for batch operations:**

If you're checking multiple rate limits per request, batch them:

.. code-block:: python

    # Instead of multiple round trips
    result1 = await limiter.check(request, config1)
    result2 = await limiter.check(request, config2)

    # Consider combining into one check with higher cost
    combined_config = RateLimitConfig(limit=100, window_size=60, cost=2)
    result = await limiter.check(request, combined_config)

**3. Use Redis close to your application:**

Network latency is usually the biggest factor. Run Redis in the same datacenter, or better yet, the same availability zone.

**4. Consider Redis Cluster for high throughput:**

Distributes load across multiple Redis nodes.

Reducing Overhead
-----------------

**1. Exempt paths that don't need limiting:**

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        exempt_paths={"/health", "/metrics", "/ready"},
    )

**2. Use coarse-grained limits when possible:**

Instead of limiting every endpoint separately, use middleware for a global limit:

.. code-block:: python

    # One check per request
    app.add_middleware(RateLimitMiddleware, limit=1000, window_size=60)

    # vs. multiple checks per request
    @rate_limit(100, 60)  # Check 1
    @another_decorator    # Check 2
    async def endpoint():
        pass

**3.
Increase window size:**

The number of state updates equals the number of requests, regardless of window size, so a longer window doesn't reduce write volume. It does bring other benefits:

- Fewer unique window boundaries
- Better cache efficiency
- More stable rate limiting

.. code-block:: python

    # 60 requests per 60 seconds: one window boundary per minute per key
    @rate_limit(60, 60)

    # Same average rate, but a new window boundary every second
    @rate_limit(1, 1)

**4. Skip headers when not needed:**

.. code-block:: python

    @rate_limit(100, 60, include_headers=False)

Saves a tiny bit of response processing.

Benchmarking
------------

Here's a simple benchmark script:

.. code-block:: python

    import asyncio
    import time
    from unittest.mock import MagicMock

    from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig

    async def benchmark():
        backend = MemoryBackend()
        limiter = RateLimiter(backend)
        await limiter.initialize()

        config = RateLimitConfig(limit=10000, window_size=60)

        # Mock request
        request = MagicMock()
        request.client.host = "127.0.0.1"
        request.url.path = "/test"
        request.method = "GET"
        request.headers = {}

        # Warm up
        for _ in range(100):
            await limiter.check(request, config)

        # Benchmark
        iterations = 10000
        start = time.perf_counter()
        for _ in range(iterations):
            await limiter.check(request, config)
        elapsed = time.perf_counter() - start

        print(f"Total time: {elapsed:.3f}s")
        print(f"Per check: {elapsed/iterations*1000:.3f}ms")
        print(f"Checks/sec: {iterations/elapsed:.0f}")

        await limiter.close()

    asyncio.run(benchmark())

Typical output:

.. code-block:: text

    Total time: 0.150s
    Per check: 0.015ms
    Checks/sec: 66666

Profiling
---------

If you suspect rate limiting is a bottleneck, profile it:

.. code-block:: python

    import asyncio
    import cProfile
    import pstats

    async def profile_rate_limiting():
        # Your rate limiting code here
        pass

    cProfile.run('asyncio.run(profile_rate_limiting())', 'rate_limit.prof')

    stats = pstats.Stats('rate_limit.prof')
    stats.sort_stats('cumulative')
    stats.print_stats(20)

Look for:

- Time spent in backend operations
- Time spent in algorithm calculations
- Unexpected hotspots

When Performance Really Matters
-------------------------------

If you're handling millions of requests per second and rate limiting overhead is significant:

1. **Consider sampling:** Only check rate limits for a percentage of requests and extrapolate.
2. **Use probabilistic data structures:** Bloom filters or Count-Min Sketch can approximate rate limiting with less overhead.
3. **Push to the edge:** Use CDN-level rate limiting (Cloudflare, AWS WAF) to handle the bulk of traffic.
4. **Accept some inaccuracy:** Fixed window with ``skip_on_error=True`` is very fast and "good enough" for many use cases.

For most applications, though, the default configuration is plenty fast.
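The sampling idea can be sketched as follows. This is a minimal illustration, not part of FastAPI Traffic's API: ``sampled_cost`` and ``SAMPLE_RATE`` are hypothetical names, and the extrapolation assumes your backend supports a per-check cost.

.. code-block:: python

    import random
    from typing import Optional

    SAMPLE_RATE = 0.1  # check only 1 in 10 requests

    def sampled_cost(sample_rate: float = SAMPLE_RATE) -> Optional[int]:
        """Decide whether to run the rate-limit check for this request.

        Returns None to skip the check, or the cost to charge when it runs.
        Charging round(1 / sample_rate) on sampled requests keeps the
        average charged cost per request at ~1, so the configured limit
        still tracks the true request rate.
        """
        if random.random() >= sample_rate:
            return None  # skip this request's check entirely
        # One sampled hit stands in for ~1/sample_rate requests
        return round(1 / sample_rate)

On each request you would call ``sampled_cost()``: on ``None``, bypass the limiter; otherwise pass the returned value as the check's cost. The trade-off is variance: short bursts that happen to be under-sampled can slip through, so sampling suits high-volume fairness limits rather than strict security limits.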