release: bump version to 0.3.0
- Refactor Redis backend connection handling and pool management
- Update algorithm implementations with improved type annotations
- Enhance config loader validation with stricter Pydantic schemas
- Improve decorator and middleware error handling
- Expand example scripts with better docstrings and usage patterns
- Add new 00_basic_usage.py example for quick start
- Reorganize examples directory structure
- Fix type annotation inconsistencies across core modules
- Update dependencies in pyproject.toml
docs/advanced/performance.rst (new file, 291 lines)
@@ -0,0 +1,291 @@
Performance
===========

FastAPI Traffic is designed to be fast. But when you're handling thousands of
requests per second, every microsecond counts. Here's how to squeeze out the
best performance.

Baseline Performance
--------------------

On typical hardware, you can expect:

- **Memory backend:** ~0.01ms per check
- **SQLite backend:** ~0.1ms per check
- **Redis backend:** ~1ms per check (network dependent)

For most applications, this overhead is negligible compared to your actual
business logic.

Choosing the Right Algorithm
----------------------------

Algorithms have different performance characteristics:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Time Complexity
     - Space Complexity
     - Notes
   * - Token Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Fixed Window
     - O(1)
     - O(1)
     - One int + one float per key
   * - Sliding Window Counter
     - O(1)
     - O(1)
     - Three values per key
   * - Leaky Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Sliding Window
     - O(n)
     - O(n)
     - Stores every timestamp

**Recommendation:** Use Sliding Window Counter (the default) unless you have
specific requirements. It's O(1) and provides good accuracy.

**Avoid Sliding Window for high-volume endpoints.** If you're allowing 10,000
requests per minute, that's 10,000 timestamps to store and filter per key.

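To see why Sliding Window Counter stays O(1), here's a standalone sketch of the
weighted-count estimate such algorithms typically use (plain Python; the
function name and exact weighting are illustrative, not the library's API):

.. code-block:: python

   def sliding_window_estimate(prev_count: int, curr_count: int,
                               window_size: float, elapsed: float) -> float:
       """Estimate requests in the last `window_size` seconds using only
       two counters: the previous window's total and the current one's."""
       prev_weight = max(0.0, (window_size - elapsed) / window_size)
       return prev_count * prev_weight + curr_count

   # 30s into a 60s window: half of the previous window still overlaps
   # the sliding window, so it contributes half its count.
   print(sliding_window_estimate(80, 30, 60, 30))  # 80 * 0.5 + 30 = 70.0

Only a handful of values per key (previous count, current count, window start)
are needed, which is why the table above lists it as O(1) space.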
Memory Backend Optimization
---------------------------

The memory backend is already fast, but you can tune it:

.. code-block:: python

   from fastapi_traffic import MemoryBackend

   backend = MemoryBackend(
       max_size=10000,       # Limit memory usage
       cleanup_interval=60,  # Less frequent cleanup = less overhead
   )

**max_size:** Limits the number of keys stored. When exceeded, LRU eviction kicks
in. Set this based on your expected number of unique clients.

**cleanup_interval:** How often to scan for expired entries. Higher values mean
less CPU overhead but more memory usage from expired entries.

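The LRU eviction behind ``max_size`` can be sketched in a few lines
(illustrative only; ``LRUStore`` is not part of the library):

.. code-block:: python

   from collections import OrderedDict

   class LRUStore:
       """Minimal LRU map: past max_size, evict the least-recently-used key."""

       def __init__(self, max_size: int):
           self.max_size = max_size
           self.data = OrderedDict()

       def set(self, key, value):
           if key in self.data:
               self.data.move_to_end(key)  # mark as recently used
           self.data[key] = value
           if len(self.data) > self.max_size:
               self.data.popitem(last=False)  # evict the oldest entry

   store = LRUStore(max_size=2)
   store.set("client-a", 1)
   store.set("client-b", 2)
   store.set("client-c", 3)  # evicts client-a
   print(list(store.data))   # ['client-b', 'client-c']

If ``max_size`` is smaller than your number of concurrently active clients,
eviction will silently reset limits for evicted keys, so size it generously.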
SQLite Backend Optimization
---------------------------

SQLite is surprisingly fast for rate limiting:

.. code-block:: python

   from fastapi_traffic import SQLiteBackend

   backend = SQLiteBackend(
       "rate_limits.db",
       cleanup_interval=300,  # Clean every 5 minutes
   )

**Tips:**

1. **Use an SSD.** SQLite performance depends heavily on disk I/O.

2. **Put the database on a local disk.** Network-attached storage adds latency.

3. **WAL mode is enabled by default.** This allows concurrent reads and writes.

4. **Increase cleanup_interval** if you have many keys. Cleanup scans the entire
   table.

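If you want to confirm WAL mode yourself, the standard ``sqlite3`` module makes
it a one-liner (``rate_limits.db`` is just the example path from above):

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect("rate_limits.db")
   # PRAGMA journal_mode returns the active mode; "wal" once enabled.
   mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
   print(mode)  # wal
   conn.close()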
Redis Backend Optimization
--------------------------

Redis is the bottleneck in most distributed setups:

**1. Use connection pooling (automatic):**

The backend maintains a pool of connections. You don't need to do anything.

**2. Batch operations to reduce round trips:**

If you're checking multiple rate limits, batch them:

.. code-block:: python

   # Instead of multiple round trips
   result1 = await limiter.check(request, config1)
   result2 = await limiter.check(request, config2)

   # Consider combining into one check with a higher cost
   combined_config = RateLimitConfig(limit=100, window_size=60, cost=2)
   result = await limiter.check(request, combined_config)

**3. Run Redis close to your application:**

Network latency is usually the biggest factor. Run Redis in the same datacenter,
or better yet, the same availability zone.

**4. Consider Redis Cluster for high throughput:**

Distributes load across multiple Redis nodes.

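A quick back-of-envelope calculation shows why round trips dominate: each
un-batched check pays at least one network round trip, while batched commands
share a single one (the numbers below are assumed, not measured):

.. code-block:: python

   rtt_ms = 1.0   # assumed round-trip time to Redis
   n_checks = 3   # rate-limit checks per request

   sequential_ms = n_checks * rtt_ms  # one round trip per check
   batched_ms = rtt_ms                # all commands share one round trip

   print(sequential_ms, batched_ms)  # 3.0 1.0

At 1ms RTT, three sequential checks cap a single connection at roughly 333
requests per second; batching restores the ~1000/s ceiling of one round trip.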
Reducing Overhead
-----------------

**1. Exempt paths that don't need limiting:**

.. code-block:: python

   app.add_middleware(
       RateLimitMiddleware,
       limit=1000,
       window_size=60,
       exempt_paths={"/health", "/metrics", "/ready"},
   )

**2. Use coarse-grained limits when possible:**

Instead of limiting every endpoint separately, use middleware for a global limit:

.. code-block:: python

   # One check per request
   app.add_middleware(RateLimitMiddleware, limit=1000, window_size=60)

   # vs. multiple checks per request
   @rate_limit(100, 60)  # Check 1
   @another_decorator    # Check 2
   async def endpoint():
       pass

**3. Increase window size:**

The number of state updates equals the number of requests, regardless of window
size, so a longer window doesn't reduce write volume. Longer windows do mean:

- Fewer unique window boundaries
- Better cache efficiency
- More stable rate limiting

.. code-block:: python

   # Same average rate, but the window rolls over once per
   # minute instead of once per second
   @rate_limit(60, 60)  # 60 requests per 60 seconds
   @rate_limit(1, 1)    # 1 request per second

**4. Skip headers when not needed:**

.. code-block:: python

   @rate_limit(100, 60, include_headers=False)

Saves a tiny bit of response processing.

Benchmarking
------------

Here's a simple benchmark script:

.. code-block:: python

   import asyncio
   import time
   from unittest.mock import MagicMock

   from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig

   async def benchmark():
       backend = MemoryBackend()
       limiter = RateLimiter(backend)
       await limiter.initialize()

       config = RateLimitConfig(limit=10000, window_size=60)

       # Mock request
       request = MagicMock()
       request.client.host = "127.0.0.1"
       request.url.path = "/test"
       request.method = "GET"
       request.headers = {}

       # Warm up
       for _ in range(100):
           await limiter.check(request, config)

       # Benchmark
       iterations = 10000
       start = time.perf_counter()

       for _ in range(iterations):
           await limiter.check(request, config)

       elapsed = time.perf_counter() - start

       print(f"Total time: {elapsed:.3f}s")
       print(f"Per check: {elapsed/iterations*1000:.3f}ms")
       print(f"Checks/sec: {iterations/elapsed:.0f}")

       await limiter.close()

   asyncio.run(benchmark())

Typical output:

.. code-block:: text

   Total time: 0.150s
   Per check: 0.015ms
   Checks/sec: 66666

Profiling
---------

If you suspect rate limiting is a bottleneck, profile it:

.. code-block:: python

   import asyncio
   import cProfile
   import pstats

   async def profile_rate_limiting():
       # Your rate limiting code here
       pass

   cProfile.run('asyncio.run(profile_rate_limiting())', 'rate_limit.prof')

   stats = pstats.Stats('rate_limit.prof')
   stats.sort_stats('cumulative')
   stats.print_stats(20)

Look for:

- Time spent in backend operations
- Time spent in algorithm calculations
- Unexpected hotspots

When Performance Really Matters
-------------------------------

If you're handling millions of requests per second and rate limiting overhead
is significant:

1. **Consider sampling:** Only check rate limits for a percentage of requests
   and extrapolate.

2. **Use probabilistic data structures:** Bloom filters or Count-Min Sketch can
   approximate rate limiting with less overhead.

3. **Push to the edge:** Use CDN-level rate limiting (Cloudflare, AWS WAF) to
   handle the bulk of traffic.

4. **Accept some inaccuracy:** Fixed window with ``skip_on_error=True`` is very
   fast and "good enough" for many use cases.

For most applications, though, the default configuration is plenty fast.
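The sampling idea in point 1 can be sketched as follows (``SAMPLE_RATE`` and
``should_check`` are hypothetical names, and the proportional ``cost`` assumes
the ``cost`` parameter shown earlier in this guide):

.. code-block:: python

   import random

   SAMPLE_RATE = 0.1  # check roughly 1 in 10 requests

   def should_check() -> bool:
       """Randomly decide whether this request pays the rate-limit check."""
       return random.random() < SAMPLE_RATE

   # When a sampled request is checked, charge it the weight of the
   # ~10 requests it represents so the limit still holds on average:
   # config = RateLimitConfig(limit=1000, window_size=60,
   #                          cost=int(1 / SAMPLE_RATE))

The trade-off is variance: short bursts can slip through unchecked, which is
usually acceptable at the traffic volumes where sampling matters.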