release: bump version to 0.3.0
- Refactor Redis backend connection handling and pool management
- Update algorithm implementations with improved type annotations
- Enhance config loader validation with stricter Pydantic schemas
- Improve decorator and middleware error handling
- Expand example scripts with better docstrings and usage patterns
- Add new 00_basic_usage.py example for quick start
- Reorganize examples directory structure
- Fix type annotation inconsistencies across core modules
- Update dependencies in pyproject.toml
docs/advanced/performance.rst (new file, 291 lines)
@@ -0,0 +1,291 @@
Performance
===========

FastAPI Traffic is designed to be fast. But when you're handling thousands of
requests per second, every microsecond counts. Here's how to squeeze out the
best performance.

Baseline Performance
--------------------

On typical hardware, you can expect:

- **Memory backend:** ~0.01ms per check
- **SQLite backend:** ~0.1ms per check
- **Redis backend:** ~1ms per check (network dependent)

For most applications, this overhead is negligible compared to your actual
business logic.

Choosing the Right Algorithm
----------------------------

Algorithms have different performance characteristics:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Time Complexity
     - Space Complexity
     - Notes
   * - Token Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Fixed Window
     - O(1)
     - O(1)
     - One int + one float per key
   * - Sliding Window Counter
     - O(1)
     - O(1)
     - Three values per key
   * - Leaky Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Sliding Window
     - O(n)
     - O(n)
     - Stores every timestamp

**Recommendation:** Use Sliding Window Counter (the default) unless you have
specific requirements. It's O(1) and provides good accuracy.

**Avoid Sliding Window for high-volume endpoints.** If you're allowing 10,000
requests per minute, that's 10,000 timestamps to store and filter per key.

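To see why Sliding Window Counter stays O(1), here's a standalone sketch of the
weighted-count estimate such algorithms typically use (plain Python; the
function name and exact weighting are illustrative, not the library's API):

.. code-block:: python

   def sliding_window_estimate(prev_count: int, curr_count: int,
                               window_size: float, elapsed: float) -> float:
       """Estimate requests in the last `window_size` seconds using only
       two counters: the previous window's total and the current one's."""
       prev_weight = max(0.0, (window_size - elapsed) / window_size)
       return prev_count * prev_weight + curr_count

   # 30s into a 60s window: half of the previous window still overlaps
   # the sliding window, so it contributes half its count.
   print(sliding_window_estimate(80, 30, 60, 30))  # 80 * 0.5 + 30 = 70.0

Only a handful of values per key (previous count, current count, window start)
are needed, which is why the table above lists it as O(1) space.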
Memory Backend Optimization
---------------------------

The memory backend is already fast, but you can tune it:

.. code-block:: python

   from fastapi_traffic import MemoryBackend

   backend = MemoryBackend(
       max_size=10000,       # Limit memory usage
       cleanup_interval=60,  # Less frequent cleanup = less overhead
   )

**max_size:** Limits the number of keys stored. When exceeded, LRU eviction kicks
in. Set this based on your expected number of unique clients.

**cleanup_interval:** How often to scan for expired entries. Higher values mean
less CPU overhead but more memory usage from expired entries.

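The LRU eviction behind ``max_size`` can be sketched in a few lines
(illustrative only; ``LRUStore`` is not part of the library):

.. code-block:: python

   from collections import OrderedDict

   class LRUStore:
       """Minimal LRU map: past max_size, evict the least-recently-used key."""

       def __init__(self, max_size: int):
           self.max_size = max_size
           self.data = OrderedDict()

       def set(self, key, value):
           if key in self.data:
               self.data.move_to_end(key)  # mark as recently used
           self.data[key] = value
           if len(self.data) > self.max_size:
               self.data.popitem(last=False)  # evict the oldest entry

   store = LRUStore(max_size=2)
   store.set("client-a", 1)
   store.set("client-b", 2)
   store.set("client-c", 3)  # evicts client-a
   print(list(store.data))   # ['client-b', 'client-c']

If ``max_size`` is smaller than your number of concurrently active clients,
eviction will silently reset limits for evicted keys, so size it generously.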
SQLite Backend Optimization
---------------------------

SQLite is surprisingly fast for rate limiting:

.. code-block:: python

   from fastapi_traffic import SQLiteBackend

   backend = SQLiteBackend(
       "rate_limits.db",
       cleanup_interval=300,  # Clean every 5 minutes
   )

**Tips:**

1. **Use an SSD.** SQLite performance depends heavily on disk I/O.

2. **Put the database on a local disk.** Network-attached storage adds latency.

3. **WAL mode is enabled by default.** This allows concurrent reads and writes.

4. **Increase cleanup_interval** if you have many keys. Cleanup scans the entire
   table.

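If you want to confirm WAL mode yourself, the standard ``sqlite3`` module makes
it a one-liner (``rate_limits.db`` is just the example path from above):

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect("rate_limits.db")
   # PRAGMA journal_mode returns the active mode; "wal" once enabled.
   mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
   print(mode)  # wal
   conn.close()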
Redis Backend Optimization
--------------------------

Redis is the bottleneck in most distributed setups:

**1. Use connection pooling (automatic):**

The backend maintains a pool of connections. You don't need to do anything.

**2. Batch operations to reduce round trips:**

If you're checking multiple rate limits, batch them:

.. code-block:: python

   # Instead of multiple round trips
   result1 = await limiter.check(request, config1)
   result2 = await limiter.check(request, config2)

   # Consider combining into one check with a higher cost
   combined_config = RateLimitConfig(limit=100, window_size=60, cost=2)
   result = await limiter.check(request, combined_config)

**3. Run Redis close to your application:**

Network latency is usually the biggest factor. Run Redis in the same datacenter,
or better yet, the same availability zone.

**4. Consider Redis Cluster for high throughput:**

Distributes load across multiple Redis nodes.

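A quick back-of-envelope calculation shows why round trips dominate: each
un-batched check pays at least one network round trip, while batched commands
share a single one (the numbers below are assumed, not measured):

.. code-block:: python

   rtt_ms = 1.0   # assumed round-trip time to Redis
   n_checks = 3   # rate-limit checks per request

   sequential_ms = n_checks * rtt_ms  # one round trip per check
   batched_ms = rtt_ms                # all commands share one round trip

   print(sequential_ms, batched_ms)  # 3.0 1.0

At 1ms RTT, three sequential checks cap a single connection at roughly 333
requests per second; batching restores the ~1000/s ceiling of one round trip.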
Reducing Overhead
-----------------

**1. Exempt paths that don't need limiting:**

.. code-block:: python

   app.add_middleware(
       RateLimitMiddleware,
       limit=1000,
       window_size=60,
       exempt_paths={"/health", "/metrics", "/ready"},
   )

**2. Use coarse-grained limits when possible:**

Instead of limiting every endpoint separately, use middleware for a global limit:

.. code-block:: python

   # One check per request
   app.add_middleware(RateLimitMiddleware, limit=1000, window_size=60)

   # vs. multiple checks per request
   @rate_limit(100, 60)  # Check 1
   @another_decorator    # Check 2
   async def endpoint():
       pass

**3. Increase window size:**

The number of state updates equals the number of requests, regardless of window
size, so a longer window doesn't reduce write volume. Longer windows do mean:

- Fewer unique window boundaries
- Better cache efficiency
- More stable rate limiting

.. code-block:: python

   # Same average rate, but the window rolls over once per
   # minute instead of once per second
   @rate_limit(60, 60)  # 60 requests per 60 seconds
   @rate_limit(1, 1)    # 1 request per second

**4. Skip headers when not needed:**

.. code-block:: python

   @rate_limit(100, 60, include_headers=False)

Saves a tiny bit of response processing.

Benchmarking
------------

Here's a simple benchmark script:

.. code-block:: python

   import asyncio
   import time
   from unittest.mock import MagicMock

   from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig

   async def benchmark():
       backend = MemoryBackend()
       limiter = RateLimiter(backend)
       await limiter.initialize()

       config = RateLimitConfig(limit=10000, window_size=60)

       # Mock request
       request = MagicMock()
       request.client.host = "127.0.0.1"
       request.url.path = "/test"
       request.method = "GET"
       request.headers = {}

       # Warm up
       for _ in range(100):
           await limiter.check(request, config)

       # Benchmark
       iterations = 10000
       start = time.perf_counter()

       for _ in range(iterations):
           await limiter.check(request, config)

       elapsed = time.perf_counter() - start

       print(f"Total time: {elapsed:.3f}s")
       print(f"Per check: {elapsed/iterations*1000:.3f}ms")
       print(f"Checks/sec: {iterations/elapsed:.0f}")

       await limiter.close()

   asyncio.run(benchmark())

Typical output:

.. code-block:: text

   Total time: 0.150s
   Per check: 0.015ms
   Checks/sec: 66666

Profiling
---------

If you suspect rate limiting is a bottleneck, profile it:

.. code-block:: python

   import asyncio
   import cProfile
   import pstats

   async def profile_rate_limiting():
       # Your rate limiting code here
       pass

   cProfile.run('asyncio.run(profile_rate_limiting())', 'rate_limit.prof')

   stats = pstats.Stats('rate_limit.prof')
   stats.sort_stats('cumulative')
   stats.print_stats(20)

Look for:

- Time spent in backend operations
- Time spent in algorithm calculations
- Unexpected hotspots

When Performance Really Matters
-------------------------------

If you're handling millions of requests per second and rate limiting overhead
is significant:

1. **Consider sampling:** Only check rate limits for a percentage of requests
   and extrapolate.

2. **Use probabilistic data structures:** Bloom filters or Count-Min Sketch can
   approximate rate limiting with less overhead.

3. **Push to the edge:** Use CDN-level rate limiting (Cloudflare, AWS WAF) to
   handle the bulk of traffic.

4. **Accept some inaccuracy:** Fixed window with ``skip_on_error=True`` is very
   fast and "good enough" for many use cases.

For most applications, though, the default configuration is plenty fast.
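The sampling idea in point 1 can be sketched as follows (``SAMPLE_RATE`` and
``should_check`` are hypothetical names, and the proportional ``cost`` assumes
the ``cost`` parameter shown earlier in this guide):

.. code-block:: python

   import random

   SAMPLE_RATE = 0.1  # check roughly 1 in 10 requests

   def should_check() -> bool:
       """Randomly decide whether this request pays the rate-limit check."""
       return random.random() < SAMPLE_RATE

   # When a sampled request is checked, charge it the weight of the
   # ~10 requests it represents so the limit still holds on average:
   # config = RateLimitConfig(limit=1000, window_size=60,
   #                          cost=int(1 / SAMPLE_RATE))

The trade-off is variance: short bursts can slip through unchecked, which is
usually acceptable at the traffic volumes where sampling matters.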