Performance
===========

FastAPI Traffic is designed to be fast. But when you're handling thousands of
requests per second, every microsecond counts. Here's how to squeeze out the
best performance.

Baseline Performance
--------------------

On typical hardware, you can expect:

- **Memory backend:** ~0.01ms per check
- **SQLite backend:** ~0.1ms per check
- **Redis backend:** ~1ms per check (network dependent)

For most applications, this overhead is negligible compared to your actual
business logic.

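To put those figures in perspective, here is a quick back-of-envelope
calculation. The 20ms handler time is an assumed example, not a measurement:

.. code-block:: python

    # Per-check latencies from the list above, in milliseconds.
    check_ms = {"memory": 0.01, "sqlite": 0.1, "redis": 1.0}

    handler_ms = 20.0  # assumed time spent in your actual endpoint logic

    # Rate limiting as a percentage of total request time.
    overhead_pct = {
        name: cost / (cost + handler_ms) * 100
        for name, cost in check_ms.items()
    }
    for name, pct in overhead_pct.items():
        print(f"{name:>6}: {pct:.3f}% of a {handler_ms:.0f}ms request")

Even the Redis backend stays under 5% of a 20ms request; the overhead only
becomes visible when your handlers are themselves sub-millisecond.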
Choosing the Right Algorithm
----------------------------

Algorithms have different performance characteristics:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Time Complexity
     - Space Complexity
     - Notes
   * - Token Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Fixed Window
     - O(1)
     - O(1)
     - One int + one float per key
   * - Sliding Window Counter
     - O(1)
     - O(1)
     - Three values per key
   * - Leaky Bucket
     - O(1)
     - O(1)
     - Two floats per key
   * - Sliding Window
     - O(n)
     - O(n)
     - Stores every timestamp

**Recommendation:** Use Sliding Window Counter (the default) unless you have
specific requirements. It's O(1) and provides good accuracy.

**Avoid Sliding Window for high-volume endpoints.** If you're allowing 10,000
requests per minute, that's 10,000 timestamps to store and filter per key.

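To see why the log-based sliding window scales with traffic while the counter
does not, here is a minimal sketch of both approaches. This is a simplified
illustration using one common formulation of the weighted counter, not this
library's implementation:

.. code-block:: python

    from collections import deque

    class SlidingWindowLog:
        """O(n) space: stores one timestamp per allowed request."""

        def __init__(self, limit, window):
            self.limit, self.window = limit, window
            self.timestamps = deque()

        def allow(self, now):
            # Drop timestamps older than the window, then count the rest.
            while self.timestamps and self.timestamps[0] <= now - self.window:
                self.timestamps.popleft()
            if len(self.timestamps) < self.limit:
                self.timestamps.append(now)
                return True
            return False

    class SlidingWindowCounter:
        """O(1) space: current count, previous count, window start."""

        def __init__(self, limit, window):
            self.limit, self.window = limit, window
            self.window_start = 0.0
            self.current = 0
            self.previous = 0

        def allow(self, now):
            # Roll the window forward when it expires.
            if now - self.window_start >= self.window:
                elapsed = int((now - self.window_start) // self.window)
                self.previous = self.current if elapsed == 1 else 0
                self.current = 0
                self.window_start += elapsed * self.window
            # Weight the previous window by how much of it still overlaps.
            overlap = 1.0 - (now - self.window_start) / self.window
            estimated = self.current + self.previous * overlap
            if estimated < self.limit:
                self.current += 1
                return True
            return False

Both admit the same number of requests in the steady state, but the log keeps
up to ``limit`` floats per key while the counter keeps three numbers.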
Memory Backend Optimization
---------------------------

The memory backend is already fast, but you can tune it:

.. code-block:: python

    from fastapi_traffic import MemoryBackend

    backend = MemoryBackend(
        max_size=10000,       # Limit memory usage
        cleanup_interval=60,  # Less frequent cleanup = less overhead
    )

**max_size:** Limits the number of keys stored. When exceeded, LRU eviction kicks
in. Set this based on your expected number of unique clients.

**cleanup_interval:** How often to scan for expired entries. Higher values mean
less CPU overhead but more memory usage from expired entries.

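Conceptually, the LRU eviction behind ``max_size`` works like a bounded
``OrderedDict``. This is a simplified sketch of the idea, not the backend's
actual implementation:

.. code-block:: python

    from collections import OrderedDict

    class LRUStore:
        """Bounded key-value store that evicts the least recently used key."""

        def __init__(self, max_size):
            self.max_size = max_size
            self.data = OrderedDict()

        def set(self, key, value):
            if key in self.data:
                self.data.move_to_end(key)  # mark as most recently used
            self.data[key] = value
            if len(self.data) > self.max_size:
                self.data.popitem(last=False)  # evict the oldest entry

        def get(self, key):
            if key in self.data:
                self.data.move_to_end(key)  # a read also counts as "used"
            return self.data.get(key)

With ``max_size=2``, inserting a third key silently evicts whichever of the
first two was touched least recently; active clients stay resident.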
SQLite Backend Optimization
---------------------------

SQLite is surprisingly fast for rate limiting:

.. code-block:: python

    from fastapi_traffic import SQLiteBackend

    backend = SQLiteBackend(
        "rate_limits.db",
        cleanup_interval=300,  # Clean every 5 minutes
    )

**Tips:**

1. **Use an SSD.** SQLite performance depends heavily on disk I/O.
2. **Put the database on a local disk.** Network-attached storage adds latency.
3. **WAL mode is enabled by default.** This allows concurrent reads and writes.
4. **Increase cleanup_interval** if you have many keys. Cleanup scans the entire
   table.

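You can verify (or set) WAL mode yourself with a plain ``sqlite3`` connection,
independently of the backend. The file path here is just for the example:

.. code-block:: python

    import os
    import sqlite3
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "rate_limits.db")
    conn = sqlite3.connect(path)

    # WAL lets readers proceed while a writer holds the lock.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    print(mode)  # "wal"

    # Optional: trade a little durability for speed on rate-limit data.
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.close()

``synchronous=NORMAL`` is a reasonable pairing with WAL for data you can afford
to lose in a crash, which rate-limit counters usually are.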
Redis Backend Optimization
--------------------------

Redis is the bottleneck in most distributed setups:

**1. Use connection pooling (automatic):**

The backend maintains a pool of connections. You don't need to do anything.

**2. Use pipelining for batch operations:**

If you're checking multiple rate limits, batch them:

.. code-block:: python

    # Instead of multiple round trips
    result1 = await limiter.check(request, config1)
    result2 = await limiter.check(request, config2)

    # Consider combining into one check with higher cost
    combined_config = RateLimitConfig(limit=100, window_size=60, cost=2)
    result = await limiter.check(request, combined_config)

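The savings come almost entirely from avoided round trips, so they scale with
network latency, not CPU. A rough illustration, where the 1ms round-trip time
is an assumed figure:

.. code-block:: python

    rtt_ms = 1.0  # assumed round-trip time to Redis
    checks = 3    # separate rate limits checked per request

    separate_ms = checks * rtt_ms  # one round trip per check
    batched_ms = 1 * rtt_ms        # one combined (or pipelined) check

    print(f"separate: {separate_ms:.0f}ms, batched: {batched_ms:.0f}ms")

At 1ms per round trip, three separate checks add 3ms of pure network waiting
per request; a single combined check caps that at 1ms regardless of how many
limits it represents.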
**3. Use Redis close to your application:**

Network latency is usually the biggest factor. Run Redis in the same datacenter,
or better yet, the same availability zone.

**4. Consider Redis Cluster for high throughput:**

Distributes load across multiple Redis nodes.

Reducing Overhead
-----------------

**1. Exempt paths that don't need limiting:**

.. code-block:: python

    app.add_middleware(
        RateLimitMiddleware,
        limit=1000,
        window_size=60,
        exempt_paths={"/health", "/metrics", "/ready"},
    )

**2. Use coarse-grained limits when possible:**

Instead of limiting every endpoint separately, use middleware for a global limit:

.. code-block:: python

    # One check per request
    app.add_middleware(RateLimitMiddleware, limit=1000, window_size=60)

    # vs. multiple checks per request
    @rate_limit(100, 60)  # Check 1
    @another_decorator    # Check 2
    async def endpoint():
        pass

**3. Increase window size:**

The number of state updates equals the number of requests, regardless of window
size, so a longer window doesn't reduce write traffic directly. It still helps,
though:

- Fewer unique window boundaries
- Better cache efficiency
- More stable rate limiting

.. code-block:: python

    # Same average rate, but one smooth per-minute window
    @rate_limit(60, 60)  # 60 requests per 60-second window

    # vs. sixty per-second windows with far more boundary churn
    @rate_limit(1, 1)    # 1 request per 1-second window

**4. Skip headers when not needed:**

.. code-block:: python

    @rate_limit(100, 60, include_headers=False)

Saves a tiny bit of response processing.

Benchmarking
------------

Here's a simple benchmark script:

.. code-block:: python

    import asyncio
    import time
    from unittest.mock import MagicMock

    from fastapi_traffic import MemoryBackend, RateLimiter, RateLimitConfig

    async def benchmark():
        backend = MemoryBackend()
        limiter = RateLimiter(backend)
        await limiter.initialize()
        config = RateLimitConfig(limit=10000, window_size=60)

        # Mock request
        request = MagicMock()
        request.client.host = "127.0.0.1"
        request.url.path = "/test"
        request.method = "GET"
        request.headers = {}

        # Warm up
        for _ in range(100):
            await limiter.check(request, config)

        # Benchmark
        iterations = 10000
        start = time.perf_counter()
        for _ in range(iterations):
            await limiter.check(request, config)
        elapsed = time.perf_counter() - start

        print(f"Total time: {elapsed:.3f}s")
        print(f"Per check: {elapsed/iterations*1000:.3f}ms")
        print(f"Checks/sec: {iterations/elapsed:.0f}")

        await limiter.close()

    asyncio.run(benchmark())

Typical output:

.. code-block:: text

    Total time: 0.150s
    Per check: 0.015ms
    Checks/sec: 66666

Profiling
---------

If you suspect rate limiting is a bottleneck, profile it:

.. code-block:: python

    import asyncio
    import cProfile
    import pstats

    async def profile_rate_limiting():
        # Your rate limiting code here
        pass

    cProfile.run('asyncio.run(profile_rate_limiting())', 'rate_limit.prof')

    stats = pstats.Stats('rate_limit.prof')
    stats.sort_stats('cumulative')
    stats.print_stats(20)

Look for:

- Time spent in backend operations
- Time spent in algorithm calculations
- Unexpected hotspots

When Performance Really Matters
-------------------------------

If you're handling millions of requests per second and rate limiting overhead
is significant:

1. **Consider sampling:** Only check rate limits for a percentage of requests
   and extrapolate.
2. **Use probabilistic data structures:** Bloom filters or Count-Min Sketch can
   approximate rate limiting with less overhead.
3. **Push to the edge:** Use CDN-level rate limiting (Cloudflare, AWS WAF) to
   handle the bulk of traffic.
4. **Accept some inaccuracy:** Fixed window with ``skip_on_error=True`` is very
   fast and "good enough" for many use cases.

For most applications, though, the default configuration is plenty fast.
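
The sampling idea in point 1 can be sketched like this: run the real check for
only a fraction ``p`` of requests, with each real check consuming ``1/p`` units
so the expected consumption per request stays at one unit. This is a
hypothetical sketch; ``sample_rate`` is not a library parameter:

.. code-block:: python

    import random

    def sampled_check(limiter_check, sample_rate=0.1):
        """Run the real check for ~sample_rate of requests; admit the rest.

        limiter_check() should consume 1/sample_rate units so the expected
        consumption per request is still one unit.
        """
        if random.random() < sample_rate:
            return limiter_check()  # real check, with inflated cost
        return True                 # skip the check, admit the request

    # Simulation: a stand-in "check" that always allows, counting calls.
    random.seed(42)
    checks_run = 0

    def fake_check():
        global checks_run
        checks_run += 1
        return True

    allowed = sum(sampled_check(fake_check, 0.1) for _ in range(10_000))
    print(f"checked {checks_run} of 10000 requests")  # roughly 1000

The trade-off is variance: a burst that happens to fall in the unsampled 90% of
requests goes through unthrottled, which is why sampling only makes sense at
very high volume.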