Rate Limiting Algorithms
========================

FastAPI Traffic ships with five rate limiting algorithms. Each has its own strengths, and picking the right one depends on what you're trying to achieve. This guide will help you understand the tradeoffs and choose wisely.

Overview
--------

Here's the quick comparison:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Algorithm
     - Best For
     - Tradeoffs
   * - **Token Bucket**
     - APIs that need burst handling
     - Allows temporary spikes above average rate
   * - **Sliding Window**
     - Precise rate limiting
     - Higher memory usage
   * - **Fixed Window**
     - Simple, low-overhead limiting
     - Boundary issues (2x burst at window edges)
   * - **Leaky Bucket**
     - Consistent throughput
     - No burst handling
   * - **Sliding Window Counter**
     - General purpose (default)
     - Good balance of precision and efficiency

Token Bucket
------------

Think of this as a bucket that holds tokens. Each request consumes a token, and tokens refill at a steady rate. If the bucket is empty, requests are rejected.

.. code-block:: python

   from fastapi_traffic import rate_limit, Algorithm

   @app.get("/api/data")
   @rate_limit(
       100,  # 100 tokens refill per minute
       60,
       algorithm=Algorithm.TOKEN_BUCKET,
       burst_size=20,  # bucket can hold up to 20 tokens
   )
   async def get_data(request: Request):
       return {"data": "here"}

**How it works:**

1. The bucket starts full (at ``burst_size`` capacity)
2. Each request removes one token
3. Tokens refill at ``limit / window_size`` per second
4. If no tokens are available, the request is rejected

**When to use it:**

- Your API has legitimate burst traffic (e.g., page loads that trigger multiple requests)
- You want to allow short spikes while maintaining an average rate
- Mobile apps that batch requests when coming online

**Example scenario:** A mobile app that syncs data when it reconnects. You want to allow it to catch up quickly, but not overwhelm your servers.
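The refill-and-consume steps above can be sketched in plain Python. This is an illustrative standalone implementation under the stated semantics, not FastAPI Traffic's internal code; the ``TokenBucket`` class and ``allow`` method are invented for the example.

.. code-block:: python

   import time

   class TokenBucket:
       """Illustrative sketch: refills at limit/window_size tokens per second."""

       def __init__(self, limit: int, window_size: float, burst_size: int):
           self.rate = limit / window_size    # tokens refilled per second
           self.capacity = burst_size         # bucket starts full
           self.tokens = float(burst_size)
           self.last_refill = time.monotonic()

       def allow(self) -> bool:
           now = time.monotonic()
           # Refill based on elapsed time, capped at bucket capacity
           elapsed = now - self.last_refill
           self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
           self.last_refill = now
           if self.tokens >= 1:
               self.tokens -= 1  # consume one token for this request
               return True
           return False

With ``burst_size=2``, two back-to-back requests are allowed and a third is rejected until enough time passes for a token to refill — exactly the burst behavior the table above describes.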
Sliding Window
--------------

This algorithm tracks the exact timestamp of every request within the window. It's the most accurate approach, but uses more memory.

.. code-block:: python

   @app.get("/api/transactions")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def get_transactions(request: Request):
       return {"transactions": []}

**How it works:**

1. Every request timestamp is stored
2. When checking, we count requests in the last ``window_size`` seconds
3. Old timestamps are cleaned up automatically

**When to use it:**

- You need precise rate limiting (financial APIs, compliance requirements)
- Memory isn't a major concern
- The rate limit is relatively low (not millions of requests)

**Tradeoffs:**

- Memory usage grows with request volume
- Slightly more CPU for timestamp management

Fixed Window
------------

The simplest algorithm. Divide time into fixed windows (e.g., every minute) and count requests in each window.

.. code-block:: python

   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Time is divided into fixed windows (0:00-1:00, 1:00-2:00, etc.)
2. Each request increments the counter for the current window
3. When the window changes, the counter resets

**When to use it:**

- You want the simplest, most efficient option
- Slight inaccuracy at window boundaries is acceptable
- High-volume scenarios where memory matters

**The boundary problem:** A client could make 100 requests at 0:59 and another 100 at 1:01, effectively getting 200 requests in 2 seconds. If this matters for your use case, use sliding window counter instead.

Leaky Bucket
------------

Imagine a bucket with a hole in the bottom. Requests fill the bucket, and it "leaks" at a constant rate. If the bucket overflows, requests are rejected.

.. code-block:: python

   @app.get("/api/steady")
   @rate_limit(
       100,
       60,
       algorithm=Algorithm.LEAKY_BUCKET,
       burst_size=10,  # bucket capacity
   )
   async def steady_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. The bucket has a maximum capacity (``burst_size``)
2. Each request adds "water" to the bucket
3. Water leaks out at ``limit / window_size`` per second
4. If the bucket would overflow, the request is rejected

**When to use it:**

- You need consistent, smooth throughput
- Downstream systems can't handle bursts
- Processing capacity is truly fixed (e.g., hardware limitations)

**Difference from token bucket:**

- Token bucket allows bursts up to the bucket size
- Leaky bucket smooths out traffic to a constant rate

Sliding Window Counter
----------------------

This is the default algorithm, and it's a good choice for most use cases. It combines the efficiency of fixed windows with better accuracy.

.. code-block:: python

   @app.get("/api/default")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def default_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Maintains counters for the current and previous windows
2. Calculates a weighted average based on how far into the current window we are
3. At 30 seconds into a 60-second window: ``count = prev_count * 0.5 + curr_count``

**When to use it:**

- General purpose rate limiting
- You want better accuracy than fixed window without the memory cost of sliding window
- Most APIs fall into this category

**Why it's the default:** It gives you 90% of the accuracy of sliding window with the memory efficiency of fixed window. Unless you have specific requirements, this is probably what you want.

Choosing the Right Algorithm
----------------------------

Here's a decision tree:

1. **Do you need to allow bursts?**

   - Yes → Token Bucket
   - No, I need smooth traffic → Leaky Bucket
2. **Do you need exact precision?**

   - Yes, compliance/financial → Sliding Window
   - No, good enough is fine → Continue

3. **Is memory a concern?**

   - Yes, high volume → Fixed Window
   - No → Sliding Window Counter (default)

Performance Comparison
----------------------

All algorithms are O(1) for the check operation, but they differ in storage:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Storage per Key
     - Operations
   * - Token Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window
     - N timestamps
     - 1 read, 1 write, cleanup
   * - Fixed Window
     - 1 int, 1 float
     - 1 read, 1 write
   * - Leaky Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window Counter
     - 3 values
     - 1 read, 1 write

For most applications, the performance difference is negligible. Choose based on behavior, not performance, unless you're handling millions of requests per second.

Code Examples
-------------

Here's a complete example showing all algorithms:

.. code-block:: python

   from fastapi import FastAPI, Request
   from fastapi_traffic import rate_limit, Algorithm

   app = FastAPI()

   # Burst-friendly endpoint
   @app.get("/api/burst")
   @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=25)
   async def burst_endpoint(request: Request):
       return {"type": "token_bucket"}

   # Precise limiting
   @app.get("/api/precise")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def precise_endpoint(request: Request):
       return {"type": "sliding_window"}

   # Simple and efficient
   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"type": "fixed_window"}

   # Smooth throughput
   @app.get("/api/steady")
   @rate_limit(100, 60, algorithm=Algorithm.LEAKY_BUCKET)
   async def steady_endpoint(request: Request):
       return {"type": "leaky_bucket"}

   # Best of both worlds (default)
   @app.get("/api/balanced")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def balanced_endpoint(request: Request):
       return {"type": "sliding_window_counter"}
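The weighted count behind the default sliding window counter is worth seeing concretely. Below is a minimal sketch of that calculation, assuming the weighting described above (previous window weighted by the fraction still inside the sliding window); the function name ``estimate_count`` is invented for the example and is not part of the library's API.

.. code-block:: python

   def estimate_count(prev_count: int, curr_count: int,
                      elapsed_in_window: float, window_size: float) -> float:
       """Weighted estimate of requests in the last window_size seconds.

       The previous window's count is weighted by how much of it still
       overlaps the sliding window; the current count is taken in full.
       """
       prev_weight = 1.0 - (elapsed_in_window / window_size)
       return prev_count * prev_weight + curr_count

   # 30 seconds into a 60-second window, the previous window counts at
   # half weight: estimate_count(80, 40, 30, 60) -> 80 * 0.5 + 40 = 80.0

Two counters and a timestamp are all this needs per key, which is why it matches the "3 values" row in the table above while tracking the limit far more smoothly than a bare fixed window.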