Rate Limiting Algorithms
========================

FastAPI Traffic ships with five rate limiting algorithms. Each has its own strengths,
and picking the right one depends on what you're trying to achieve.
This guide will help you understand the tradeoffs and choose wisely.

Overview
--------

Here's the quick comparison:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Algorithm
     - Best For
     - Tradeoffs
   * - **Token Bucket**
     - APIs that need burst handling
     - Allows temporary spikes above average rate
   * - **Sliding Window**
     - Precise rate limiting
     - Higher memory usage
   * - **Fixed Window**
     - Simple, low-overhead limiting
     - Boundary issues (2x burst at window edges)
   * - **Leaky Bucket**
     - Consistent throughput
     - No burst handling
   * - **Sliding Window Counter**
     - General purpose (default)
     - Good balance of precision and efficiency

Token Bucket
------------

Think of this as a bucket that holds tokens. Each request consumes a token, and
tokens refill at a steady rate. If the bucket is empty, requests are rejected.

.. code-block:: python

   from fastapi import FastAPI, Request

   from fastapi_traffic import rate_limit, Algorithm

   app = FastAPI()

   @app.get("/api/data")
   @rate_limit(
       100,  # 100 tokens refill per minute
       60,
       algorithm=Algorithm.TOKEN_BUCKET,
       burst_size=20,  # bucket can hold up to 20 tokens
   )
   async def get_data(request: Request):
       return {"data": "here"}

**How it works:**

1. The bucket starts full (at ``burst_size`` capacity)
2. Each request removes one token
3. Tokens refill at ``limit / window_size`` per second
4. If no tokens are available, the request is rejected

**When to use it:**

- Your API has legitimate burst traffic (e.g., page loads that trigger multiple requests)
- You want to allow short spikes while maintaining an average rate
- Mobile apps that batch requests when coming online

**Example scenario:** A mobile app that syncs data when it reconnects. You want to
allow it to catch up quickly, but not overwhelm your servers.

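The refill-and-consume cycle above can be sketched as a minimal in-memory model. This is illustrative only, not the library's actual implementation, which lives in its storage backends:

```python
import time

class TokenBucket:
    """Minimal token bucket sketch (illustrative, not the library's code)."""

    def __init__(self, limit: int, window_size: float, burst_size: int):
        self.refill_rate = limit / window_size   # tokens added per second
        self.capacity = burst_size
        self.tokens = float(burst_size)          # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, capped at the bucket's capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1   # each request consumes one token
            return True
        return False
```

A burst of up to ``burst_size`` requests is accepted immediately from a full bucket; after that, requests are admitted at the average ``limit / window_size`` rate.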
Sliding Window
--------------

This algorithm tracks the exact timestamp of every request within the window. It's
the most accurate approach, but uses more memory.

.. code-block:: python

   @app.get("/api/transactions")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def get_transactions(request: Request):
       return {"transactions": []}

**How it works:**

1. Every request timestamp is stored
2. When checking, we count requests in the last ``window_size`` seconds
3. Old timestamps are cleaned up automatically

**When to use it:**

- You need precise rate limiting (financial APIs, compliance requirements)
- Memory isn't a major concern
- The rate limit is relatively low (not millions of requests)

**Tradeoffs:**

- Memory usage grows with request volume
- Slightly more CPU for timestamp management

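The timestamp log can be sketched like this (an in-memory illustration; the library's backend-based implementation may differ):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Minimal sliding window log sketch: one timestamp per request."""

    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window_size = window_size
        self.timestamps = deque()  # oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have aged out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window_size:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Note that storage is up to one timestamp per allowed request in the window, which is exactly why memory grows with the limit and the request volume.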
Fixed Window
------------

The simplest algorithm. Divide time into fixed windows (e.g., every minute) and
count requests in each window.

.. code-block:: python

   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Time is divided into fixed windows (0:00-1:00, 1:00-2:00, etc.)
2. Each request increments the counter for the current window
3. When the window changes, the counter resets

**When to use it:**

- You want the simplest, most efficient option
- Slight inaccuracy at window boundaries is acceptable
- High-volume scenarios where memory matters

**The boundary problem:**

A client could make 100 requests at 0:59 and another 100 at 1:01, effectively
getting 200 requests in 2 seconds. If this matters for your use case, use
sliding window counter instead.

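A minimal sketch of the counter, with an explicit ``now`` parameter so the boundary problem is easy to reproduce (illustrative only, not the library's implementation):

```python
import time

class FixedWindowCounter:
    """Minimal fixed window sketch: one counter per aligned time window."""

    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window_size = window_size
        self.window_start = 0.0
        self.count = 0

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Align to the window grid: 0-60, 60-120, ...
        window_start = now - (now % self.window_size)
        if window_start != self.window_start:
            self.window_start = window_start
            self.count = 0  # counter resets on window change
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 2 per 60 seconds, requests at t=59.0, t=59.5, and t=61.0 are all accepted: three requests in two seconds, which is the 2x boundary burst described above.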
Leaky Bucket
------------

Imagine a bucket with a hole in the bottom. Requests fill the bucket, and it
"leaks" at a constant rate. If the bucket overflows, requests are rejected.

.. code-block:: python

   @app.get("/api/steady")
   @rate_limit(
       100,
       60,
       algorithm=Algorithm.LEAKY_BUCKET,
       burst_size=10,  # bucket capacity
   )
   async def steady_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. The bucket has a maximum capacity (``burst_size``)
2. Each request adds "water" to the bucket
3. Water leaks out at ``limit / window_size`` per second
4. If the bucket would overflow, the request is rejected

**When to use it:**

- You need consistent, smooth throughput
- Downstream systems can't handle bursts
- Processing capacity is truly fixed (e.g., hardware limitations)

**Difference from token bucket:**

- Token bucket allows bursts up to the bucket size
- Leaky bucket smooths out traffic to a constant rate

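The drain-and-fill logic above can be sketched as follows (an in-memory illustration; not the library's actual code). Note how it mirrors the token bucket but inverts the direction, the counter starts empty and rises toward capacity instead of starting full and falling:

```python
import time

class LeakyBucket:
    """Minimal leaky bucket sketch: the level drains at a constant rate."""

    def __init__(self, limit: int, window_size: float, burst_size: int):
        self.leak_rate = limit / window_size  # units drained per second
        self.capacity = burst_size
        self.level = 0.0
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain for the elapsed time, never below empty
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1  # each request adds one unit of "water"
            return True
        return False
```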
Sliding Window Counter
----------------------

This is the default algorithm, and it's a good choice for most use cases. It
combines the efficiency of fixed windows with better accuracy.

.. code-block:: python

   @app.get("/api/default")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def default_endpoint(request: Request):
       return {"status": "ok"}

**How it works:**

1. Maintains counters for the current and previous windows
2. Calculates a weighted average based on how far into the current window we are
3. At 30 seconds into a 60-second window: ``count = prev_count * 0.5 + curr_count``

**When to use it:**

- General purpose rate limiting
- You want better accuracy than fixed window without the memory cost of sliding window
- Most APIs fall into this category

**Why it's the default:**

It gives you 90% of the accuracy of sliding window with the memory efficiency of
fixed window. Unless you have specific requirements, this is probably what you want.

Choosing the Right Algorithm
----------------------------

Here's a decision tree:

1. **Do you need to allow bursts?**

   - Yes → Token Bucket
   - No, I need smooth traffic → Leaky Bucket

2. **Do you need exact precision?**

   - Yes, compliance/financial → Sliding Window
   - No, good enough is fine → Continue

3. **Is memory a concern?**

   - Yes, high volume → Fixed Window
   - No → Sliding Window Counter (default)

Performance Comparison
----------------------

All algorithms are O(1) for the check operation, but they differ in storage:

.. list-table::
   :header-rows: 1

   * - Algorithm
     - Storage per Key
     - Operations
   * - Token Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window
     - N timestamps
     - 1 read, 1 write, cleanup
   * - Fixed Window
     - 1 int, 1 float
     - 1 read, 1 write
   * - Leaky Bucket
     - 2 floats
     - 1 read, 1 write
   * - Sliding Window Counter
     - 3 values
     - 1 read, 1 write

For most applications, the performance difference is negligible. Choose based on
behavior, not performance, unless you're handling millions of requests per second.

Code Examples
-------------

Here's a complete example showing all algorithms:

.. code-block:: python

   from fastapi import FastAPI, Request

   from fastapi_traffic import rate_limit, Algorithm

   app = FastAPI()

   # Burst-friendly endpoint
   @app.get("/api/burst")
   @rate_limit(100, 60, algorithm=Algorithm.TOKEN_BUCKET, burst_size=25)
   async def burst_endpoint(request: Request):
       return {"type": "token_bucket"}

   # Precise limiting
   @app.get("/api/precise")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW)
   async def precise_endpoint(request: Request):
       return {"type": "sliding_window"}

   # Simple and efficient
   @app.get("/api/simple")
   @rate_limit(100, 60, algorithm=Algorithm.FIXED_WINDOW)
   async def simple_endpoint(request: Request):
       return {"type": "fixed_window"}

   # Smooth throughput
   @app.get("/api/steady")
   @rate_limit(100, 60, algorithm=Algorithm.LEAKY_BUCKET)
   async def steady_endpoint(request: Request):
       return {"type": "leaky_bucket"}

   # Best of both worlds (default)
   @app.get("/api/balanced")
   @rate_limit(100, 60, algorithm=Algorithm.SLIDING_WINDOW_COUNTER)
   async def balanced_endpoint(request: Request):
       return {"type": "sliding_window_counter"}