Rate limiting is a reliability feature, not a billing feature. Done well, it prevents cascading failure and keeps latency predictable under load.
The problem
You need a global budget (e.g. 2,000 req/s) enforced across many instances and regions, with burst tolerance and low coordination overhead.
A practical model
Start with a token bucket per key. Tokens refill at a steady rate. Each request consumes 1 token.
```ts
type Bucket = { tokens: number; lastRefillMs: number }
```
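A minimal single-process sketch of refill and consume, assuming Date.now() as the clock; ratePerSec and capacity are illustrative parameters, not part of any particular library:

```ts
function tryConsume(b: Bucket, ratePerSec: number, capacity: number): boolean {
  const now = Date.now()
  // Refill lazily: credit tokens for the time elapsed since the last refill,
  // capped at capacity so an idle key can't accumulate an unbounded burst.
  const elapsedSec = (now - b.lastRefillMs) / 1000
  b.tokens = Math.min(capacity, b.tokens + elapsedSec * ratePerSec)
  b.lastRefillMs = now
  // Consume one token if available; otherwise reject the request.
  if (b.tokens >= 1) {
    b.tokens -= 1
    return true
  }
  return false
}

// Example: a 100 req/s budget with burst tolerance up to 200.
const bucket: Bucket = { tokens: 200, lastRefillMs: Date.now() }
const allowed = tryConsume(bucket, 100, 200)
```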
A distributed approach

For correctness, you eventually need some shared state. For speed, you want to avoid synchronous round trips on every request.
Two common strategies:
- Centralized: Redis + Lua script (simple, consistent, single dependency)
- Hierarchical: Local buckets + periodic reconciliation (faster, more complex; sketched below)
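To make the hierarchical idea concrete, here is a rough sketch. fetchGlobalGrant is a hypothetical coordinator call, stubbed out so the example stands alone; a real version would settle against shared state such as Redis:

```ts
// Hypothetical coordinator call: report what we spent locally, receive a
// fresh allowance from the shared budget. Stubbed so the sketch runs.
async function fetchGlobalGrant(key: string, spentSinceLastSync: number): Promise<number> {
  return 100 // pretend the coordinator granted 100 more tokens
}

class LocalLimiter {
  private tokens = 0
  private spent = 0
  private capacity = 200 // illustrative local burst cap

  constructor(private key: string, syncIntervalMs = 250) {
    // Reconcile in the background so the request path never waits on the network.
    setInterval(() => { void this.sync() }, syncIntervalMs)
  }

  // Hot path: purely local, no coordination.
  tryConsume(): boolean {
    if (this.tokens >= 1) {
      this.tokens -= 1
      this.spent += 1
      return true
    }
    return false
  }

  private async sync(): Promise<void> {
    const grant = await fetchGlobalGrant(this.key, this.spent)
    this.spent = 0
    this.tokens = Math.min(this.capacity, this.tokens + grant)
  }
}
```

The tradeoff is bounded overshoot: between syncs, the sum of local grants can briefly exceed the global budget, which is usually acceptable when the sync interval is short.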
Redis + Lua (baseline)
If you can tolerate a single shared data plane, Redis is a great baseline. Atomically:
- Refill tokens based on elapsed time
- Check that at least one token remains
- Decrement on success
```lua
-- pseudocode
-- refill, then consume if possible
```
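Fleshed out, the script might look like the sketch below, called here from TypeScript via ioredis. The key layout, parameter names, and 60-second expiry are illustrative choices, not a prescribed schema; the client passes its own clock in, which keeps the script deterministic at the cost of tolerating some cross-instance clock skew:

```ts
import Redis from "ioredis"

const redis = new Redis()

// Refill, check, and decrement happen atomically inside Redis, so
// concurrent instances can never double-spend the same token.
const TOKEN_BUCKET = `
local key = KEYS[1]
local rate = tonumber(ARGV[1])      -- tokens per second
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])       -- client clock, in milliseconds

local state = redis.call('HMGET', key, 'tokens', 'lastRefillMs')
local tokens = tonumber(state[1]) or capacity
local last = tonumber(state[2]) or now

-- Refill for elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + (now - last) / 1000 * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', key, 'tokens', tostring(tokens), 'lastRefillMs', tostring(now))
redis.call('PEXPIRE', key, 60000)   -- let idle keys expire
return allowed
`

async function allow(key: string, ratePerSec: number, capacity: number): Promise<boolean> {
  const result = await redis.eval(
    TOKEN_BUCKET, 1, key, ratePerSec, capacity, Date.now()
  )
  return result === 1
}
```

In production you would usually register the script once and call it by hash; ioredis's defineCommand handles the EVALSHA bookkeeping for you.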
Where it gets interesting

The difficult part isn’t the algorithm. It’s defining:
- what “fair” means across keys
- what “consistent” means across regions
- what failure modes you accept