Unique ID Generator: Designing IDs for Distributed Systems
Every distributed system needs a way to identify records uniquely across machines, datacenters, and time zones. A naive auto-incrementing integer breaks the moment you add a second database node. Designing a robust unique ID generator is a classic system design problem — and the decisions you make affect ordering, indexing, latency, and debuggability.
Requirements#
Before choosing an approach, clarify these requirements:
- Global uniqueness — No two IDs collide, even across datacenters.
- Rough time ordering — IDs generated later should sort after IDs generated earlier (important for database indexing and pagination).
- Low latency — ID generation must not become a bottleneck; sub-millisecond is ideal.
- High availability — No single point of failure.
- Compactness — Shorter IDs reduce storage and network overhead.
Not every system needs all five. A logging pipeline may care only about uniqueness, while a social feed needs strict time ordering.
Database Auto-Increment: Why It Breaks#
The simplest approach — AUTO_INCREMENT in MySQL or SERIAL in PostgreSQL — works on a single node:
INSERT INTO orders (id, ...) VALUES (DEFAULT, ...);
-- id = 1, 2, 3, 4, ...
Problems in a distributed setting:
- Single point of failure — One database generates all IDs.
- Coordination overhead — Multi-master setups (odd/even IDs) split the ID space between nodes, must be reconfigured whenever a node is added, and break global ordering.
- Latency — Every ID requires a round-trip to the database.
- Scaling ceiling — The ID-generating node becomes a bottleneck under high write load.
Auto-increment is fine for small, single-region applications. Beyond that, you need a distributed scheme.
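To make the odd/even trade-off concrete, here is a minimal sketch of an offset-and-stride scheme with two nodes. `node_ids` is a hypothetical helper for illustration, not a database API.

```python
import itertools

def node_ids(node_index: int, node_count: int):
    """Yield the IDs one node hands out: node_index + 1, then every node_count-th value."""
    return itertools.count(start=node_index + 1, step=node_count)

node_a = node_ids(0, 2)  # 1, 3, 5, ...
node_b = node_ids(1, 2)  # 2, 4, 6, ...

# Node A serves three inserts, then node B serves one.
arrival_order = [next(node_a), next(node_a), next(node_a), next(node_b)]
print(arrival_order)  # [1, 3, 5, 2]
```

The IDs never collide, but the fourth insert receives a smaller ID than the third, so ID order no longer matches insertion order.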
UUID v4: Random but Unordered#
A UUID v4 is 128 bits of randomness formatted as 550e8400-e29b-41d4-a716-446655440000. Any node can generate one without coordination.
Pros:
- Zero coordination — generate locally, instantly.
- Collision probability is astronomically low (122 random bits, so 2^122 possible values).
Cons:
- No time ordering — UUIDs are random, so B-tree indexes scatter inserts across pages, causing write amplification.
- 36 characters as a string — bulky in URLs and logs.
- Not human-readable or debuggable.
UUID v4 is a safe default when ordering does not matter and you can tolerate the storage overhead.
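As a quick illustration of the storage overhead, the sketch below uses Python's standard `uuid` module to generate a v4 UUID and compare its text and binary sizes. The variable names are illustrative only.

```python
import uuid

u = uuid.uuid4()          # generated locally, no coordination
text_form = str(u)        # hyphenated hex string
binary_form = u.bytes     # raw 128-bit value

print(text_form, len(text_form), len(binary_form), u.version)
```

Storing the 16-byte binary form (or a native UUID column type, where the database offers one) avoids most of the overhead of the 36-character string.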
UUID v7: Time-Ordered UUIDs#
RFC 9562 (2024) introduced UUID v7, which embeds a Unix timestamp in the most significant bits:
Bit layout, most significant bits first (128 bits total):
│ unix_ts_ms (48 bits) │ ver (4 bits) │ rand_a (12 bits) │ var (2 bits) │ rand_b (62 bits) │
IDs sort chronologically because the timestamp occupies the high-order bits. This gives you B-tree-friendly inserts while retaining the no-coordination property of UUIDs.
Trade-off: Still 128 bits (36 characters as a string). If compactness matters, consider ULID or Snowflake.
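The layout above can be assembled by hand. The following is a minimal sketch, assuming the field widths shown in this section and filling both `rand_a` and `rand_b` with random bits; `uuid7_sketch` is a hypothetical name, and a production system should prefer a maintained library.

```python
import os
import time
import uuid

def uuid7_sketch(ts_ms=None) -> uuid.UUID:
    """Build a UUID with the v7 field layout: 48-bit ms timestamp, then random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 74 are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (ts_ms & ((1 << 48) - 1)) << 80        # unix_ts_ms in the high 48 bits
    value |= 0x7 << 76                             # version = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # variant bits
    value |= rand_b
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier, later, earlier < later)
```

Because the timestamp sits in the most significant bits, comparing two values generated in different milliseconds is decided by the timestamp alone, regardless of the random bits.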
Twitter Snowflake ID#
Twitter's Snowflake (2010) is the most widely referenced distributed ID scheme in system design interviews. A Snowflake ID is a 64-bit integer:
┌──────────────────────────────────────────────────────────────────┐
│ 0 │ 41 bits: timestamp (ms) │ 5 │ 5 │ 12 │
│sign │ (milliseconds since custom epoch) │ DC│WK │ seq │
└──────────────────────────────────────────────────────────────────┘
- 1 bit: sign (always 0)
- 41 bits: millisecond timestamp → ~69 years from epoch
- 5 bits: datacenter ID → 32 datacenters
- 5 bits: worker ID → 32 workers per datacenter
- 12 bits: sequence number → 4096 IDs per millisecond per worker
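The bit layout maps directly onto shifts and masks. Below is a minimal sketch of packing and unpacking the four fields; `pack` and `unpack` are hypothetical helper names, and `ts_ms` is assumed to already be milliseconds since the custom epoch.

```python
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

WORKER_SHIFT = SEQUENCE_BITS                           # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS          # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22

def pack(ts_ms: int, datacenter_id: int, worker_id: int, sequence: int) -> int:
    """Combine the fields into one 64-bit integer, timestamp in the high bits."""
    return (
        (ts_ms << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (worker_id << WORKER_SHIFT)
        | sequence
    )

def unpack(snowflake_id: int):
    """Recover (ts_ms, datacenter_id, worker_id, sequence) from an ID."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )

example = pack(123_456_789, 3, 17, 42)
print(example, unpack(example))
```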
Properties#
- Time-ordered — The timestamp in the high bits means IDs sort chronologically.
- Compact — 64 bits fits in a database BIGINT and is half the size of a UUID.
- High throughput — Each worker generates up to 4096 IDs per millisecond (4M per second) with no coordination.
- Embedded metadata — You can extract the timestamp, datacenter, and worker from the ID itself.
Limitations#
- Requires pre-assigned datacenter and worker IDs (typically via ZooKeeper or configuration).
- 41-bit timestamp overflows after ~69 years — choose the epoch carefully.
- Clock skew can produce duplicate or out-of-order IDs (see below).
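The ~69-year figure follows from the 41-bit millisecond counter. The sketch below works it out for a hypothetical custom epoch of 2020-01-01 UTC; the epoch is an assumption chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

def overflow_moment(epoch: datetime) -> datetime:
    """Return the instant at which a 41-bit millisecond counter wraps."""
    return epoch + timedelta(milliseconds=2 ** TIMESTAMP_BITS)

lifetime_years = 2 ** TIMESTAMP_BITS / MS_PER_YEAR
custom_epoch = datetime(2020, 1, 1, tzinfo=timezone.utc)  # hypothetical epoch
wrap = overflow_moment(custom_epoch)

print(round(lifetime_years, 1), wrap.year)
```

Setting the epoch close to the system's launch date, rather than 1970, keeps nearly the full lifetime available.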
ULID: Lexicographically Sortable#
A ULID (Universally Unique Lexicographically Sortable Identifier) is 128 bits encoded as a 26-character Crockford Base32 string:
01ARZ3NDEKTSV4RRFFQ69G5FAV
└────────┘└──────────────┘
 timestamp    randomness
 (48 bits,    (80 bits,
 10 chars)    16 chars)
Advantages over UUID v7:
- Shorter string representation (26 vs. 36 characters).
- Case-insensitive, no hyphens — URL and filename friendly.
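- Monotonic sort order within the same millisecond when the implementation increments the random component (the ULID spec describes this mode; RFC 9562 permits a similar optional counter for UUID v7).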
Disadvantage: Not a standard UUID, so some databases and libraries expect UUID format.
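The encoding described above can be sketched in a few lines: concatenate the 48-bit timestamp and 80 random bits, then emit 26 Crockford Base32 characters. `ulid_sketch` is a hypothetical name, and this version omits the monotonic mode.

```python
import os
import time

# Crockford Base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_sketch(ts_ms=None, randomness=None) -> str:
    """Encode a 48-bit ms timestamp plus 80 random bits as 26 characters."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = os.urandom(10)
    if not 0 <= ts_ms < (1 << 48) or len(randomness) != 10:
        raise ValueError("timestamp must fit in 48 bits and randomness must be 10 bytes")

    value = (ts_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):                  # 26 * 5 = 130 bits; the top 2 are always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

print(ulid_sketch())
```

Because the alphabet is in ascending ASCII order and the timestamp occupies the leading characters, plain string comparison sorts these IDs by creation time.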
Comparison Table#
| Scheme | Bits | Ordered | Coordination | String Length | Fits BIGINT |
|---|---|---|---|---|---|
| Auto-increment | 32/64 | Yes | Required | — | Yes |
| UUID v4 | 128 | No | None | 36 | No |
| UUID v7 | 128 | Yes | None | 36 | No |
| Snowflake | 64 | Yes | Worker assignment | — | Yes |
| ULID | 128 | Yes | None | 26 | No |
Multi-Datacenter ID Generation#
In a global system, IDs must be unique across datacenters without cross-region coordination (which adds latency).
Snowflake approach: Embed the datacenter ID in the ID itself. Each datacenter generates independently; the bit layout guarantees uniqueness.
ULID/UUID v7 approach: Rely on sufficient randomness. With 74 (UUID v7) to 80 (ULID) random bits per millisecond, collision probability across datacenters is negligible.
Ticket server approach (Flickr): Dedicated ID-generating databases in each region, each assigned a non-overlapping range. Simple but introduces a single point of failure per region.
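To put a number on "negligible" for the randomness-based schemes, the sketch below applies the standard birthday approximation to IDs generated within the same millisecond. The function name and the one-million-IDs-per-millisecond load are assumptions for illustration.

```python
import math

def collision_probability(n_ids: int, random_bits: int) -> float:
    """Birthday approximation: chance that any two of n_ids random values coincide."""
    pairs_over_space = n_ids * (n_ids - 1) / (2 * 2 ** random_bits)
    return -math.expm1(-pairs_over_space)   # 1 - exp(-x), accurate for tiny x

# Hypothetical load: one million IDs in the same millisecond, system-wide.
p_uuid7 = collision_probability(10 ** 6, 74)
p_ulid = collision_probability(10 ** 6, 80)
print(p_uuid7, p_ulid)
```

The estimate applies per millisecond, since IDs with different timestamps cannot collide.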
Clock Skew Handling#
Distributed ID generators that embed timestamps are vulnerable to clock skew — when a node's clock jumps backward due to NTP adjustment.
Mitigation Strategies#
- Reject backward jumps — If the current timestamp is less than the last seen timestamp, wait or throw an error. Snowflake implementations commonly do this.
- Logical clock fallback — Track the last timestamp used. If the clock goes backward, keep using the last timestamp and increment the sequence number.
- NTP discipline — Use chrony or a similar daemon configured for slew-only adjustments (no jumps). AWS and GCP time services smear leap seconds rather than stepping the clock.
- Bounded skew tolerance — Allow small backward jumps (e.g., < 5ms) and absorb them in the sequence space. Reject larger jumps.
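if current_ts < last_ts:
    if last_ts - current_ts < MAX_SKEW_MS:
        current_ts = last_ts  # absorb small skew by reusing the last timestamp
        sequence += 1         # stay unique within that millisecond (must not overflow the sequence bits)
    else:
        raise ClockSkewError("clock moved backward by too much")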
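The fragment above can be placed in context with a small generator. This is a minimal sketch, assuming the Snowflake bit layout from earlier in the article, a hypothetical epoch of 2020-01-01 UTC, and a 5 ms tolerance; `SnowflakeSketch` and its injectable `clock` parameter are illustrative, not a reference implementation.

```python
import threading
import time

class ClockSkewError(Exception):
    """Raised when the clock moves backward by more than the tolerated amount."""

class SnowflakeSketch:
    EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z, hypothetical custom epoch
    MAX_SKEW_MS = 5                # assumed tolerance for small backward jumps
    SEQUENCE_MASK = (1 << 12) - 1

    def __init__(self, datacenter_id: int, worker_id: int, clock=None):
        if not (0 <= datacenter_id < 32 and 0 <= worker_id < 32):
            raise ValueError("datacenter_id and worker_id must each fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        # The clock is injectable so skew can be simulated in tests.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last_ts = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            ts = self._clock()
            if ts < self._last_ts:
                if self._last_ts - ts >= self.MAX_SKEW_MS:
                    raise ClockSkewError("clock moved backward by too much")
                ts = self._last_ts                      # absorb small skew
            if ts == self._last_ts:
                self._sequence = (self._sequence + 1) & self.SEQUENCE_MASK
                if self._sequence == 0:                 # 4096 IDs used this millisecond
                    while ts <= self._last_ts:          # busy-wait for the next one
                        ts = self._clock()
            else:
                self._sequence = 0
            self._last_ts = ts
            return (
                ((ts - self.EPOCH_MS) << 22)
                | (self.datacenter_id << 17)
                | (self.worker_id << 12)
                | self._sequence
            )

generator = SnowflakeSketch(datacenter_id=1, worker_id=2)
print(generator.next_id(), generator.next_id())
```

The lock keeps the timestamp and sequence updates atomic when several threads share one worker ID. Waiting for the next millisecond on sequence exhaustion trades a brief stall for never reusing a (timestamp, sequence) pair.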
ID Encoding: Base62#
Raw 64-bit or 128-bit IDs are often encoded for use in URLs, short links, and APIs. Base62 (0-9, A-Z, a-z) is a common choice:
Alphabet: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
64-bit integer → up to 11 base62 characters
128-bit integer → up to 22 base62 characters
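These lengths can be checked with exact integer arithmetic: the answer is the smallest n for which 62^n covers every value of the given bit width. `base62_length` is a hypothetical helper name.

```python
def base62_length(bits: int) -> int:
    """Smallest number of base62 digits that can represent any `bits`-bit value."""
    length = 1
    while 62 ** length < 2 ** bits:
        length += 1
    return length

print(base62_length(64), base62_length(128))
```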
Why base62 over base64? Base62 avoids + and /, which are problematic in URLs without percent-encoding. It is case-sensitive but URL-safe without escaping.
Encoding Algorithm#
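def to_base62(num: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return "0"
    result = []
    while num > 0:
        result.append(chars[num % 62])  # least significant digit first
        num //= 62
    return "".join(reversed(result))    # reverse into most-significant-first order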
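Decoding is the inverse operation. The sketch below assumes the same alphabet as the encoder above; `from_base62` is a hypothetical companion function.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
DIGIT_VALUE = {ch: i for i, ch in enumerate(ALPHABET)}

def from_base62(text: str) -> int:
    """Decode a base62 string (most significant digit first) into an integer."""
    if not text:
        raise ValueError("cannot decode an empty string")
    num = 0
    for ch in text:
        if ch not in DIGIT_VALUE:
            raise ValueError(f"invalid base62 character: {ch!r}")
        num = num * 62 + DIGIT_VALUE[ch]
    return num

print(from_base62("10"))  # 62
```

Rejecting unknown characters matters for user-facing IDs, where a pasted URL may carry stray punctuation.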
Design Interview Walkthrough#
When asked to design a unique ID generator, structure your answer around these decisions:
- Clarify requirements — Uniqueness only? Time ordering? Compactness? Throughput target?
- Choose the scheme — UUID v7 for simplicity, Snowflake for compactness and ordering, ULID for string-friendly ordering.
- Handle multi-datacenter — Embed datacenter bits (Snowflake) or rely on randomness (ULID/UUID v7).
- Address clock skew — Logical clock fallback + NTP discipline.
- Define the encoding — Base62 for URLs, raw BIGINT for database storage.
- Estimate throughput — Snowflake: 4096/ms/worker. UUID v7: limited only by random number generation speed.
- Plan for failure — What happens if ZooKeeper is down (Snowflake)? What if the clock daemon fails?
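For the throughput step, a back-of-the-envelope calculation is usually enough. The sketch below estimates how many Snowflake workers a target rate needs; the `utilization` parameter is an assumed safety margin, not part of the Snowflake design.

```python
import math

IDS_PER_WORKER_PER_SECOND = 4096 * 1000   # 12-bit sequence, 1000 ms per second
MAX_WORKERS = 32 * 32                     # 5 datacenter bits x 5 worker bits

def workers_needed(target_ids_per_second: int, utilization: float = 0.5) -> int:
    """Workers required if each runs at `utilization` of its theoretical peak."""
    usable = IDS_PER_WORKER_PER_SECOND * utilization
    return math.ceil(target_ids_per_second / usable)

print(workers_needed(10_000_000), MAX_WORKERS)
```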
Practical Recommendations#
- Greenfield project, no special constraints: UUID v7. Supported by modern databases, no coordination needed, time-ordered.
- High-throughput, compact IDs: Snowflake. Fits in BIGINT, 4M IDs/sec per worker, embeds useful metadata.
- User-facing short IDs: Snowflake + base62 encoding. The result is at most 11 characters (for example, 3kTMd7yoG9) and is URL-safe.
- Already using UUIDs everywhere: Migrate from v4 to v7 for ordering benefits with no schema change.
Build, visualize, and practice system design at codelit.io.
This is article #194 in the Codelit system design series.