Unique ID Generator: Designing IDs for Distributed Systems

March 28, 2026 · 7 min read · By Codelit Team

Every distributed system needs a way to identify records uniquely across machines, datacenters, and time zones. A naive auto-incrementing integer breaks the moment you add a second database node. Designing a robust unique ID generator is a classic system design problem — and the decisions you make affect ordering, indexing, latency, and debuggability.

Requirements

Before choosing an approach, clarify these requirements:

Global uniqueness — No two IDs collide, even across datacenters.
Rough time ordering — IDs generated later should sort after IDs generated earlier (important for database indexing and pagination).
Low latency — ID generation must not become a bottleneck; sub-millisecond is ideal.
High availability — No single point of failure.
Compactness — Shorter IDs reduce storage and network overhead.

Not every system needs all five. A logging pipeline may care only about uniqueness, while a social feed needs strict time ordering.

Database Auto-Increment: Why It Breaks

The simplest approach — AUTO_INCREMENT in MySQL or SERIAL in PostgreSQL — works on a single node:

INSERT INTO orders (id, ...) VALUES (DEFAULT, ...);
-- id = 1, 2, 3, 4, ...

Problems in a distributed setting:

Single point of failure — One database generates all IDs.
Coordination overhead — Multi-master setups (odd/even IDs) waste half the ID space and break ordering.
Latency — Every ID requires a round-trip to the database.
Scaling ceiling — The ID-generating node becomes a bottleneck under high write load.

Auto-increment is fine for small, single-region applications. Beyond that, you need a distributed scheme.

UUID v4: Random but Unordered

A UUID v4 is a 128-bit value with 122 random bits (the remaining 6 bits encode the version and variant), formatted as 550e8400-e29b-41d4-a716-446655440000. Any node can generate one without coordination.

Pros:

Zero coordination — generate locally, instantly.
Collision probability is astronomically low (122 random bits give 2^122, about 5.3 × 10^36, possible values).

Cons:

No time ordering — UUIDs are random, so B-tree indexes scatter inserts across pages, causing write amplification.
36 characters as a string — bulky in URLs and logs.
Not human-readable or debuggable.

UUID v4 is a safe default when ordering does not matter and you can tolerate the storage overhead.
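To put a rough number on the collision risk, here is a minimal sketch using Python's standard `uuid` module and the birthday approximation. The helper name `collision_probability` and the volume of one trillion IDs are assumptions chosen for illustration.

```python
import math
import uuid

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def collision_probability(n_ids: int, random_bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / 2^(bits + 1))."""
    return -math.expm1(-(n_ids ** 2) / 2 ** (random_bits + 1))


new_id = uuid.uuid4()
print(new_id)          # different on every run
print(new_id.version)  # 4
print(collision_probability(10 ** 12))  # hypothetical volume: one trillion IDs
```

Even at a trillion IDs the estimate stays far below one in a trillion, which is why v4 needs no coordination in practice.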

UUID v7: Time-Ordered UUIDs

RFC 9562 (2024) introduced UUID v7, which embeds a Unix timestamp in the most significant bits:

 bit 0                                                        bit 63
├──────────── unix_ts_ms (48 bits) ────────────┤ ver (4) ┤ rand_a (12) ┤
 bit 64                                                      bit 127
├ var (2) ┤──────────────────── rand_b (62 bits) ────────────────────┤

IDs sort chronologically because the timestamp occupies the high-order bits. This gives you B-tree-friendly inserts while retaining the no-coordination property of UUIDs.
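The layout can be made concrete with a hand-rolled sketch. It assumes the field positions described in RFC 9562 and fills both `rand_a` and `rand_b` with random bits. The names `uuid7_sketch` and `timestamp_ms` are hypothetical helpers, not a library API.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the v7 layout: 48-bit ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    rand_a = rand >> 68                  # top 12 bits
    rand_b = rand & ((1 << 62) - 1)      # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80   # timestamp in the high 48 bits
    value |= 0x7 << 76                   # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                  # variant field = 0b10
    value |= rand_b
    return uuid.UUID(int=value)


def timestamp_ms(u: uuid.UUID) -> int:
    """Recover the embedded millisecond timestamp."""
    return u.int >> 80


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier < later)            # True: the timestamp dominates the comparison
print(str(earlier) < str(later))  # True: the hex string form sorts the same way
```

Two IDs created in the same millisecond are ordered only by their random bits, so ordering within a millisecond is arbitrary in this sketch.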

Trade-off: Still 128 bits (36 characters as a string). If compactness matters, consider ULID or Snowflake.

Twitter Snowflake ID

Twitter's Snowflake (2010) is the most widely referenced distributed ID scheme in system design interviews. A Snowflake ID is a 64-bit integer:

┌──────────────────────────────────────────────────────────────────┐
│  0  │         41 bits: timestamp (ms)         │ 5 │ 5 │  12    │
│sign │   (milliseconds since custom epoch)     │ DC│WK │  seq   │
└──────────────────────────────────────────────────────────────────┘

- 1 bit:  sign (always 0)
- 41 bits: millisecond timestamp → ~69 years from epoch
- 5 bits:  datacenter ID → 32 datacenters
- 5 bits:  worker ID → 32 workers per datacenter
- 12 bits: sequence number → 4096 IDs per millisecond per worker
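The bit layout above translates directly into shifts and masks. The following is a sketch under the field widths listed here; `SnowflakeParts`, `pack`, and `unpack` are illustrative names rather than part of any Snowflake library.

```python
from typing import NamedTuple

TIMESTAMP_BITS, DATACENTER_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 5, 5, 12

WORKER_SHIFT = SEQUENCE_BITS                           # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS          # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22


class SnowflakeParts(NamedTuple):
    timestamp_ms: int  # milliseconds since the custom epoch
    datacenter: int
    worker: int
    sequence: int


def pack(parts: SnowflakeParts) -> int:
    """Combine the four fields into one integer; the sign bit stays 0."""
    limits = (TIMESTAMP_BITS, DATACENTER_BITS, WORKER_BITS, SEQUENCE_BITS)
    for value, bits in zip(parts, limits):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (
        parts.timestamp_ms << TIMESTAMP_SHIFT
        | parts.datacenter << DATACENTER_SHIFT
        | parts.worker << WORKER_SHIFT
        | parts.sequence
    )


def unpack(snowflake: int) -> SnowflakeParts:
    """Split an ID back into its fields."""
    return SnowflakeParts(
        timestamp_ms=snowflake >> TIMESTAMP_SHIFT,
        datacenter=(snowflake >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        worker=(snowflake >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        sequence=snowflake & ((1 << SEQUENCE_BITS) - 1),
    )


example = SnowflakeParts(timestamp_ms=123_456_789, datacenter=3, worker=7, sequence=42)
print(unpack(pack(example)) == example)  # True
```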

Properties

Time-ordered — The timestamp in the high bits means IDs sort chronologically.
Compact — 64 bits fits in a database BIGINT and is half the size of a UUID.
High throughput — Each worker generates up to 4096 IDs per millisecond (4M per second) with no coordination.
Embedded metadata — You can extract the timestamp, datacenter, and worker from the ID itself.

Limitations

Requires pre-assigned datacenter and worker IDs (typically via ZooKeeper or configuration).
41-bit timestamp overflows after ~69 years — choose the epoch carefully.
Clock skew can produce duplicate or out-of-order IDs (see below).
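The 69-year and 4096-per-millisecond figures follow from the field widths. A short arithmetic check, assuming a hypothetical custom epoch of 2020-01-01 UTC chosen only for this example:

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41
SEQUENCE_BITS = 12
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

lifespan_years = 2 ** TIMESTAMP_BITS / MS_PER_YEAR
ids_per_second_per_worker = 2 ** SEQUENCE_BITS * 1000

# Hypothetical custom epoch, chosen only for this example.
custom_epoch = datetime(2020, 1, 1, tzinfo=timezone.utc)
overflow_at = custom_epoch + timedelta(milliseconds=2 ** TIMESTAMP_BITS)

print(round(lifespan_years, 1))     # 69.7
print(ids_per_second_per_worker)    # 4096000
print(overflow_at.year)             # 2089
```

Picking an epoch close to the system's launch date, rather than the Unix epoch, keeps most of that range available.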

ULID: Lexicographically Sortable

A ULID (Universally Unique Lexicographically Sortable Identifier) is 128 bits encoded as a 26-character Crockford Base32 string:

 01ARZ3NDEKTSV4RRFFQ69G5FAV
 └──────┘└────────────────┘
 timestamp    randomness
 (48 bits)   (80 bits)

Advantages over UUID v7:

Shorter string representation (26 vs. 36 characters).
Case-insensitive, no hyphens — URL and filename friendly.
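Monotonic sort order within the same millisecond, where the implementation supports it (the monotonic mode in the ULID spec increments the random component for each additional ID in that millisecond).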

Disadvantage: A ULID string is not in standard UUID text format, so databases and libraries that expect UUID text need a conversion step.
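The encoding can be sketched in a few lines. This is a minimal, non-monotonic version that treats the ULID as one 128-bit integer written in Crockford Base32. The function names are hypothetical, and real libraries add the monotonic mode and stricter decoding.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 symbols; no I, L, O, U


def encode_ulid(unix_ms: int, randomness: int) -> str:
    """Encode a 48-bit timestamp and 80 random bits as 26 Base32 characters."""
    if not 0 <= unix_ms < (1 << 48) or not 0 <= randomness < (1 << 80):
        raise ValueError("timestamp or randomness out of range")
    value = (unix_ms << 80) | randomness
    chars = []
    for _ in range(26):                  # 26 * 5 = 130 bits; the top 2 are zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid() -> str:
    """Generate a ULID from the current time and OS randomness."""
    return encode_ulid(time.time_ns() // 1_000_000,
                       int.from_bytes(os.urandom(10), "big"))


def decode_timestamp_ms(ulid: str) -> int:
    """Read the millisecond timestamp back from the first 10 characters."""
    value = 0
    for ch in ulid[:10].upper():
        value = value * 32 + CROCKFORD.index(ch)
    return value


older = encode_ulid(1_700_000_000_000, (1 << 80) - 1)
newer = encode_ulid(1_700_000_000_001, 0)
print(len(newer))      # 26
print(older < newer)   # True: string order follows the timestamp
```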

Comparison Table

Scheme	Bits	Ordered	Coordination	String Length	Fits BIGINT
Auto-increment	32/64	Yes	Required	—	Yes
UUID v4	128	No	None	36	No
UUID v7	128	Yes	None	36	No
Snowflake	64	Yes	Worker assignment	—	Yes
ULID	128	Yes	None	26	No

Multi-Datacenter ID Generation

In a global system, IDs must be unique across datacenters without cross-region coordination (which adds latency).

Snowflake approach: Embed the datacenter ID in the ID itself. Each datacenter generates independently; the bit layout guarantees uniqueness.

ULID/UUID v7 approach: Rely on sufficient randomness. UUID v7 carries up to 74 random bits per ID (rand_a plus rand_b) and ULID carries 80, so the probability of two datacenters producing the same value in the same millisecond is negligible.

Ticket server approach (Flickr): Dedicated ID-generating databases in each region, each assigned a non-overlapping range. Simple but introduces a single point of failure per region.
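The ticket server idea can be illustrated with an in-memory stand-in. In a real deployment each server would be a database. The class name, region names, and range size below are all assumptions made for this sketch.

```python
import threading


class TicketServer:
    """In-memory stand-in for a per-region ID database with a reserved range."""

    def __init__(self, start: int, end: int) -> None:
        self._next = start
        self._end = end  # exclusive upper bound of this region's range
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                raise RuntimeError("range exhausted; assign a new range")
            issued = self._next
            self._next += 1
            return issued


# Hypothetical regions, each with a disjoint range of one trillion IDs.
RANGE_SIZE = 10 ** 12
servers = {
    region: TicketServer(i * RANGE_SIZE, (i + 1) * RANGE_SIZE)
    for i, region in enumerate(["us-east", "eu-west", "ap-south"])
}

print(servers["us-east"].next_id())  # 0
print(servers["eu-west"].next_id())  # 1000000000000
```

IDs from different regions never collide because the ranges are disjoint, but they are not time-ordered across regions.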

Clock Skew Handling

Distributed ID generators that embed timestamps are vulnerable to clock skew, most visibly when a node's clock jumps backward after an NTP adjustment.

Mitigation Strategies

Reject backward jumps: if the current timestamp is less than the last seen timestamp, wait until the clock catches up or throw an error. Snowflake implementations commonly do this.
Logical clock fallback: track the last timestamp used. If the clock goes backward, keep using the last timestamp and increment the sequence number.
NTP discipline: use chrony or a similar daemon configured to slew the clock gradually rather than step it. The AWS and Google Cloud time services smear leap seconds, which removes one common source of jumps.
Bounded skew tolerance: allow small backward jumps (e.g., < 5 ms) and absorb them in the sequence space. Reject larger jumps.

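MAX_SKEW_MS = 5  # largest backward jump to absorb, in milliseconds

class ClockSkewError(Exception):
    """The clock moved backward by MAX_SKEW_MS or more."""

def absorb_skew(current_ts: int, last_ts: int, sequence: int) -> tuple[int, int]:
    """Return the (timestamp, sequence) pair to use after a possible backward jump."""
    if current_ts < last_ts:
        if last_ts - current_ts < MAX_SKEW_MS:
            return last_ts, sequence + 1  # absorb small skew in the sequence space
        raise ClockSkewError("clock moved backward by too much")
    return current_ts, sequence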
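Putting bounded skew tolerance together with the Snowflake layout gives a generator along these lines. It is a sketch, not Twitter's implementation. The epoch value, the 5 ms tolerance, the injectable `clock_ms` parameter, and the class name are assumptions made for illustration.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1_577_836_800_000   # hypothetical custom epoch: 2020-01-01T00:00:00Z
MAX_SKEW_MS = 5                # assumed tolerance for backward clock jumps
SEQUENCE_MASK = (1 << 12) - 1


class ClockSkewError(Exception):
    """The clock moved backward by MAX_SKEW_MS or more."""


def system_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, datacenter: int, worker: int,
                 clock_ms: Callable[[], int] = system_clock_ms) -> None:
        if not (0 <= datacenter < 32 and 0 <= worker < 32):
            raise ValueError("datacenter and worker must each fit in 5 bits")
        self._node_bits = (datacenter << 17) | (worker << 12)
        self._clock_ms = clock_ms
        self._last_ts = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            ts = self._clock_ms()
            if ts < self._last_ts:
                if self._last_ts - ts >= MAX_SKEW_MS:
                    raise ClockSkewError("clock moved backward by too much")
                ts = self._last_ts  # absorb small skew by reusing the last timestamp
            if ts == self._last_ts:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # 4096 IDs already issued for this millisecond: wait for the next.
                    while ts <= self._last_ts:
                        ts = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ts = ts
            return ((ts - EPOCH_MS) << 22) | self._node_bits | self._sequence


generator = SnowflakeGenerator(datacenter=1, worker=2)
first, second = generator.next_id(), generator.next_id()
print(first < second)  # True
```

The clock is passed in as a function so the skew paths can be exercised in tests with a scripted sequence of readings.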

ID Encoding: Base62

Raw 64-bit or 128-bit IDs are often encoded for use in URLs, short links, and APIs. Base62 (0-9, A-Z, a-z) is a common choice:

Alphabet: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

64-bit integer  → up to 11 base62 characters
128-bit integer → up to 22 base62 characters

Why base62 over base64? Base62 avoids + and /, which are problematic in URLs without percent-encoding. It is case-sensitive but URL-safe without escaping.

Encoding Algorithm

def to_base62(num: int) -> str:
    chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    if num == 0:
        return "0"
    result = []
    while num > 0:
        result.append(chars[num % 62])
        num //= 62
    return "".join(reversed(result))
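Decoding is the mirror image, and the length figures quoted earlier can be checked the same way. A sketch using the same alphabet; `from_base62` and `max_base62_length` are illustrative names.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def from_base62(text: str) -> int:
    """Decode a base62 string; raises KeyError on characters outside the alphabet."""
    num = 0
    for ch in text:
        num = num * 62 + INDEX[ch]
    return num


def max_base62_length(bits: int) -> int:
    """Number of base62 digits needed for the largest value with this many bits."""
    largest = (1 << bits) - 1
    length = 0
    while largest > 0:
        largest //= 62
        length += 1
    return length


print(from_base62("10"))        # 62
print(max_base62_length(64))    # 11
print(max_base62_length(128))   # 22
```

Because this alphabet follows ASCII order (digits, then uppercase, then lowercase), encoded IDs padded to a fixed width sort the same way as the integers they represent.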

Design Interview Walkthrough

When asked to design a unique ID generator, structure your answer around these decisions:

Clarify requirements — Uniqueness only? Time ordering? Compactness? Throughput target?
Choose the scheme — UUID v7 for simplicity, Snowflake for compactness and ordering, ULID for string-friendly ordering.
Handle multi-datacenter — Embed datacenter bits (Snowflake) or rely on randomness (ULID/UUID v7).
Address clock skew — Logical clock fallback + NTP discipline.
Define the encoding — Base62 for URLs, raw BIGINT for database storage.
Estimate throughput — Snowflake: 4096/ms/worker. UUID v7: limited only by random number generation speed.
Plan for failure — What happens if ZooKeeper is down (Snowflake)? What if the clock daemon fails?

Practical Recommendations

Greenfield project, no special constraints: UUID v7. Supported by modern databases, no coordination needed, time-ordered.
High-throughput, compact IDs: Snowflake. Fits in BIGINT, 4M IDs/sec per worker, embeds useful metadata.
User-facing short IDs: Snowflake + base62 encoding. The result is at most 11 characters (for example, 3kTMd7yoG9) and is compact and URL-safe.
Already using UUIDs everywhere: Generate new IDs as v7 instead of v4 for ordering benefits with no schema change. Existing v4 values stay valid but remain randomly distributed.

This is article #194 in the Codelit system design series.
