
NoSQL Data Modeling: Patterns for Document, Key-Value, Wide-Column & Graph Databases

March 28, 2026 · 6 min read · By Codelit Team

NoSQL Data Modeling

Normalized relational schemas minimize redundancy and leave the query shape open. NoSQL data models are shaped around the queries you already know you will run, even at the cost of duplicated data. That difference changes how you model data.

Relational vs NoSQL Mindset

In relational modeling, you normalize first and worry about queries later:

Users table → Orders table → OrderItems table → Products table
3 JOINs across 4 tables to answer "What did user X buy?"

In NoSQL modeling, you start with the queries:

Access pattern: "Get all orders for user X with product details"
→ Design a single document or item that answers this in one read

The golden rule: model for access patterns, not for entities.
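As a rough illustration of the two mindsets, the sketch below uses plain Python dictionaries as stand-ins for tables and for a document collection. The sample data and the function names are assumptions made up for this example, not part of any database API.

```python
# Normalized layout: separate "tables" that must be stitched together at read time.
orders = {"o1": {"user_id": "u1"}}
order_items = [
    {"order_id": "o1", "product_id": "p1", "qty": 1},
    {"order_id": "o1", "product_id": "p2", "qty": 2},
]
products = {"p1": {"name": "Keyboard"}, "p2": {"name": "Mouse"}}


def purchases_normalized(user_id: str) -> list[str]:
    """Answer "what did user X buy?" by joining orders, items, and products."""
    order_ids = {oid for oid, o in orders.items() if o["user_id"] == user_id}
    return [
        products[item["product_id"]]["name"]
        for item in order_items
        if item["order_id"] in order_ids
    ]


# Access-pattern-first layout: one document per user already holds the answer.
user_order_docs = {
    "u1": {"orders": [{"order_id": "o1", "items": ["Keyboard", "Mouse"]}]}
}


def purchases_denormalized(user_id: str) -> list[str]:
    """Answer the same question with a single keyed read."""
    doc = user_order_docs[user_id]
    return [name for order in doc["orders"] for name in order["items"]]
```

Both functions return the same product names; the second does so without touching more than one record, which is the property the access-pattern-first design is after.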

Document Databases (MongoDB)

Documents store nested, hierarchical data as JSON-like objects. The key decision is embedding vs referencing.

Embed When

Data is read together frequently
Child data belongs to exactly one parent
The embedded array won't grow unbounded

// Order document — embed line items
{
  "_id": "order_123",
  "userId": "user_456",
  "status": "shipped",
  "items": [
    { "productId": "prod_A", "name": "Keyboard", "price": 89, "qty": 1 },
    { "productId": "prod_B", "name": "Mouse", "price": 49, "qty": 2 }
  ],
  "shipping": {
    "address": "123 Main St",
    "carrier": "FedEx",
    "tracking": "FX123456"
  }
}
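One practical consequence of embedding is that per-order calculations need no further lookups. The sketch below treats the order document above as a Python dictionary; `order_total` is a hypothetical helper, and it assumes `price` is a per-unit amount.

```python
order = {
    "_id": "order_123",
    "userId": "user_456",
    "status": "shipped",
    "items": [
        {"productId": "prod_A", "name": "Keyboard", "price": 89, "qty": 1},
        {"productId": "prod_B", "name": "Mouse", "price": 49, "qty": 2},
    ],
}


def order_total(doc: dict) -> int:
    """Sum price * qty over the embedded line items of one order document."""
    return sum(item["price"] * item["qty"] for item in doc["items"])
```

Because the line items travel with the order, one read of the document is enough to compute the total.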

Reference When

Data is shared across documents (many-to-many)
Embedded arrays would grow without bound
You need independent updates

// Blog post — reference author (shared across posts)
{
  "_id": "post_789",
  "title": "NoSQL Patterns",
  "authorId": "author_42",
  "tagIds": ["tag_nosql", "tag_databases"]
}
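Referencing moves the join into the application: the post is read first, then the documents it points to. The sketch below shows that extra step with dictionaries standing in for the author and tag collections; the collection contents and `hydrate_post` are assumptions for illustration.

```python
# Stand-ins for the referenced collections (sample values are hypothetical).
authors = {"author_42": {"name": "Example Author"}}
tags = {
    "tag_nosql": {"label": "NoSQL"},
    "tag_databases": {"label": "Databases"},
}

post = {
    "_id": "post_789",
    "title": "NoSQL Patterns",
    "authorId": "author_42",
    "tagIds": ["tag_nosql", "tag_databases"],
}


def hydrate_post(post: dict, authors: dict, tags: dict) -> dict:
    """Resolve the post's references with follow-up lookups by id."""
    return {
        "title": post["title"],
        "author": authors[post["authorId"]]["name"],
        "tags": [tags[tag_id]["label"] for tag_id in post["tagIds"]],
    }
```

The cost is additional reads per post; the benefit is that renaming an author or a tag touches exactly one document.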

The Subset Pattern

Store frequently accessed fields inline and full data separately:

// Product listing (fast reads)
{ "_id": "prod_A", "name": "Keyboard", "price": 89, "thumbnail": "kb.jpg" }

// Product detail collection (full data)
{ "productId": "prod_A", "specs": { ... }, "reviews": [ ... ], "warranty": { ... } }
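One way to produce the two documents is to split a full product record at write time. The sketch below is a minimal version of that split; `SUMMARY_FIELDS` and `split_product` are hypothetical names, and the choice of summary fields simply mirrors the listing document above.

```python
# Fields that the listing view needs on every read (an assumption for this sketch).
SUMMARY_FIELDS = ("name", "price", "thumbnail")


def split_product(full: dict) -> tuple[dict, dict]:
    """Split one full product record into a listing document and a detail document."""
    summary = {"_id": full["_id"]}
    summary.update({k: full[k] for k in SUMMARY_FIELDS if k in full})

    detail = {"productId": full["_id"]}
    detail.update(
        {k: v for k, v in full.items() if k != "_id" and k not in SUMMARY_FIELDS}
    )
    return summary, detail
```

The listing document stays small and cache-friendly, while the detail document is fetched only when a product page is opened.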

Key-Value Stores (Redis, DynamoDB)

Key-value stores are the simplest NoSQL model, and a lookup by key is typically the fastest operation they offer. Everything revolves around key design.

Redis Patterns

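// Session store: JSON value with a one-hour TTL
SET session:abc123 '{"userId":"u1","role":"admin"}' EX 3600

// Leaderboard: sorted set scored by points
ZADD leaderboard 9500 "player_1"
ZADD leaderboard 8700 "player_2"
// Top 10 by score, highest first
ZREVRANGE leaderboard 0 9

// Rate limiting: fixed one-minute window per user
INCR ratelimit:user42:minute
// Set the TTL only when INCR returns 1, so later requests do not extend the window
EXPIRE ratelimit:user42:minute 60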
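The rate-limiting pair above amounts to a fixed-window counter. The sketch below reproduces that logic in memory so the behaviour is easy to follow; it does not talk to Redis, and `FixedWindowLimiter` and its injectable `clock` parameter are assumptions made for this example.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """In-memory model of an INCR + EXPIRE fixed-window rate limiter."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # key -> (request count, time at which the counter expires)
        self._counters: dict[str, tuple[int, float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if now >= expires_at:
            # Missing or expired key: start a new window, as INCR on a
            # fresh key followed by EXPIRE would.
            count, expires_at = 0, now + self.window
        count += 1
        self._counters[key] = (count, expires_at)
        return count <= self.limit
```

The expiry is set only when a window starts, so a steady stream of requests cannot keep pushing the reset time forward.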

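DynamoDB Single-Table Design

DynamoDB bills for read and write throughput and for storage, not per table, so the main draw of single-table design is fetching related items of different types in one request. It puts all entities in one table using generic partition and sort keys: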

PK                  | SK                  | Data
--------------------|---------------------|------------------
USER#u1             | PROFILE             | {name, email}
USER#u1             | ORDER#o1            | {status, total}
USER#u1             | ORDER#o2            | {status, total}
ORDER#o1            | ITEM#i1             | {product, qty}
ORDER#o1            | ITEM#i2             | {product, qty}
PRODUCT#p1          | METADATA            | {name, price}

Query "all orders for user u1":

PK = "USER#u1" AND begins_with(SK, "ORDER#")

Query "all items in order o1":

PK = "ORDER#o1" AND begins_with(SK, "ITEM#")
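To make the two key conditions concrete, the sketch below models the table as a list of `(PK, SK, data)` tuples and filters it the way the conditions describe. It is an in-memory illustration only: `TABLE`, `query`, and the attribute values are assumptions, and a real table would locate the partition directly instead of scanning a list.

```python
# In-memory stand-in for the single table; data values are hypothetical.
TABLE = [
    ("USER#u1", "PROFILE", {"name": "Mo", "email": "mo@co.io"}),
    ("USER#u1", "ORDER#o2", {"status": "pending", "total": 40}),
    ("USER#u1", "ORDER#o1", {"status": "shipped", "total": 187}),
    ("ORDER#o1", "ITEM#i1", {"product": "Keyboard", "qty": 1}),
    ("ORDER#o1", "ITEM#i2", {"product": "Mouse", "qty": 2}),
    ("PRODUCT#p1", "METADATA", {"name": "Keyboard", "price": 89}),
]


def query(pk: str, sk_prefix: str = "") -> list[tuple[str, dict]]:
    """Return (SK, data) pairs for one partition, filtered by sort-key prefix.

    Results come back ordered by sort key, which is how items within a
    partition are ordered.
    """
    matches = [
        (sk, data)
        for item_pk, sk, data in TABLE
        if item_pk == pk and sk.startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda pair: pair[0])
```

Calling `query("USER#u1", "ORDER#")` returns both orders and skips the profile item, while `query("USER#u1")` returns the profile and the orders together.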

One table, zero joins, and predictable latency as the table grows, provided the partition keys spread traffic evenly.

GSI Overloading

Use generic Global Secondary Indexes for alternate access patterns:

GSI1-PK             | GSI1-SK             | Use Case
--------------------|---------------------|------------------
STATUS#shipped      | 2026-03-28#o1       | Orders by status + date
EMAIL#mo@co.io      | USER                | User lookup by email
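Overloading works because each entity type writes different values into the same pair of index attributes. The sketch below derives those attributes from an item before it is written; `gsi1_keys` and the input field names (`type`, `status`, `date`, `id`, `email`) are hypothetical and only mirror the table above.

```python
def gsi1_keys(item: dict) -> dict:
    """Build the overloaded GSI1 key attributes for one item, based on its type."""
    if item["type"] == "order":
        # Orders are grouped by status and sorted by date, then order id.
        return {
            "GSI1-PK": f"STATUS#{item['status']}",
            "GSI1-SK": f"{item['date']}#{item['id']}",
        }
    if item["type"] == "user":
        # Users are looked up by email address.
        return {"GSI1-PK": f"EMAIL#{item['email']}", "GSI1-SK": "USER"}
    # Items without index attributes do not appear in the index.
    return {}
```

Returning an empty dictionary for other types keeps those items out of the index, since an item without the index key attributes is not projected into it.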

Wide-Column Stores (Cassandra)

Cassandra excels at high write throughput and time-series data. You design tables around queries — one table per query pattern.

-- Query: "Get all sensor readings for device X in the last hour"
CREATE TABLE sensor_readings (
    device_id   TEXT,
    reading_ts  TIMESTAMP,
    temperature DOUBLE,
    humidity    DOUBLE,
    PRIMARY KEY (device_id, reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC);

SELECT * FROM sensor_readings
WHERE device_id = 'sensor_42'
AND reading_ts > '2026-03-28T10:00:00Z';

Key rules:

Partition key = how data is distributed (device_id)
Clustering key = how data is sorted within a partition (reading_ts)
No joins, no subqueries, no ad-hoc aggregations
Duplicate data across tables to serve different queries
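The roles of the two key parts can be modelled in a few lines. The sketch below is an in-memory analogy, not Cassandra client code: a dictionary keyed by `device_id` plays the partition, and each partition keeps its rows sorted newest first, like the clustering order in the table definition. The class and method names are assumptions.

```python
from collections import defaultdict
from datetime import datetime


class SensorReadings:
    """Toy model of a table partitioned by device_id, clustered by reading_ts DESC."""

    def __init__(self) -> None:
        # device_id -> rows of (reading_ts, temperature, humidity), newest first
        self._partitions: dict[str, list[tuple[datetime, float, float]]] = defaultdict(list)

    def insert(self, device_id: str, reading_ts: datetime,
               temperature: float, humidity: float) -> None:
        rows = self._partitions[device_id]
        rows.append((reading_ts, temperature, humidity))
        rows.sort(key=lambda row: row[0], reverse=True)

    def readings_since(self, device_id: str, since: datetime):
        """Rows for one device with reading_ts > since, newest first."""
        result = []
        for row in self._partitions.get(device_id, []):
            if row[0] <= since:
                break  # rows are sorted, so everything after this is older
            result.append(row)
        return result
```

The query touches a single partition and stops as soon as it reaches an older row, which is why a range on the clustering key is cheap while a filter on `temperature` would not be.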

Graph Databases (Neo4j)

When relationships are the data, graphs win. Typical use cases are social networks, recommendation engines, fraud detection, and knowledge graphs.

// Create nodes and relationships
CREATE (alice:Person {name: "Alice"})
CREATE (bob:Person {name: "Bob"})
CREATE (neo4j:Technology {name: "Neo4j"})
CREATE (alice)-[:KNOWS]->(bob)
CREATE (alice)-[:USES]->(neo4j)
CREATE (bob)-[:USES]->(neo4j)

// "Friends of friends who use Neo4j"
MATCH (alice:Person {name: "Alice"})-[:KNOWS*2]-(fof)-[:USES]->(t:Technology {name: "Neo4j"})
RETURN DISTINCT fof.name

Graph queries that would require recursive CTEs or multiple joins in SQL become single, readable traversals.
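For comparison, the same two-hop question can be written by hand over an adjacency map. The sketch below does that in plain Python; the `KNOWS` and `USES` data extend the example with hypothetical people (Carol and Dave) so that the query has a result, and `friends_of_friends_using` is an illustrative name.

```python
from collections import defaultdict

# Undirected KNOWS edges and USES facts; Carol and Dave are hypothetical additions.
KNOWS = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave")]
USES = {
    "Alice": {"Neo4j"},
    "Bob": {"Neo4j"},
    "Carol": {"Neo4j"},
    "Dave": {"Postgres"},
}


def friends_of_friends_using(person: str, technology: str) -> set[str]:
    """People exactly two KNOWS hops from `person` who use `technology`."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in KNOWS:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Walk two hops; skipping `person` avoids going back over the same edge.
    two_hops = {
        fof
        for friend in neighbours[person]
        for fof in neighbours[friend]
        if fof != person
    }
    return {p for p in two_hops if technology in USES.get(p, set())}
```

Even at two hops the hand-written version needs an index of neighbours and explicit loop nesting; each extra hop adds another level, which is the bookkeeping a graph query language takes over.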

Denormalization Patterns

NoSQL databases embrace denormalization. Common patterns:

Pattern             | Description                          | Use Case
--------------------|--------------------------------------|------------------------------
Embedding           | Nest related data inside a document  | Orders with line items
Duplication         | Copy fields across documents         | Product name in order items
Pre-aggregation     | Store computed totals                | Order count on user profile
Materialized views  | Maintain query-optimized copies      | Leaderboards, dashboards
Bucketing           | Group time-series into chunks        | Hourly sensor reading buckets

The trade-off is always the same: faster reads, more complex writes.
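Bucketing and pre-aggregation often appear together, and they show the write-side cost clearly. The sketch below groups sensor readings into hourly buckets and keeps a running count and sum in each bucket; the key format, field names, and both function names are assumptions for illustration, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timezone


def hourly_bucket_key(device_id: str, ts: datetime) -> str:
    """Key for the hourly bucket that a timezone-aware timestamp falls into (UTC)."""
    hour = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")
    return f"{device_id}#{hour}"


def add_reading(buckets: dict, device_id: str, ts: datetime, temperature: float) -> str:
    """Append a reading to its bucket and update the pre-aggregated fields."""
    key = hourly_bucket_key(device_id, ts)
    bucket = buckets.setdefault(
        key, {"count": 0, "temperature_sum": 0.0, "readings": []}
    )
    bucket["readings"].append({"ts": ts.isoformat(), "temperature": temperature})
    # Extra work on every write so that reads can skip the aggregation.
    bucket["count"] += 1
    bucket["temperature_sum"] += temperature
    return key
```

An hourly average is then `temperature_sum / count` from a single bucket read, paid for by two extra field updates on every write.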

When to Use Which

Database Type            | Best For                                        | Avoid When
-------------------------|-------------------------------------------------|------------------------------------------
Document (MongoDB)       | Flexible schemas, content management, catalogs  | Heavy cross-document transactions
Key-Value (Redis)        | Caching, sessions, real-time leaderboards       | Complex queries, relationships
Key-Value (DynamoDB)     | Serverless, predictable scale, single-table     | Ad-hoc analytics, unknown access patterns
Wide-Column (Cassandra)  | Time-series, IoT, high write throughput         | Low-latency reads on arbitrary columns
Graph (Neo4j)            | Relationship-heavy queries, recommendations     | Simple CRUD, tabular data

Access-Pattern-Driven Design Checklist

List every access pattern before designing anything
Identify the primary key that serves each pattern
Decide embed vs reference for related data
Plan for secondary indexes (GSIs, materialized views)
Estimate item/document sizes to stay within limits
Prototype with real queries — if a pattern requires a scan, redesign
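The size-estimation step can start as a rough check during prototyping. The sketch below uses the length of the compact JSON encoding as an approximation; the real stored size is computed differently by each database (BSON in MongoDB, attribute names plus values in DynamoDB), so treat the result as an order-of-magnitude guide. The helper names and the `headroom` parameter are assumptions; the limits are the documented maximums of 400 KB per DynamoDB item and 16 MB per MongoDB document.

```python
import json

DYNAMODB_ITEM_LIMIT = 400 * 1024            # bytes per item
MONGODB_DOCUMENT_LIMIT = 16 * 1024 * 1024   # bytes per document


def approx_size_bytes(doc: dict) -> int:
    """Approximate a document's size by its compact UTF-8 JSON encoding."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def fits(doc: dict, limit: int, headroom: float = 0.5) -> bool:
    """True if the approximate size uses at most `headroom` of the limit.

    Leaving headroom accounts for the estimate being rough and for
    documents that grow after they are first written.
    """
    return approx_size_bytes(doc) <= limit * headroom
```

Running this against the largest realistic document for each access pattern flags embedded arrays that are likely to outgrow the limit before they reach production.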

Key Takeaways

Start with access patterns, not entities
Denormalization is not a sin; it is the strategy
Single-table design in DynamoDB avoids application-side joins and can cut the number of requests per access pattern
Pick the database type that matches your dominant query shape
Most denormalization decisions trade write complexity for read performance

Build and visualize your data models with codelit.io — the all-in-one workspace for engineering teams.

Article 161 on the Codelit engineering blog.

