NoSQL Data Modeling: Patterns for Document, Key-Value, Wide-Column & Graph Databases
NoSQL Data Modeling#
Relational modeling optimizes for storage efficiency and consistency by normalizing data. NoSQL modeling optimizes for the efficiency of queries you already know you will run. This difference changes how you model data.
Relational vs NoSQL Mindset#
In relational modeling, you normalize first and worry about queries later:
Users table → Orders table → OrderItems table → Products table
3 JOINs across 4 tables to answer "What did user X buy?"
In NoSQL modeling, you start with the queries:
Access pattern: "Get all orders for user X with product details"
→ Design a single document or item that answers this in one read
The golden rule: model for access patterns, not for entities.
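To make the contrast concrete, here is a minimal Python sketch using plain in-memory structures rather than a real database. The sample rows, the `purchases_by_user` mapping, and both function names are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory data; the names mirror the tables listed above.
orders = [{"id": "order_123", "user_id": "user_456"}]
order_items = [
    {"order_id": "order_123", "product_id": "prod_A", "qty": 1},
    {"order_id": "order_123", "product_id": "prod_B", "qty": 2},
]
products = {"prod_A": {"name": "Keyboard"}, "prod_B": {"name": "Mouse"}}


def purchases_normalized(user_id):
    """Entity-first model: stitch orders -> order items -> products at read time."""
    result = []
    for order in orders:
        if order["user_id"] != user_id:
            continue
        for item in order_items:
            if item["order_id"] == order["id"]:
                name = products[item["product_id"]]["name"]
                result.append((name, item["qty"]))
    return result


# Access-pattern-first model: one pre-shaped record per user, keyed for the query.
purchases_by_user = {
    "user_456": [("Keyboard", 1), ("Mouse", 2)],
}


def purchases_denormalized(user_id):
    """A single key lookup answers the question; nothing is joined at read time."""
    return purchases_by_user.get(user_id, [])
```

Both functions return the same answer. The second moves the stitching work to write time, when `purchases_by_user` has to be kept up to date.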
Document Databases (MongoDB)#
Documents store nested, hierarchical data as JSON-like objects. The key decision is embedding vs referencing.
Embed When#
- Data is read together frequently
- Child data belongs to exactly one parent
- The embedded array stays bounded (a MongoDB document is capped at 16 MB)
// Order document — embed line items
{
"_id": "order_123",
"userId": "user_456",
"status": "shipped",
"items": [
{ "productId": "prod_A", "name": "Keyboard", "price": 89, "qty": 1 },
{ "productId": "prod_B", "name": "Mouse", "price": 49, "qty": 2 }
],
"shipping": {
"address": "123 Main St",
"carrier": "FedEx",
"tracking": "FX123456"
}
}
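As a rough illustration of why embedding suits this shape, the sketch below treats the order as a plain Python dict. One read of the document is enough to compute the total, and a guard keeps the embedded array bounded. `MAX_EMBEDDED_ITEMS`, `order_total`, and `add_item` are hypothetical names, and the cap of 500 is an arbitrary assumption, not a MongoDB setting.

```python
MAX_EMBEDDED_ITEMS = 500  # hypothetical application-level cap, not a MongoDB setting

order = {
    "_id": "order_123",
    "userId": "user_456",
    "status": "shipped",
    "items": [
        {"productId": "prod_A", "name": "Keyboard", "price": 89, "qty": 1},
        {"productId": "prod_B", "name": "Mouse", "price": 49, "qty": 2},
    ],
}


def order_total(doc):
    """Everything needed for the total lives inside the one document."""
    return sum(item["price"] * item["qty"] for item in doc["items"])


def add_item(doc, item):
    """Refuse to grow the embedded array past the cap."""
    if len(doc["items"]) >= MAX_EMBEDDED_ITEMS:
        raise ValueError("too many embedded items; consider referencing instead")
    doc["items"].append(item)
```

If the guard starts firing in practice, that is usually a hint that the child data wants its own collection.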
Reference When#
- Data is shared across documents (many-to-many)
- Embedded arrays would grow without bound
- You need independent updates
// Blog post — reference author (shared across posts)
{
"_id": "post_789",
"title": "NoSQL Patterns",
"authorId": "author_42",
"tagIds": ["tag_nosql", "tag_databases"]
}
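Referencing costs extra lookups at read time but keeps shared data in one place. The sketch below models that with Python dicts standing in for collections. The `authors` and `tags` contents and the `load_post_view` helper are assumptions made up for illustration.

```python
# Hypothetical stand-ins for separate collections.
authors = {"author_42": {"name": "Mo"}}
tags = {
    "tag_nosql": {"label": "NoSQL"},
    "tag_databases": {"label": "Databases"},
}

post = {
    "_id": "post_789",
    "title": "NoSQL Patterns",
    "authorId": "author_42",
    "tagIds": ["tag_nosql", "tag_databases"],
}


def load_post_view(doc):
    """Resolve the references with additional lookups at read time."""
    return {
        "title": doc["title"],
        "author": authors[doc["authorId"]]["name"],
        "tags": [tags[tag_id]["label"] for tag_id in doc["tagIds"]],
    }


# An independent update: change the author once, every post that
# references author_42 sees the new name on its next read.
authors["author_42"]["name"] = "Mo K."
```

With embedding, the same rename would have to touch every post that copied the author's name.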
The Subset Pattern#
Store frequently accessed fields inline and full data separately:
// Product listing (fast reads)
{ "_id": "prod_A", "name": "Keyboard", "price": 89, "thumbnail": "kb.jpg" }
// Product detail collection (full data)
{ "productId": "prod_A", "specs": { ... }, "reviews": [ ... ], "warranty": { ... } }
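One way to derive the two documents from a single source record is sketched below. `SUMMARY_FIELDS`, `split_product`, and the sample `specs` and `warranty` values are hypothetical; a real catalog would choose its own hot fields.

```python
SUMMARY_FIELDS = ("name", "price", "thumbnail")  # assumed "hot" fields for listings


def split_product(full):
    """Split one full product record into a small listing doc and a detail doc."""
    summary = {"_id": full["_id"]}
    detail = {"productId": full["_id"]}
    for key, value in full.items():
        if key == "_id":
            continue
        if key in SUMMARY_FIELDS:
            summary[key] = value
        else:
            detail[key] = value
    return summary, detail


# Hypothetical full record; the specs and warranty values are invented.
full_product = {
    "_id": "prod_A",
    "name": "Keyboard",
    "price": 89,
    "thumbnail": "kb.jpg",
    "specs": {"layout": "ANSI"},
    "reviews": [],
    "warranty": {"months": 12},
}

summary_doc, detail_doc = split_product(full_product)
```

Listing pages read only `summary_doc`; the larger `detail_doc` is fetched when a user opens the product page.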
Key-Value Stores (Redis, DynamoDB)#
Key-value stores are the simplest and fastest NoSQL model. Everything revolves around key design.
Redis Patterns#
// Session store
SET session:abc123 '{"userId":"u1","role":"admin"}' EX 3600
// Leaderboard
ZADD leaderboard 9500 "player_1"
ZADD leaderboard 8700 "player_2"
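ZREVRANGE leaderboard 0 9 // top 10, highest score first (deprecated since Redis 6.2 in favor of ZRANGE leaderboard 0 9 REV)
// Rate limiting (fixed 60-second window)
INCR ratelimit:user42:minute
EXPIRE ratelimit:user42:minute 60 NX // NX (Redis 7.0+) sets the TTL only when the key has none, so repeat requests do not extend the window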
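The INCR and EXPIRE pair above amounts to a fixed-window counter. The sketch below simulates that logic in pure Python, without a Redis client, so the window behavior is easy to follow. `FixedWindowLimiter` and its injectable `clock` parameter are hypothetical names. The sketch assumes the TTL is set only on the first hit of a window; refreshing it on every request would keep pushing the expiry forward under steady traffic.

```python
class FixedWindowLimiter:
    """In-memory model of an INCR + EXPIRE rate limiter (not a Redis client)."""

    def __init__(self, limit, window_seconds, clock):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock      # callable returning the current time in seconds
        self.counters = {}      # key -> (count, expires_at)

    def allow(self, key):
        now = self.clock()
        count, expires_at = self.counters.get(key, (0, None))
        if expires_at is not None and now >= expires_at:
            count, expires_at = 0, None      # the key expired: a new window starts
        count += 1                           # INCR
        if expires_at is None:
            expires_at = now + self.window   # EXPIRE, only on the first hit
        self.counters[key] = (count, expires_at)
        return count <= self.limit


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
```

In a real deployment the increment and the expiry should happen atomically, for example inside a transaction or a server-side script, which this sketch does not model.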
DynamoDB Single-Table Design#
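DynamoDB bills for reads, writes, and storage rather than per table, and it has no joins. Single-table design puts all entities in one table using generic partition and sort keys, so related items share a partition and can be fetched with a single query: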
PK | SK | Data
--------------------|---------------------|------------------
USER#u1 | PROFILE | {name, email}
USER#u1 | ORDER#o1 | {status, total}
USER#u1 | ORDER#o2 | {status, total}
ORDER#o1 | ITEM#i1 | {product, qty}
ORDER#o1 | ITEM#i2 | {product, qty}
PRODUCT#p1 | METADATA | {name, price}
Query "all orders for user u1":
PK = "USER#u1" AND begins_with(SK, "ORDER#")
Query "all items in order o1":
PK = "ORDER#o1" AND begins_with(SK, "ITEM#")
One table, no application-side joins, and every query reads a single partition, which keeps latency predictable as the table grows.
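The key layout can be modeled in a few lines of Python to show why those two queries need no joins. This is an in-memory sketch, not the DynamoDB API: `SingleTable`, `put`, and `query` are hypothetical names, and the attribute values are invented.

```python
class SingleTable:
    """Toy model of a table with a partition key (PK) and a sort key (SK)."""

    def __init__(self):
        self._partitions = {}  # PK -> {SK: data}

    def put(self, pk, sk, data):
        self._partitions.setdefault(pk, {})[sk] = data

    def query(self, pk, sk_prefix=""):
        """Return items in one partition whose SK starts with sk_prefix, in SK order."""
        items = self._partitions.get(pk, {})
        return [(sk, items[sk]) for sk in sorted(items) if sk.startswith(sk_prefix)]


table = SingleTable()
table.put("USER#u1", "PROFILE", {"name": "Mo", "email": "mo@co.io"})
table.put("USER#u1", "ORDER#o1", {"status": "shipped", "total": 187})
table.put("USER#u1", "ORDER#o2", {"status": "pending", "total": 49})
table.put("ORDER#o1", "ITEM#i1", {"product": "Keyboard", "qty": 1})
table.put("ORDER#o1", "ITEM#i2", {"product": "Mouse", "qty": 2})
table.put("PRODUCT#p1", "METADATA", {"name": "Keyboard", "price": 89})

orders_for_u1 = table.query("USER#u1", "ORDER#")
items_in_o1 = table.query("ORDER#o1", "ITEM#")
```

Each call touches exactly one partition and filters on a sort-key prefix, which is the same shape as the `begins_with` key conditions shown above.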
GSI Overloading#
Use generic Global Secondary Indexes for alternate access patterns:
GSI1-PK | GSI1-SK | Use Case
--------------------|---------------------|------------------
STATUS#shipped | 2026-03-28#o1 | Orders by status + date
EMAIL#mo@co.io | USER | User lookup by email
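Conceptually, an overloaded GSI is a second key layout projected from the same items. The Python sketch below builds such an index in memory from two generic attributes. `build_gsi` and the sample items are hypothetical, and the sketch assumes sparse-index behavior, where items lacking the index key attributes do not appear in the index.

```python
def build_gsi(items, pk_attr="GSI1-PK", sk_attr="GSI1-SK"):
    """Project items into an alternate key layout: index PK -> items sorted by index SK."""
    index = {}
    for item in items:
        # Sparse index: items without both index attributes are skipped.
        if pk_attr in item and sk_attr in item:
            index.setdefault(item[pk_attr], []).append(item)
    for rows in index.values():
        rows.sort(key=lambda row: row[sk_attr])
    return index


# Hypothetical items from the base table, each carrying generic index attributes.
items = [
    {"PK": "USER#u1", "SK": "ORDER#o1", "GSI1-PK": "STATUS#shipped", "GSI1-SK": "2026-03-28#o1"},
    {"PK": "USER#u1", "SK": "ORDER#o2", "GSI1-PK": "STATUS#shipped", "GSI1-SK": "2026-03-27#o2"},
    {"PK": "USER#u1", "SK": "PROFILE", "GSI1-PK": "EMAIL#mo@co.io", "GSI1-SK": "USER"},
    {"PK": "PRODUCT#p1", "SK": "METADATA"},  # no index attributes, so not indexed
]

gsi1 = build_gsi(items)
shipped_by_date = [row["SK"] for row in gsi1["STATUS#shipped"]]
user_by_email = gsi1["EMAIL#mo@co.io"][0]["PK"]
```

The same index serves two unrelated access patterns because the attribute names are generic and the prefixes (`STATUS#`, `EMAIL#`) keep the entity types apart.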
Wide-Column Stores (Cassandra)#
Cassandra excels at high write throughput and time-series data. You design tables around queries — one table per query pattern.
-- Query: "Get all sensor readings for device X in the last hour"
CREATE TABLE sensor_readings (
device_id TEXT,
reading_ts TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY (device_id, reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC);
SELECT * FROM sensor_readings
WHERE device_id = 'sensor_42'
AND reading_ts > '2026-03-28T10:00:00Z';
Key rules:
- Partition key = how data is distributed (device_id)
- Clustering key = how data is sorted within a partition (reading_ts)
- No joins, no subqueries, no ad-hoc aggregations
- Duplicate data across tables to serve different queries
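The roles of the partition key and clustering key can be mimicked with a small in-memory model. This Python sketch is not a Cassandra driver: `SensorReadings`, `insert`, and `select_since` are hypothetical names, and the readings are invented.

```python
from datetime import datetime, timezone


class SensorReadings:
    """Toy model of PRIMARY KEY (device_id, reading_ts) with newest-first ordering."""

    def __init__(self):
        self._partitions = {}  # device_id -> {reading_ts: (temperature, humidity)}

    def insert(self, device_id, reading_ts, temperature, humidity):
        # Writing the same primary key again overwrites the row (an upsert).
        self._partitions.setdefault(device_id, {})[reading_ts] = (temperature, humidity)

    def select_since(self, device_id, since):
        """Read one partition and return rows with reading_ts > since, newest first."""
        rows = self._partitions.get(device_id, {})
        return [(ts, *rows[ts]) for ts in sorted(rows, reverse=True) if ts > since]


def utc(hour, minute):
    return datetime(2026, 3, 28, hour, minute, tzinfo=timezone.utc)


table = SensorReadings()
table.insert("sensor_42", utc(9, 30), 20.0, 40.0)
table.insert("sensor_42", utc(10, 15), 21.0, 41.0)
table.insert("sensor_42", utc(10, 45), 22.0, 42.0)
table.insert("sensor_7", utc(10, 20), 18.0, 55.0)

recent = table.select_since("sensor_42", utc(10, 0))
```

The query names exactly one partition and a range on the clustering key, which is the only query shape this table is designed to serve.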
Graph Databases (Neo4j)#
When relationships are the data, graphs win. Typical fits are social networks, recommendation engines, fraud detection, and knowledge graphs.
// Create nodes and relationships
CREATE (alice:Person {name: "Alice"})
CREATE (bob:Person {name: "Bob"})
CREATE (neo4j:Technology {name: "Neo4j"})
CREATE (alice)-[:KNOWS]->(bob)
CREATE (alice)-[:USES]->(neo4j)
CREATE (bob)-[:USES]->(neo4j)
// "Friends of friends who use Neo4j"
MATCH (alice:Person {name: "Alice"})-[:KNOWS*2]-(fof)-[:USES]->(t:Technology {name: "Neo4j"})
RETURN DISTINCT fof.name
Graph queries that would require recursive CTEs or multiple joins in SQL become single, readable traversals.
Denormalization Patterns#
NoSQL databases embrace denormalization. Common patterns:
| Pattern | Description | Use Case |
|---|---|---|
| Embedding | Nest related data inside a document | Orders with line items |
| Duplication | Copy fields across documents | Product name in order items |
| Pre-aggregation | Store computed totals | Order count on user profile |
| Materialized views | Maintain query-optimized copies | Leaderboards, dashboards |
| Bucketing | Group time-series into chunks | Hourly sensor reading buckets |
The trade-off is always the same: faster reads, more complex writes.
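Bucketing and pre-aggregation from the table can be combined in one write path, sketched here in Python. The `bucket_key` format, the `bucket_readings` helper, and the sample readings are assumptions for illustration only.

```python
from datetime import datetime, timezone


def bucket_key(device_id, ts):
    """One bucket per device per hour, for example 'sensor_42#2026-03-28T10'."""
    return f"{device_id}#{ts.strftime('%Y-%m-%dT%H')}"


def bucket_readings(readings):
    """Group (device_id, ts, temperature) tuples into hourly buckets with running totals."""
    buckets = {}
    for device_id, ts, temperature in readings:
        bucket = buckets.setdefault(
            bucket_key(device_id, ts), {"count": 0, "sum": 0.0, "readings": []}
        )
        bucket["count"] += 1          # pre-aggregation: maintained on write
        bucket["sum"] += temperature
        bucket["readings"].append((ts.isoformat(), temperature))
    return buckets


# Hypothetical readings.
readings = [
    ("sensor_42", datetime(2026, 3, 28, 10, 5, tzinfo=timezone.utc), 20.0),
    ("sensor_42", datetime(2026, 3, 28, 10, 35, tzinfo=timezone.utc), 21.0),
    ("sensor_42", datetime(2026, 3, 28, 11, 2, tzinfo=timezone.utc), 22.0),
]
buckets = bucket_readings(readings)
hour_10 = buckets["sensor_42#2026-03-28T10"]
average_10 = hour_10["sum"] / hour_10["count"]
```

Reading an hourly average becomes one lookup plus a division, while every write now has to update the bucket's counters as well as append the reading.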
When to Use Which#
| Database Type | Best For | Avoid When |
|---|---|---|
| Document (MongoDB) | Flexible schemas, content management, catalogs | Heavy cross-document transactions |
| Key-Value (Redis) | Caching, sessions, real-time leaderboards | Complex queries, relationships |
| Key-Value (DynamoDB) | Serverless, predictable scale, single-table | Ad-hoc analytics, unknown access patterns |
| Wide-Column (Cassandra) | Time-series, IoT, high write throughput | Low-latency reads on arbitrary columns |
| Graph (Neo4j) | Relationship-heavy queries, recommendations | Simple CRUD, tabular data |
Access-Pattern-Driven Design Checklist#
- List every access pattern before designing anything
- Identify the primary key that serves each pattern
- Decide embed vs reference for related data
- Plan for secondary indexes (GSIs, materialized views)
- Estimate item/document sizes to stay within limits
- Prototype with real queries — if a pattern requires a scan, redesign
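For the size-estimation step, a rough check can be scripted. The Python sketch below uses the length of the compact JSON encoding as a stand-in for the stored size, which is only an approximation since each database has its own encoding and overhead. `LIMITS_BYTES`, `approx_size_bytes`, and `max_embedded_items` are hypothetical names; the limits used are 400 KB per DynamoDB item and 16 MB per MongoDB document.

```python
import json

LIMITS_BYTES = {
    "dynamodb_item": 400 * 1024,
    "mongodb_document": 16 * 1024 * 1024,
}


def approx_size_bytes(doc):
    """Approximate size: UTF-8 length of the compact JSON encoding (not the exact stored size)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def max_embedded_items(base_doc, sample_item, store):
    """Estimate how many copies of sample_item fit before the store's limit is reached."""
    remaining = LIMITS_BYTES[store] - approx_size_bytes(base_doc)
    per_item = approx_size_bytes(sample_item) + 1  # +1 for the separating comma
    return max(remaining // per_item, 0)


order = {"_id": "order_123", "userId": "user_456", "items": []}
line_item = {"productId": "prod_A", "name": "Keyboard", "price": 89, "qty": 1}
estimate = max_embedded_items(order, line_item, "dynamodb_item")
```

An estimate in the thousands suggests embedding is safe for typical orders; an estimate close to the expected array length is a signal to reference or bucket instead.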
Key Takeaways#
- Start with access patterns, not entities
- Denormalization is not a sin — it is the strategy
- Single-table design in DynamoDB replaces application-side joins with single-partition queries, which can reduce request count and cost
- Pick the database type that matches your dominant query shape
- Most denormalization decisions trade write complexity for read performance
Article 161 on the Codelit engineering blog.