Database Schema Design Patterns — Normalization, Denormalization, and Beyond
Why schema design matters#
A poorly designed schema does not just slow queries — it creates data anomalies, makes migrations painful, and forces application code to compensate for structural flaws. Getting the schema right early saves months of rework.
Normalization — eliminating redundancy#
Normalization is the process of organizing columns and tables to reduce data duplication and prevent update anomalies.
First Normal Form (1NF)#
Every column holds atomic (indivisible) values, and every row is unique.
Violation: a phone_numbers column storing "555-1234, 555-5678" as a comma-separated string.
Fix: create a separate phone_numbers table with one row per number.
CREATE TABLE phone_numbers (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
phone_number VARCHAR(20) NOT NULL
);
Second Normal Form (2NF)#
The table is in 1NF, and every non-key column depends on the entire primary key, not just part of it.
Violation: a table with a composite key (order_id, product_id) that also stores product_name. The product name depends only on product_id, not the full key.
Fix: move product_name to a products table and reference it via product_id.
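A minimal sketch of that fix, assuming hypothetical `orders` and `order_items` tables alongside the `products` table:

```sql
-- Product details live in exactly one place
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    product_name VARCHAR(200) NOT NULL
);

-- The junction table keeps only columns that depend on the FULL key
-- (quantity depends on both the order and the product)
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
```

Renaming a product now touches one row in `products` instead of every matching row in every order.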
Third Normal Form (3NF)#
The table is in 2NF, and no non-key column depends on another non-key column (no transitive dependencies).
Violation: storing city and zip_code in an orders table. The city is determined by the zip code, not the order.
Fix: create an addresses table or a zip_codes lookup table.
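The lookup-table version of that fix might look like this (column names are illustrative):

```sql
-- City is determined by the zip code, so it moves to a lookup table
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city     VARCHAR(100) NOT NULL
);

-- Orders keep only the zip code; the city is reachable via a join
ALTER TABLE orders
    ADD COLUMN zip_code VARCHAR(10) REFERENCES zip_codes(zip_code);
```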
Boyce-Codd Normal Form (BCNF)#
A stricter version of 3NF. Every determinant (any column or set of columns that functionally determines another column) must be a candidate key. BCNF eliminates edge cases where 3NF still allows redundancy in tables with overlapping candidate keys.
In practice, most schemas that satisfy 3NF also satisfy BCNF. The exceptions arise in tables with multiple overlapping composite keys — rare in typical application databases.
When to stop normalizing#
Normalize until it hurts reads, then selectively denormalize.
Over-normalized schemas require many JOINs for common queries. If your dashboard query joins seven tables to render a single card, normalization has gone too far for that access pattern.
Denormalization — trading redundancy for speed#
Denormalization deliberately introduces duplicate data to avoid expensive joins at read time.
Common denormalization strategies#
| Strategy | Example | Trade-off |
|---|---|---|
| Duplicated columns | Store customer_name in orders table | Stale if customer renames |
| Summary tables | Pre-computed daily_sales aggregate | Must be refreshed |
| Materialized views | Database-managed denormalized snapshot | Refresh cost and staleness |
| Computed columns | full_name generated from first + last | Minimal cost, limited flexibility |
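As one concrete example from the table above, a summary table can be built as a materialized view. This sketch uses PostgreSQL syntax and assumes `orders` has `order_date` and `total_cents` columns:

```sql
-- A pre-computed daily sales aggregate
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date,
       SUM(total_cents) AS revenue_cents,
       COUNT(*)         AS order_count
FROM orders
GROUP BY order_date;

-- The trade-off: reads are cheap, but the data is stale until refreshed
REFRESH MATERIALIZED VIEW daily_sales;
```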
Keeping denormalized data consistent#
- Application-level sync — update all copies in the same transaction
- Database triggers — automatically propagate changes (adds complexity)
- Async workers — eventual consistency via background jobs
- Materialized view refresh — periodic or on-demand rebuilds
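The first strategy, application-level sync, can be as simple as wrapping both writes in one transaction. This sketch assumes hypothetical `customers` and `orders` tables where `orders.customer_name` is the duplicated copy:

```sql
-- Update the source of truth and every denormalized copy atomically
BEGIN;
UPDATE customers SET name = 'Acme Corp' WHERE id = 7;
UPDATE orders SET customer_name = 'Acme Corp' WHERE customer_id = 7;
COMMIT;
```

If either statement fails, the rollback prevents the copies from diverging; the cost is that every code path writing `customers.name` must remember to do this.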
Polymorphic associations#
A polymorphic association lets a single foreign key reference rows in multiple tables. Common in comment systems, activity feeds, and tagging.
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
commentable_type VARCHAR(50) NOT NULL, -- 'post', 'video', 'photo'
commentable_id INTEGER NOT NULL,
body TEXT NOT NULL
);
Pros: flexible, simple to add new commentable types.
Cons: no foreign key constraint (the database cannot enforce referential integrity across multiple tables), queries require conditional joins.
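The conditional-join problem looks like this in practice. The sketch assumes `posts` and `videos` tables, each with a `title` column:

```sql
-- Each possible parent type needs its own LEFT JOIN,
-- guarded by the type discriminator
SELECT c.id,
       c.body,
       COALESCE(p.title, v.title) AS parent_title
FROM comments c
LEFT JOIN posts  p ON c.commentable_type = 'post'  AND c.commentable_id = p.id
LEFT JOIN videos v ON c.commentable_type = 'video' AND c.commentable_id = v.id;
```

Adding a new commentable type means touching every such query.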
Alternative — shared parent table:
CREATE TABLE commentables (
id SERIAL PRIMARY KEY,
type VARCHAR(50) NOT NULL
);
CREATE TABLE posts (
id INTEGER PRIMARY KEY REFERENCES commentables(id),
title TEXT
);
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
commentable_id INTEGER REFERENCES commentables(id),
body TEXT NOT NULL
);
This restores foreign key integrity at the cost of an extra table and insert.
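The extra insert can be folded into one statement with a data-modifying CTE (PostgreSQL syntax; the title value is illustrative):

```sql
-- Create the shared parent row and the post row together;
-- a data-modifying CTE makes the pair atomic
WITH parent AS (
    INSERT INTO commentables (type) VALUES ('post') RETURNING id
)
INSERT INTO posts (id, title)
SELECT id, 'Schema design notes' FROM parent;

-- Comments reference the shared parent, so the FK is enforced
INSERT INTO comments (commentable_id, body)
VALUES ((SELECT MAX(id) FROM commentables), 'Great post!');
```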
Entity-Attribute-Value (EAV) pattern#
EAV stores data as (entity_id, attribute_name, attribute_value) triples instead of fixed columns.
CREATE TABLE product_attributes (
product_id INTEGER REFERENCES products(id),
attribute_name VARCHAR(100),
attribute_value TEXT,
PRIMARY KEY (product_id, attribute_name)
);
When EAV makes sense:
- Highly variable schemas (e-commerce products with hundreds of possible attributes)
- User-defined custom fields
- Configuration storage
When to avoid EAV:
- You need type safety (all values are stored as text)
- You need efficient filtering or aggregation on specific attributes
- Query complexity becomes unmanageable (pivoting rows to columns)
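The pivoting pain in the last point is easy to see. Reassembling two attributes into columns already requires one conditional aggregate per attribute:

```sql
-- Pivot EAV rows back into columns: one MAX(CASE ...) per attribute
SELECT p.id,
       MAX(CASE WHEN a.attribute_name = 'color' THEN a.attribute_value END) AS color,
       MAX(CASE WHEN a.attribute_name = 'size'  THEN a.attribute_value END) AS size
FROM products p
JOIN product_attributes a ON a.product_id = p.id
GROUP BY p.id;
```

With dozens of attributes, this pattern becomes unreadable and hard for the planner to optimize.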
JSON columns — the modern middle ground#
PostgreSQL's jsonb and MySQL's JSON type offer schema flexibility without the query pain of EAV.
ALTER TABLE products ADD COLUMN metadata JSONB DEFAULT '{}';
-- Query a nested field
SELECT * FROM products
WHERE metadata ->> 'color' = 'blue';
-- Index for performance
CREATE INDEX idx_products_color ON products ((metadata ->> 'color'));
Advantages over EAV:
- Native JSON operators simplify queries
- GIN indexes enable fast lookups on nested fields
- Type coercion is handled at query time
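For lookups on arbitrary keys (rather than one known field), a single GIN index over the whole document covers containment queries in PostgreSQL:

```sql
-- One GIN index supports containment queries on any key in metadata
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- @> can use the GIN index; matches rows whose metadata
-- contains the given key/value pair
SELECT * FROM products
WHERE metadata @> '{"color": "blue"}';
```

This complements the expression index shown earlier, which is narrower but cheaper when you only ever filter on one field.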
When to prefer fixed columns: if you query or filter on a field frequently and its presence is guaranteed, a dedicated column with a proper type and index will almost always outperform a JSON path, and it gives you type checking and constraints for free.
Soft deletes#
Instead of removing rows with DELETE, mark them as deleted with a flag.
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ DEFAULT NULL;
-- "Delete" a user
UPDATE users SET deleted_at = NOW() WHERE id = 42;
-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;
Benefits:
- Audit trail — you can see what was deleted and when
- Easy undo — clear the deleted_at column to restore
- Referential safety — foreign keys remain valid
Pitfalls:
- Every query must filter WHERE deleted_at IS NULL (use a default scope or view)
- Unique constraints must be partial: CREATE UNIQUE INDEX ... WHERE deleted_at IS NULL
- Storage grows over time — implement a periodic hard-delete job for old soft-deleted rows
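The partial unique constraint looks like this in PostgreSQL, assuming `users` has an `email` column:

```sql
-- Uniqueness applies only to live rows, so an email can be
-- reused after its previous owner is soft-deleted
CREATE UNIQUE INDEX idx_users_email_active
    ON users (email)
    WHERE deleted_at IS NULL;
```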
Audit columns#
Track who changed what and when by adding standard metadata columns to every table.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status VARCHAR(20) NOT NULL,
total_cents INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by INTEGER REFERENCES users(id),
updated_by INTEGER REFERENCES users(id)
);
Automating updates:
- Use a database trigger to set updated_at = NOW() on every update
- Pass the acting user ID through application context or SET LOCAL variables
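A minimal sketch of the updated_at trigger in PostgreSQL (the function and trigger names are illustrative):

```sql
-- Keep updated_at current on every write, regardless of
-- whether the application remembers to set it
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
```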
For full change history (not just the latest modifier), use an audit log table or temporal tables.
Temporal tables#
Temporal tables maintain a complete history of every row version, enabling point-in-time queries.
System-versioned temporal tables (SQL:2011)#
Support varies by engine: MariaDB and SQL Server implement system versioning natively, while PostgreSQL does not. The example below uses MariaDB's implementation of the SQL:2011 syntax.
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR(100),
salary INTEGER,
valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
valid_to TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH SYSTEM VERSIONING;
-- Query current state
SELECT * FROM employees WHERE id = 1;
-- Query historical state
SELECT * FROM employees FOR SYSTEM_TIME AS OF '2025-06-01' WHERE id = 1;
Application-managed history tables#
If your database does not support SQL:2011 temporal tables, implement the pattern manually:
CREATE TABLE employees_history (
id INTEGER,
name VARCHAR(100),
salary INTEGER,
valid_from TIMESTAMPTZ NOT NULL,
valid_to TIMESTAMPTZ NOT NULL,
operation CHAR(1) NOT NULL -- 'I', 'U', 'D'
);
A trigger on the employees table copies the old row to employees_history before every update or delete.
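That trigger might be sketched as follows in PostgreSQL. It assumes a hypothetical `version_started_at` column on `employees` recording when the current row version began:

```sql
CREATE OR REPLACE FUNCTION archive_employee() RETURNS trigger AS $$
BEGIN
    -- Copy the outgoing row version into history
    INSERT INTO employees_history (id, name, salary, valid_from, valid_to, operation)
    VALUES (OLD.id, OLD.name, OLD.salary,
            OLD.version_started_at,  -- hypothetical column: when this version began
            NOW(),
            CASE TG_OP WHEN 'UPDATE' THEN 'U' ELSE 'D' END);
    IF TG_OP = 'UPDATE' THEN
        RETURN NEW;  -- let the update proceed with the new values
    END IF;
    RETURN OLD;      -- let the delete proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employees_archive
    BEFORE UPDATE OR DELETE ON employees
    FOR EACH ROW EXECUTE FUNCTION archive_employee();
```

Note the return values: a BEFORE UPDATE trigger must return NEW (returning OLD would silently discard the update), while a BEFORE DELETE trigger returns OLD to allow the delete.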
Use cases: compliance and auditing, debugging data issues, building time-travel features, regulatory requirements (financial services, healthcare).
Choosing the right pattern#
| Requirement | Recommended pattern |
|---|---|
| Data integrity and consistency | Normalize to 3NF/BCNF |
| Read-heavy dashboards | Denormalize with materialized views |
| Variable attributes per entity | JSON columns (or EAV if extreme variability) |
| Comments/tags across entity types | Polymorphic with shared parent table |
| Undo and audit trail | Soft deletes + audit columns |
| Regulatory compliance | Temporal tables |
Visualize your schema on Codelit#
Generate an ER diagram for your schema on Codelit to see normalization levels, foreign key relationships, and denormalization points in an interactive visual. Click on any table to explore its design trade-offs.
This is article #311 in the Codelit engineering blog series.
Build and explore database architectures visually at codelit.io.