Database Schema Design Patterns — Normalization, Denormalization, and Beyond
Why schema design matters#
A poorly designed schema does not just slow queries — it creates data anomalies, makes migrations painful, and forces application code to compensate for structural flaws. Getting the schema right early saves months of rework.
Normalization — eliminating redundancy#
Normalization is the process of organizing columns and tables to reduce data duplication and prevent update anomalies.
First Normal Form (1NF)#
Every column holds atomic (indivisible) values, and every row is unique.
Violation: a phone_numbers column storing "555-1234, 555-5678" as a comma-separated string.
Fix: create a separate phone_numbers table with one row per number.
CREATE TABLE phone_numbers (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
phone_number VARCHAR(20) NOT NULL
);
Second Normal Form (2NF)#
The table is in 1NF, and every non-key column depends on the entire primary key, not just part of it.
Violation: a table with a composite key (order_id, product_id) that also stores product_name. The product name depends only on product_id, not the full key.
Fix: move product_name to a products table and reference it via product_id.
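A minimal sketch of that fix, assuming hypothetical `orders` and `order_items` tables alongside the `products` table:

```sql
-- Product details live in exactly one place
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    product_name VARCHAR(200) NOT NULL
);

-- The junction table keeps only columns that depend on the FULL key
-- (quantity depends on both the order and the product)
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
```

Renaming a product now touches one row in `products` instead of every matching row in every order.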
Third Normal Form (3NF)#
The table is in 2NF, and no non-key column depends on another non-key column (no transitive dependencies).
Violation: storing city and zip_code in an orders table. The city is determined by the zip code, not the order.
Fix: create an addresses table or a zip_codes lookup table.
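The lookup-table version of that fix might look like this (column names are illustrative):

```sql
-- City is determined by the zip code, so it moves to a lookup table
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city     VARCHAR(100) NOT NULL
);

-- Orders keep only the zip code; the city is reachable via a join
ALTER TABLE orders
    ADD COLUMN zip_code VARCHAR(10) REFERENCES zip_codes(zip_code);
```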
Boyce-Codd Normal Form (BCNF)#
A stricter version of 3NF. Every determinant (any column or set of columns that functionally determines another column) must be a candidate key. BCNF eliminates edge cases where 3NF still allows redundancy in tables with overlapping candidate keys.
In practice, most schemas that satisfy 3NF also satisfy BCNF. The exceptions arise in tables with multiple overlapping composite keys — rare in typical application databases.
When to stop normalizing#
Normalize until it hurts reads, then selectively denormalize.
Over-normalized schemas require many JOINs for common queries. If your dashboard query joins seven tables to render a single card, normalization has gone too far for that access pattern.
Denormalization — trading redundancy for speed#
Denormalization deliberately introduces duplicate data to avoid expensive joins at read time.
Common denormalization strategies#
| Strategy | Example | Trade-off |
|---|---|---|
| Duplicated columns | Store customer_name in orders table | Stale if customer renames |
| Summary tables | Pre-computed daily_sales aggregate | Must be refreshed |
| Materialized views | Database-managed denormalized snapshot | Refresh cost and staleness |
| Computed columns | full_name generated from first + last | Minimal cost, limited flexibility |
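As one concrete example from the table above, a summary table can be built as a materialized view. This sketch uses PostgreSQL syntax and assumes `orders` has `order_date` and `total_cents` columns:

```sql
-- A pre-computed daily sales aggregate
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date,
       SUM(total_cents) AS revenue_cents,
       COUNT(*)         AS order_count
FROM orders
GROUP BY order_date;

-- The trade-off: reads are cheap, but the data is stale until refreshed
REFRESH MATERIALIZED VIEW daily_sales;
```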
Keeping denormalized data consistent#
- Application-level sync — update all copies in the same transaction
- Database triggers — automatically propagate changes (adds complexity)
- Async workers — eventual consistency via background jobs
- Materialized view refresh — periodic or on-demand rebuilds
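The first strategy, application-level sync, can be as simple as wrapping both writes in one transaction. This sketch assumes hypothetical `customers` and `orders` tables where `orders.customer_name` is the duplicated copy:

```sql
-- Update the source of truth and every denormalized copy atomically
BEGIN;
UPDATE customers SET name = 'Acme Corp' WHERE id = 7;
UPDATE orders SET customer_name = 'Acme Corp' WHERE customer_id = 7;
COMMIT;
```

If either statement fails, the rollback prevents the copies from diverging; the cost is that every code path writing `customers.name` must remember to do this.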
Polymorphic associations#
A polymorphic association lets a single foreign key reference rows in multiple tables. Common in comment systems, activity feeds, and tagging.
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
commentable_type VARCHAR(50) NOT NULL, -- 'post', 'video', 'photo'
commentable_id INTEGER NOT NULL,
body TEXT NOT NULL
);
Pros: flexible, simple to add new commentable types.
Cons: no foreign key constraint (the database cannot enforce referential integrity across multiple tables), queries require conditional joins.
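The conditional-join problem looks like this in practice. The sketch assumes `posts` and `videos` tables, each with a `title` column:

```sql
-- Each possible parent type needs its own LEFT JOIN,
-- guarded by the type discriminator
SELECT c.id,
       c.body,
       COALESCE(p.title, v.title) AS parent_title
FROM comments c
LEFT JOIN posts  p ON c.commentable_type = 'post'  AND c.commentable_id = p.id
LEFT JOIN videos v ON c.commentable_type = 'video' AND c.commentable_id = v.id;
```

Adding a new commentable type means touching every such query.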
Alternative — shared parent table:
CREATE TABLE commentables (
id SERIAL PRIMARY KEY,
type VARCHAR(50) NOT NULL
);
CREATE TABLE posts (
id INTEGER PRIMARY KEY REFERENCES commentables(id),
title TEXT
);
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
commentable_id INTEGER REFERENCES commentables(id),
body TEXT NOT NULL
);
This restores foreign key integrity at the cost of an extra table and insert.
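The extra insert can be folded into one statement with a data-modifying CTE (PostgreSQL syntax; the title value is illustrative):

```sql
-- Create the shared parent row and the post row together;
-- a data-modifying CTE makes the pair atomic
WITH parent AS (
    INSERT INTO commentables (type) VALUES ('post') RETURNING id
)
INSERT INTO posts (id, title)
SELECT id, 'Schema design notes' FROM parent;

-- Comments reference the shared parent, so the FK is enforced
INSERT INTO comments (commentable_id, body)
VALUES ((SELECT MAX(id) FROM commentables), 'Great post!');
```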
Entity-Attribute-Value (EAV) pattern#
EAV stores data as (entity_id, attribute_name, attribute_value) triples instead of fixed columns.
CREATE TABLE product_attributes (
product_id INTEGER REFERENCES products(id),
attribute_name VARCHAR(100),
attribute_value TEXT,
PRIMARY KEY (product_id, attribute_name)
);
When EAV makes sense:
- Highly variable schemas (e-commerce products with hundreds of possible attributes)
- User-defined custom fields
- Configuration storage
When to avoid EAV:
- You need type safety (all values are stored as text)
- You need efficient filtering or aggregation on specific attributes
- Query complexity becomes unmanageable (pivoting rows to columns)
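The pivoting pain in the last point is easy to see. Reassembling two attributes into columns already requires one conditional aggregate per attribute:

```sql
-- Pivot EAV rows back into columns: one MAX(CASE ...) per attribute
SELECT p.id,
       MAX(CASE WHEN a.attribute_name = 'color' THEN a.attribute_value END) AS color,
       MAX(CASE WHEN a.attribute_name = 'size'  THEN a.attribute_value END) AS size
FROM products p
JOIN product_attributes a ON a.product_id = p.id
GROUP BY p.id;
```

With dozens of attributes, this pattern becomes unreadable and hard for the planner to optimize.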
JSON columns — the modern middle ground#
PostgreSQL's jsonb and MySQL's JSON type offer schema flexibility without the query pain of EAV.
ALTER TABLE products ADD COLUMN metadata JSONB DEFAULT '{}';
-- Query a nested field
SELECT * FROM products
WHERE metadata ->> 'color' = 'blue';
-- Index for performance
CREATE INDEX idx_products_color ON products ((metadata ->> 'color'));
Advantages over EAV:
- Native JSON operators simplify queries
- GIN indexes enable fast lookups on nested fields
- Type coercion is handled at query time
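For lookups on arbitrary keys (rather than one known field), a single GIN index over the whole document covers containment queries in PostgreSQL:

```sql
-- One GIN index supports containment queries on any key in metadata
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- @> can use the GIN index; matches rows whose metadata
-- contains the given key/value pair
SELECT * FROM products
WHERE metadata @> '{"color": "blue"}';
```

This complements the expression index shown earlier, which is narrower but cheaper when you only ever filter on one field.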
When to prefer fixed columns: if you query or filter on a field frequently and its presence is guaranteed, a dedicated column with a proper type and index will almost always outperform a JSON path, and it gives you type checking and constraints for free.
Soft deletes#
Instead of removing rows with DELETE, mark them as deleted with a flag.
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ DEFAULT NULL;
-- "Delete" a user
UPDATE users SET deleted_at = NOW() WHERE id = 42;
-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;
Benefits:
- Audit trail — you can see what was deleted and when
- Easy undo — clear the deleted_at column to restore
- Referential safety — foreign keys remain valid
Pitfalls:
- Every query must filter WHERE deleted_at IS NULL (use a default scope or view)
- Unique constraints must be partial: CREATE UNIQUE INDEX ... WHERE deleted_at IS NULL
- Storage grows over time — implement a periodic hard-delete job for old soft-deleted rows
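The partial unique constraint looks like this in PostgreSQL, assuming `users` has an `email` column:

```sql
-- Uniqueness applies only to live rows, so an email can be
-- reused after its previous owner is soft-deleted
CREATE UNIQUE INDEX idx_users_email_active
    ON users (email)
    WHERE deleted_at IS NULL;
```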
Audit columns#
Track who changed what and when by adding standard metadata columns to every table.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status VARCHAR(20) NOT NULL,
total_cents INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by INTEGER REFERENCES users(id),
updated_by INTEGER REFERENCES users(id)
);
Automating updates:
- Use a database trigger to set updated_at = NOW() on every update
- Pass the acting user ID through application context or SET LOCAL variables
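A minimal sketch of the updated_at trigger in PostgreSQL (the function and trigger names are illustrative):

```sql
-- Keep updated_at current on every write, regardless of
-- whether the application remembers to set it
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
```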
For full change history (not just the latest modifier), use an audit log table or temporal tables.
Temporal tables#
Temporal tables maintain a complete history of every row version, enabling point-in-time queries.
System-versioned temporal tables (SQL:2011)#
Support varies by engine: MariaDB and SQL Server implement system versioning natively, while PostgreSQL does not. The example below uses MariaDB's implementation of the SQL:2011 syntax.
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR(100),
salary INTEGER,
valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
valid_to TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH SYSTEM VERSIONING;
-- Query current state
SELECT * FROM employees WHERE id = 1;
-- Query historical state
SELECT * FROM employees FOR SYSTEM_TIME AS OF '2025-06-01' WHERE id = 1;
Application-managed history tables#
If your database does not support SQL:2011 temporal tables, implement the pattern manually:
CREATE TABLE employees_history (
id INTEGER,
name VARCHAR(100),
salary INTEGER,
valid_from TIMESTAMPTZ NOT NULL,
valid_to TIMESTAMPTZ NOT NULL,
operation CHAR(1) NOT NULL -- 'I', 'U', 'D'
);
A trigger on the employees table copies the old row to employees_history before every update or delete.
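That trigger might be sketched as follows in PostgreSQL. It assumes a hypothetical `version_started_at` column on `employees` recording when the current row version began:

```sql
CREATE OR REPLACE FUNCTION archive_employee() RETURNS trigger AS $$
BEGIN
    -- Copy the outgoing row version into history
    INSERT INTO employees_history (id, name, salary, valid_from, valid_to, operation)
    VALUES (OLD.id, OLD.name, OLD.salary,
            OLD.version_started_at,  -- hypothetical column: when this version began
            NOW(),
            CASE TG_OP WHEN 'UPDATE' THEN 'U' ELSE 'D' END);
    IF TG_OP = 'UPDATE' THEN
        RETURN NEW;  -- let the update proceed with the new values
    END IF;
    RETURN OLD;      -- let the delete proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employees_archive
    BEFORE UPDATE OR DELETE ON employees
    FOR EACH ROW EXECUTE FUNCTION archive_employee();
```

Note the return values: a BEFORE UPDATE trigger must return NEW (returning OLD would silently discard the update), while a BEFORE DELETE trigger returns OLD to allow the delete.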
Use cases: compliance and auditing, debugging data issues, building time-travel features, regulatory requirements (financial services, healthcare).
Choosing the right pattern#
| Requirement | Recommended pattern |
|---|---|
| Data integrity and consistency | Normalize to 3NF/BCNF |
| Read-heavy dashboards | Denormalize with materialized views |
| Variable attributes per entity | JSON columns (or EAV if extreme variability) |
| Comments/tags across entity types | Polymorphic with shared parent table |
| Undo and audit trail | Soft deletes + audit columns |
| Regulatory compliance | Temporal tables |
Visualize your schema on Codelit#
Generate an ER diagram for your schema on Codelit to see normalization levels, foreign key relationships, and denormalization points in an interactive visual. Click on any table to explore its design trade-offs.
This is article #311 in the Codelit engineering blog series.
Build and explore database architectures visually at codelit.io.