Soft Delete vs Hard Delete: Patterns, Trade-offs, and GDPR Compliance
Soft Delete vs Hard Delete#
Deleting data sounds simple. It isn't. The moment you run DELETE FROM users WHERE id = 42, that row is gone forever. No audit trail, no recovery, no compliance proof. Soft delete offers an alternative — but it comes with its own costs.
Hard Delete#
A hard delete permanently removes the row from the database.
DELETE FROM users WHERE id = 42;
Pros:
- Simple and fast
- No phantom data in queries
- Storage is reclaimed immediately
- No ambiguity about what exists
Cons:
- No undo capability
- No audit trail
- Cascading deletes can be destructive
- Breaks referential integrity if foreign keys aren't handled
Soft Delete#
A soft delete marks the row as deleted without removing it.
UPDATE users SET deleted_at = NOW() WHERE id = 42;
Pros:
- Full undo capability
- Audit trail preserved
- Referential integrity maintained
- Compliance-friendly (you can prove what existed)
Cons:
- Every query needs a filter
- Storage grows indefinitely
- Indexes include dead rows
- Unique constraints become complicated
Implementation Patterns#
Pattern 1: deleted_at Column#
The most common approach. Add a nullable timestamp column:
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL DEFAULT NULL;
-- Soft delete
UPDATE users SET deleted_at = NOW() WHERE id = 42;
-- Query only active records
SELECT * FROM users WHERE deleted_at IS NULL;
-- Restore
UPDATE users SET deleted_at = NULL WHERE id = 42;
Tip: Create a partial index for performance:
CREATE INDEX idx_users_active ON users (id) WHERE deleted_at IS NULL;
This way queries filtering on active records use a smaller, faster index.
Pattern 2: Status Column#
Use an enum instead of a timestamp:
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
-- Possible values: 'active', 'deleted', 'suspended', 'archived'
UPDATE users SET status = 'deleted' WHERE id = 42;
This is more flexible — you can represent multiple states beyond just "exists" and "deleted."
Pattern 3: Archive Table#
Move deleted rows to a separate table:
-- Archive the row
INSERT INTO users_archive SELECT *, NOW() as archived_at FROM users WHERE id = 42;
DELETE FROM users WHERE id = 42;
-- Restore
INSERT INTO users SELECT id, name, email, created_at FROM users_archive WHERE id = 42;
DELETE FROM users_archive WHERE id = 42;
Pros:
- Active table stays clean and fast
- No WHERE clause pollution
- Unique constraints work normally
Cons:
- Schema must stay in sync between tables
- Restore is a multi-step operation
- Foreign keys complicate the move
Pattern 4: Event Sourcing#
Instead of mutating state, record every event including deletion:
Event 1: UserCreated { id: 42, name: "Alice", email: "alice@example.com" }
Event 2: UserUpdated { id: 42, email: "alice@newdomain.com" }
Event 3: UserDeleted { id: 42, reason: "account_closure" }
The current state is derived by replaying events. "Deleting" means appending a deletion event. You can reconstruct any point in time.
Best for: Financial systems, audit-heavy domains, systems requiring full history.
Cascading Deletes#
When a parent record is soft-deleted, what happens to children?
Strategy 1: Cascade the Soft Delete#
-- Delete user and all their posts
UPDATE users SET deleted_at = NOW() WHERE id = 42;
UPDATE posts SET deleted_at = NOW() WHERE user_id = 42;
UPDATE comments SET deleted_at = NOW() WHERE user_id = 42;
Problem: restoring the user means restoring everything — including items the user may have deleted intentionally before account deletion.
Strategy 2: Lazy Filtering#
Only soft-delete the parent. Filter children at query time:
SELECT posts.*
FROM posts
JOIN users ON users.id = posts.user_id
WHERE users.deleted_at IS NULL
AND posts.deleted_at IS NULL;
Problem: every query involving relationships needs multiple soft-delete checks.
Strategy 3: Application-Level Rules#
Define explicit rules per relationship:
| Parent | Child | On Parent Delete |
|---|---|---|
| User | Posts | Cascade soft delete |
| User | Comments | Cascade soft delete |
| User | Payment history | Keep (compliance) |
| Organization | Users | Block deletion if users exist |
GDPR Right to Erasure#
Article 17 of the GDPR grants users the "right to be forgotten." This creates a direct tension with soft delete.
The Problem#
Soft delete keeps the data. GDPR says delete it. These goals conflict.
The Solution: Separate PII from Records#
-- Step 1: Anonymize personal data
UPDATE users
SET name = 'REDACTED',
email = 'redacted-42@example.com',
phone = NULL,
address = NULL,
deleted_at = NOW()
WHERE id = 42;
-- Step 2: Keep non-PII for business records
-- Order history, analytics, etc. retain the user_id
-- but the user record no longer contains personal data
GDPR-Compliant Deletion Checklist#
- Identify all PII across every table and service
- Anonymize or purge PII — not just the primary record
- Retain non-PII where you have a legitimate business interest
- Document the process for audit purposes
- Handle backups — PII in backups must also be addressed
- Propagate to third parties — if you shared data with vendors, notify them
Backup Considerations#
Soft-deleted and anonymized data still lives in database backups. Options:
- Accept that backups contain PII (with documented retention policy)
- Encrypt backups with keys that can be destroyed
- Use backup solutions that support selective purging
Query Performance Impact#
Soft delete affects performance in several ways:
Index Bloat#
Dead rows still occupy index space:
-- Without soft delete: index has 1M rows
-- With soft delete: index has 1M active + 500K deleted = 1.5M rows
-- Fix: partial index
CREATE INDEX idx_users_email_active
ON users (email)
WHERE deleted_at IS NULL;
Query Complexity#
Every query gains a filter:
-- Before soft delete
SELECT * FROM users WHERE email = 'alice@example.com';
-- After soft delete
SELECT * FROM users WHERE email = 'alice@example.com' AND deleted_at IS NULL;
In an ORM, use global scopes or default scopes:
# SQLAlchemy example
class User(Base):
__tablename__ = 'users'
@staticmethod
def default_filter():
return User.deleted_at.is_(None)
Unique Constraint Issues#
Soft-deleted rows still enforce uniqueness:
-- User deletes account with email alice@example.com
-- New user tries to register with the same email
-- UNIQUE constraint violation!
-- Fix: partial unique index
CREATE UNIQUE INDEX idx_users_email_unique
ON users (email)
WHERE deleted_at IS NULL;
Cleanup Strategies#
Soft-deleted data should not live forever. Implement scheduled cleanup.
Time-Based Purge#
-- Hard delete records soft-deleted more than 90 days ago
DELETE FROM users
WHERE deleted_at IS NOT NULL
AND deleted_at < NOW() - INTERVAL '90 days';
Batch Cleanup Job#
def cleanup_soft_deleted(table, retention_days=90, batch_size=1000):
cutoff = datetime.now() - timedelta(days=retention_days)
while True:
deleted = db.execute(
f"DELETE FROM {table} "
f"WHERE deleted_at IS NOT NULL "
f"AND deleted_at < %s "
f"LIMIT %s",
(cutoff, batch_size)
)
if deleted.rowcount == 0:
break
db.commit()
Retention Policy Table#
| Data Type | Soft Delete Retention | Then |
|---|---|---|
| User accounts | 90 days | Hard delete + anonymize |
| Posts / content | 30 days | Hard delete |
| Financial records | 7 years | Archive to cold storage |
| Audit logs | Never soft delete | Retain permanently |
When to Use Which#
| Scenario | Recommendation |
|---|---|
| User-facing content (posts, comments) | Soft delete with cleanup |
| Financial / legal records | Event sourcing or archive table |
| Transient data (sessions, caches) | Hard delete |
| GDPR-regulated PII | Anonymize + soft delete |
| High-write, high-volume tables | Hard delete or archive table |
| Multi-tenant SaaS | Soft delete with tenant-scoped cleanup |
This is article #355 in the Codelit engineering series. Explore more at codelit.io.
Try it on Codelit
Chaos Mode
Simulate node failures and watch cascading impact across your architecture
AI Architecture Review
Get an AI audit covering security gaps, bottlenecks, and scaling risks
Related articles
AI Agent Tool Use Architecture: Function Calling, ReAct Loops & Structured Outputs
6 min read
AI searchAI-Powered Search Architecture: Semantic Search, Hybrid Search, and RAG
8 min read
AI safetyAI Safety Guardrails Architecture: Input Validation, Output Filtering, and Human-in-the-Loop
8 min read
Try these templates
Netflix Video Streaming Architecture
Global video streaming platform with adaptive bitrate, CDN distribution, and recommendation engine.
10 componentsSearch Engine Architecture
Web-scale search with crawling, indexing, ranking, and sub-second query serving.
8 componentsGoogle Search Engine Architecture
Web-scale search with crawling, indexing, PageRank, query processing, ads, and knowledge graph.
10 componentsBuild this architecture
Generate an interactive architecture for Soft Delete vs Hard Delete in seconds.
Try it in Codelit →
Comments