# Database Backup Strategies: From Full Dumps to Point-in-Time Recovery
Backups are the one thing nobody thinks about until they need them. A solid database backup strategy is the difference between a minor incident and a career-defining disaster. This guide covers the techniques, tools, and planning you need to protect your data.
## Full vs Incremental vs Differential Backups
The three fundamental backup types trade off storage, speed, and recovery complexity.
### Full Backups
A full backup captures the entire database at a point in time. It is the simplest to restore — just load the single backup file.
- Pros: Simple, self-contained, fastest restore
- Cons: Slowest to create, largest storage footprint
- Use when: You have small databases or need a weekly baseline
### Incremental Backups
An incremental backup captures only the data that changed since the last backup (full or incremental). Each increment depends on every previous one in the chain.
- Pros: Fast to create, minimal storage per backup
- Cons: Restore requires replaying the full chain — slow and fragile
- Use when: You need frequent backups with minimal I/O impact
### Differential Backups
A differential backup captures everything that changed since the last full backup. It grows larger over time but only depends on the most recent full backup.
- Pros: Faster restore than incremental (full + one differential), moderate storage
- Cons: Larger than incremental, grows between full backups
- Use when: You want a balance between backup speed and restore speed
| Type | Backup Speed | Storage | Restore Speed | Chain Dependency |
|---|---|---|---|---|
| Full | Slow | Large | Fast | None |
| Incremental | Fast | Small | Slow | Full chain |
| Differential | Medium | Medium | Medium | Last full only |
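The chain dependency in the table can be seen in miniature with GNU tar's incremental mode — a filesystem stand-in for database backups, useful only to illustrate the restore ordering (all paths here are throwaway temp files):

```shell
# Simulate a full backup plus one incremental, then restore the chain.
set -eu
work=$(mktemp -d); cd "$work"
mkdir data
echo v1 > data/a.txt

# Full backup: the snapshot file records everything seen so far.
tar --listed-incremental=state.snar -cf full.tar data

# Modify one file and add another, then take an incremental backup.
sleep 1   # ensure mtimes differ so tar detects the change
echo v2 > data/a.txt
echo new > data/b.txt
tar --listed-incremental=state.snar -cf inc1.tar data

# Restore: replay the chain in order, full first, then each increment.
mkdir restore
tar --listed-incremental=/dev/null -xf full.tar -C restore
tar --listed-incremental=/dev/null -xf inc1.tar -C restore
cat restore/data/a.txt   # v2
```

With a week of daily increments, a restore replays seven archives in order, and losing any one of them breaks the chain — exactly the fragility the table calls out.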
## Point-in-Time Recovery (PITR)
PITR lets you restore a database to any specific moment — not just the time of the last backup. This is critical for recovering from accidental data corruption or deletion.
How it works:
- Start with a base backup (full snapshot of the database)
- Continuously archive transaction logs (WAL files in PostgreSQL, binlogs in MySQL)
- To recover: restore the base backup, then replay logs up to the target timestamp
PITR gives you a recovery window defined by your oldest base backup and your continuous log archive. If you keep 7 days of WAL files and daily base backups, you can recover to any second within that week.
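The window arithmetic is just a bounds check: the target must fall between the oldest retained base backup and the end of the continuous log archive. A sketch (timestamps are made up; GNU date's `-d` flag is assumed):

```shell
# Is a target recovery time inside the PITR window?
oldest_base='2024-06-01 02:00:00'   # oldest retained base backup
target='2024-06-04 13:37:00'        # moment we want to recover to
now='2024-06-07 09:00:00'           # WAL is archived continuously up to now

t_base=$(date -d "$oldest_base" +%s)
t_target=$(date -d "$target" +%s)
t_now=$(date -d "$now" +%s)

if [ "$t_target" -ge "$t_base" ] && [ "$t_target" -le "$t_now" ]; then
  echo recoverable
else
  echo "outside recovery window"
fi
```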
## WAL Archiving (PostgreSQL)
Write-Ahead Logging (WAL) is how PostgreSQL ensures durability. Every change is written to WAL before it hits the data files. By archiving WAL segments, you enable PITR.
Setup essentials:
```
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/%f'
```
The archive_command runs every time a WAL segment fills up (default 16 MB). In production, replace the cp command with a script that ships WAL to S3, GCS, or a dedicated backup server.
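Two properties matter in whatever replaces that cp: never overwrite an existing segment, and exit nonzero on any failure so PostgreSQL retries instead of discarding WAL. A minimal sketch of that logic as a shell function (the function name and ARCHIVE_DIR variable are illustrative; production scripts would ship to object storage instead of a local directory):

```shell
# archive_wal: a safer archive_command target.
archive_wal() {
  wal_path=$1   # what PostgreSQL passes as %p
  wal_name=$2   # what PostgreSQL passes as %f
  dest=${ARCHIVE_DIR:?must be set}/$wal_name

  # Refuse to overwrite: a duplicate segment name means something is wrong,
  # and archive_command must return nonzero so PostgreSQL retries and alerts.
  if [ -e "$dest" ]; then
    echo "WAL segment $wal_name already archived" >&2
    return 1
  fi

  # Copy to a temporary name, then rename, so a crash mid-copy never
  # leaves a truncated segment that looks complete.
  cp "$wal_path" "$dest.tmp" && mv "$dest.tmp" "$dest"
}
```

In postgresql.conf this would be wired up as something like archive_command = '/usr/local/bin/archive_wal.sh %p %f' (the script path is illustrative).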
Tools for WAL archiving:
- pgBackRest — parallel backup and WAL archiving with compression and encryption
- Barman — backup and recovery manager with retention policies
- wal-g — cloud-native WAL archiving to S3, GCS, or Azure Blob Storage
## Logical vs Physical Backups
### Logical Backups
Logical backups export data as SQL statements or structured formats (CSV, custom). They are portable across PostgreSQL versions and can selectively dump specific tables.
- pg_dump — dumps a single database to SQL or custom format
- pg_dumpall — dumps all databases including roles and tablespaces
```
# Custom format (compressed, supports parallel restore)
pg_dump -Fc -f backup.dump mydb

# Restore with parallel workers
pg_restore -j 4 -d mydb backup.dump
```
Limitations: Slow for large databases (serializes data), no PITR support on its own, and the backup is not crash-consistent without careful locking.
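In practice, scheduled dumps are usually wrapped in a small script that date-stamps each file and prunes old ones. A sketch, assuming GNU find and date (BACKUP_DIR, KEEP_DAYS, and DUMP_CMD are illustrative knobs, not pg_dump options; DUMP_CMD exists only so the pg_dump invocation can be swapped or stubbed):

```shell
# nightly_dump: dated logical dump with simple retention pruning.
nightly_dump() {
  backup_dir=${BACKUP_DIR:?must be set}
  keep_days=${KEEP_DAYS:-14}
  out=$backup_dir/mydb-$(date +%Y%m%d-%H%M%S).dump

  # pg_dump -Fc writes the custom-format archive to stdout by default.
  if ! ${DUMP_CMD:-pg_dump -Fc mydb} > "$out"; then
    rm -f "$out"           # never keep a partial dump
    return 1
  fi

  # Drop dumps older than the retention window.
  find "$backup_dir" -name '*.dump' -mtime +"$keep_days" -delete
}
```

Deleting a failed partial dump matters: a truncated file that sits in the backup directory looks like a valid backup until the day you try to restore it.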
### Physical Backups
Physical backups copy the raw data files on disk. They are faster for large databases and support PITR when combined with WAL archiving.
- pg_basebackup — streams a binary copy of the entire data directory
```
pg_basebackup -D /backups/base -Ft -z -P
```
Trade-offs: Not portable across major PostgreSQL versions, copies everything (no table-level granularity), but significantly faster for multi-terabyte databases.
| Aspect | Logical (pg_dump) | Physical (pg_basebackup) |
|---|---|---|
| Speed | Slower | Faster |
| Size | Smaller (compressed SQL) | Larger (raw files) |
| PITR support | No | Yes (with WAL) |
| Cross-version | Yes | No |
| Granularity | Per table | Whole cluster |
## Automated Cloud Backups
Managed database services handle backups automatically, but understanding what they do matters.
### Amazon RDS
- Automated backups: daily snapshots + transaction logs, retained for up to 35 days
- PITR: restore to any second within the retention window
- Manual snapshots: persist until you delete them, no retention limit
- Cross-region: copy snapshots to other regions for DR
### Google Cloud SQL
- Automated backups: daily, retained for 7 days by default (configurable up to 365)
- PITR: enabled via binary logging (MySQL) or WAL archiving (PostgreSQL)
- On-demand backups: manual snapshots with no automatic expiration
### Azure Database for PostgreSQL
- Automated backups: configurable retention from 7 to 35 days
- Geo-redundant backups: replicate to a paired region
- PITR: restore to any point within the retention period
All three providers encrypt backups at rest and handle WAL/binlog management transparently. The main risk is assuming the defaults are sufficient — always verify retention periods and test restores.
## Backup Testing
A backup you have never restored is not a backup. It is a hope.
Testing checklist:
- Schedule regular restore drills — monthly at minimum; quarterly is too infrequent
- Restore to a separate environment — never test on production
- Verify data integrity — run checksums, row counts, and application-level validation
- Measure restore time — this is your actual RTO, not the theoretical one
- Test PITR — restore to a specific timestamp and verify the data matches expectations
- Automate the test — script the restore and validation so it runs without human intervention
Document every test: date, backup used, restore duration, issues found. This log proves your DR capability during audits.
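The integrity step can start as simply as recording a checksum at backup time and refusing to trust any file that fails it later. A sketch (the function names are illustrative):

```shell
# Record a checksum alongside each backup; verify it before any restore.
record_checksum() {
  sha256sum "$1" > "$1.sha256"
}

verify_backup() {
  # Returns nonzero if the backup no longer matches its recorded hash.
  sha256sum -c "$1.sha256" >/dev/null 2>&1
}
```

This catches truncation and bit rot in transit or at rest; it says nothing about logical correctness, which is why the checklist still demands row counts and application-level validation after an actual restore.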
## RPO and RTO Planning
Recovery Point Objective (RPO) answers: how much data can you afford to lose? If RPO is 1 hour, you need backups at least every hour.
Recovery Time Objective (RTO) answers: how long can you be down? If RTO is 30 minutes, your restore process must complete within that window.
| RPO Target | Backup Strategy |
|---|---|
| Zero | Synchronous replication (not strictly a backup) |
| Minutes | Continuous WAL archiving + streaming replication |
| Hours | Frequent incremental or differential backups |
| Daily | Nightly full backups |
| RTO Target | Recovery Strategy |
|---|---|
| Minutes | Hot standby with automatic failover |
| Under 1 hour | Physical backups on fast storage + rehearsed restore |
| Hours | Logical backups or cloud snapshot restore |
The budgeting math is simple: multiply the cost of downtime per hour by your RTO to estimate what a single worst-case incident costs; that figure is the ceiling on what backup infrastructure is worth to you. If an hour of downtime costs $50,000, spending $10,000/year on backup tooling is a bargain.
## A Practical Backup Strategy
For a production PostgreSQL database, a layered approach works best:
- Continuous WAL archiving to S3 via wal-g (RPO: seconds)
- Daily base backups via pg_basebackup or pgBackRest (PITR baseline)
- Weekly logical dumps via pg_dump for portability and schema versioning
- Streaming replica in another region for instant failover (RTO: minutes)
- Monthly restore tests automated in CI with data validation
This gives you multiple recovery paths: fast failover for infrastructure failures, PITR for data corruption, and logical dumps for selective table recovery.
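Wired together, the WAL-archiving layer of that stack looks roughly like this on PostgreSQL 12+, where restore_command lives in postgresql.conf (the bucket name is illustrative; wal-g reads its S3 settings from the environment):

```
# postgresql.conf
archive_mode = on
archive_command = 'wal-g wal-push %p'
restore_command = 'wal-g wal-fetch %f %p'   # consulted during PITR

# environment for the postgres user
WALG_S3_PREFIX=s3://my-backup-bucket/pg
```

The daily base backups in the list above would then come from a scheduled wal-g backup-push of the data directory, giving PITR its baseline.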
That wraps up article #279 on Codelit.