Database Vacuum & Maintenance: Keeping PostgreSQL and MySQL Healthy
Databases do not stay fast on their own. Every UPDATE and DELETE in PostgreSQL leaves behind dead tuples. Every variable-length row change in MySQL/InnoDB can fragment pages. Without regular maintenance, the query planner works with stale statistics, indexes bloat, and storage grows far beyond the live data set. This guide covers the maintenance primitives you need and how to automate them.
Why Dead Tuples Accumulate in PostgreSQL
PostgreSQL uses Multi-Version Concurrency Control (MVCC). When you update a row, the old version is not removed immediately — it stays on disk so that concurrent transactions can still see it. Once no transaction needs the old version, it becomes a dead tuple.
Dead tuples cause two problems:
- Table bloat — The table file grows because dead tuples occupy space that cannot be reused until vacuumed.
- Index bloat — Every index entry still points to the dead tuple, increasing index scan time and memory usage.
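You can watch dead tuples accumulate directly. The sketch below assumes a hypothetical table my_table with an updated_at column; note that pg_stat_user_tables counts are updated asynchronously by the stats collector, so a short delay before the second query is normal:

```sql
-- Each updated row leaves its old version behind as a dead tuple
UPDATE my_table SET updated_at = now();

-- Inspect live vs. dead tuple counts for the table
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table';
```

After autovacuum (or a manual VACUUM) runs, n_dead_tup drops back toward zero.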
VACUUM Fundamentals
PostgreSQL provides two vacuum modes:
-- Standard VACUUM: reclaims dead tuple space for reuse within the table
VACUUM my_table;
-- VACUUM FULL: rewrites the entire table to reclaim space to the OS
-- Acquires an ACCESS EXCLUSIVE lock — avoid in production
VACUUM FULL my_table;
Standard VACUUM marks dead tuple space as reusable but does not shrink the file on disk. VACUUM FULL does shrink it, but locks the table for the entire duration.
VACUUM ANALYZE
Combine vacuuming with statistics collection:
VACUUM ANALYZE my_table;
Fresh statistics let the query planner choose better join strategies and index scans. Always run ANALYZE after bulk loads or large deletes.
Autovacuum: The Background Worker
PostgreSQL ships with autovacuum enabled by default. It launches worker processes that vacuum and analyze tables automatically based on thresholds.
Key Parameters
autovacuum_vacuum_threshold = 50 -- minimum dead tuples before vacuum
autovacuum_vacuum_scale_factor = 0.2 -- fraction of live tuples added to threshold
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
autovacuum_naptime = 1min -- how often the launcher checks tables
autovacuum_max_workers = 3
A table is vacuumed when:
dead_tuples > threshold + (scale_factor * live_tuples)
For a 10-million-row table with defaults, that means autovacuum fires after 2,000,050 dead tuples — far too late for write-heavy workloads.
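You can estimate each table's trigger point from live statistics. This is a sketch: it reads the global settings via current_setting() and ignores any per-table reloption overrides:

```sql
-- Approximate dead-tuple count at which autovacuum will fire, per table
SELECT relname,
       n_live_tup,
       n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * n_live_tup
         AS vacuum_trigger_point
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

Tables where n_dead_tup sits just below a large trigger point for long periods are the ones that benefit most from the per-table tuning below.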
Tuning for Large Tables
Set per-table overrides for hot tables:
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000,
autovacuum_vacuum_cost_delay = 2
);
Reducing cost_delay lets the worker consume more I/O budget, finishing faster at the cost of slightly higher background load.
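To confirm which per-table overrides are actually in effect, you can inspect the reloptions column in pg_class:

```sql
-- Shows the storage options set via ALTER TABLE ... SET (...)
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events';
```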
Detecting Bloat
pg_stat_user_tables
SELECT relname,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / nullif(n_live_tup, 0), 1) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
pgstattuple Extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('my_table');
-- Returns dead_tuple_count, dead_tuple_len, free_space, table_len
A dead_tuple_len / table_len ratio above 20% signals significant bloat.
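That ratio can be computed directly in SQL, a sketch assuming the extension is installed and my_table exists (note that pgstattuple reads the entire table, so avoid running it against very large tables during peak hours):

```sql
-- Dead space and free space as a percentage of total table size
SELECT round(dead_tuple_len * 100.0 / nullif(table_len, 0), 1) AS dead_pct,
       round(free_space * 100.0 / nullif(table_len, 0), 1)     AS free_pct
FROM pgstattuple('my_table');
```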
Index Bloat
SELECT indexrelname,
pg_relation_size(indexrelid) AS index_size,
idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
Large indexes with low scan counts are candidates for a REINDEX or drop.
REINDEX and pg_repack
REINDEX
-- Locks the table — use CONCURRENTLY in PostgreSQL 12+
REINDEX INDEX CONCURRENTLY idx_events_created_at;
-- Reindex every index on a table
REINDEX TABLE CONCURRENTLY events;
CONCURRENTLY builds a new index without holding an exclusive lock, then swaps it in.
pg_repack
pg_repack is an extension that repacks tables and indexes online, without long locks:
# Repack a single table (removes bloat, rewrites table)
pg_repack -d mydb -t events
# Repack only indexes
pg_repack -d mydb -t events --only-indexes
Unlike VACUUM FULL, pg_repack holds only a brief exclusive lock at the swap step, making it safe for production use.
MySQL / InnoDB Maintenance
OPTIMIZE TABLE
-- Rebuilds the table and reclaims fragmented space
OPTIMIZE TABLE orders;
For InnoDB, this is equivalent to ALTER TABLE ... ENGINE=InnoDB — it recreates the table. In MySQL 5.7+ with innodb_file_per_table, this reclaims disk space.
ANALYZE TABLE
ANALYZE TABLE orders;
Updates index cardinality statistics so the optimizer can pick better execution plans.
Monitoring Fragmentation
SELECT table_name,
data_length,
data_free,
round(data_free * 100.0 / nullif(data_length + data_free, 0), 1) AS frag_pct
FROM information_schema.tables
WHERE table_schema = 'mydb'
ORDER BY data_free DESC;
A frag_pct above 15-20 % suggests the table would benefit from OPTIMIZE.
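You can turn the fragmentation query into a worklist by generating the OPTIMIZE statements directly. A sketch (review the output before running it, since each OPTIMIZE rebuilds its table):

```sql
-- Emit an OPTIMIZE statement for every table over the 15% threshold
SELECT concat('OPTIMIZE TABLE `', table_schema, '`.`', table_name, '`;') AS stmt
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND data_free * 100.0 / nullif(data_length + data_free, 0) > 15;
```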
Maintenance Windows
Even "online" operations consume I/O and CPU. Structure maintenance in tiers:
| Frequency | Operation | Impact |
|---|---|---|
| Continuous | Autovacuum (PostgreSQL) | Low |
| Daily | ANALYZE on high-churn tables | Low |
| Weekly | Manual VACUUM on large tables | Medium |
| Monthly | REINDEX CONCURRENTLY / pg_repack | Medium-High |
| Quarterly | VACUUM FULL (if truly needed) | High |
Scheduling with pg_cron
-- Install pg_cron
CREATE EXTENSION pg_cron;
-- Weekly VACUUM ANALYZE on events table, Sunday 3 AM UTC
SELECT cron.schedule(
'vacuum-events',
'0 3 * * 0',
$$VACUUM ANALYZE events$$
);
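Scheduled jobs live in the cron schema, so you can audit and remove them with plain SQL (the by-name unschedule variant is available in recent pg_cron versions):

```sql
-- List every scheduled job
SELECT jobid, jobname, schedule, command FROM cron.job;

-- Remove the weekly vacuum job by name
SELECT cron.unschedule('vacuum-events');
```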
Monitoring and Alerting
Track these metrics continuously:
- Dead tuple ratio — Alert when any table exceeds 10 % dead tuples.
- Table and index size growth — Sudden jumps indicate bloat or missing vacuums.
- Autovacuum duration — Long-running autovacuums may need cost limit tuning.
- Transaction ID wraparound — PostgreSQL must vacuum (freeze) old tuples before the transaction ID counter wraps at roughly 2 billion transactions; as the limit approaches, the server refuses new transactions to prevent data loss. Monitor age(datfrozenxid).
-- Transaction ID age — alert if approaching 1 billion
SELECT datname,
age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
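If a database's XID age is climbing, you can narrow it down to the tables responsible via relfrozenxid in pg_class (a sketch; run it while connected to the affected database):

```sql
-- Tables with the oldest frozen transaction IDs drive the database's XID age
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```

A targeted VACUUM FREEZE on the worst offenders brings the database age back down without waiting for an aggressive anti-wraparound autovacuum.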
Grafana Dashboard Essentials
- pg_stat_user_tables.n_dead_tup per table
- pg_stat_activity count of running autovacuum workers
- pg_database_size() over time
- Autovacuum last-run timestamps
Common Pitfalls
- Long-running transactions — An open transaction prevents VACUUM from reclaiming tuples still visible to that transaction. Monitor idle in transaction sessions.
- Disabling autovacuum — Never disable it globally. If a table needs special treatment, tune per-table settings instead.
- Running VACUUM FULL during peak traffic — It acquires an exclusive lock. Use pg_repack or schedule during low-traffic windows.
- Ignoring index bloat — Tables can be well-vacuumed while indexes remain bloated. Track index sizes independently.
- Forgetting ANALYZE after schema changes — Adding a column or changing data distribution requires fresh statistics.
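The long-running-transaction pitfall can be spotted from pg_stat_activity. A sketch that lists sessions sitting idle inside an open transaction, oldest first:

```sql
-- Sessions holding transactions open without doing work
SELECT pid,
       usename,
       now() - xact_start AS xact_duration,
       left(query, 60)    AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_duration DESC;
```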
Quick Reference Checklist
- Autovacuum enabled and tuned per table for high-churn workloads
- Weekly REINDEX CONCURRENTLY on the largest indexes
- Monthly pg_repack on tables with persistent bloat
- Transaction ID wraparound alerts set below 500 million
- idle in transaction timeout configured (idle_in_transaction_session_timeout)
- MySQL tables with over 15 % fragmentation scheduled for OPTIMIZE
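The idle-in-transaction timeout from the checklist can be set cluster-wide without a restart (the 10-minute value here is an illustrative choice, not a recommendation from this article):

```sql
-- Kill sessions that sit idle inside a transaction for more than 10 minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
```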
That is article #351 on codelit.io — if you found it useful, explore the rest of the library for deep dives on databases, system design, and backend engineering.