Database Vacuum & Maintenance: Keeping PostgreSQL and MySQL Healthy
Databases do not stay fast on their own. Every UPDATE and DELETE in PostgreSQL leaves behind dead tuples. Every variable-length row change in MySQL/InnoDB can fragment pages. Without regular maintenance, the query planner works with stale statistics, indexes bloat, and storage grows far beyond the live data set. This guide covers the maintenance primitives you need and how to automate them.
Why Dead Tuples Accumulate in PostgreSQL
PostgreSQL uses Multi-Version Concurrency Control (MVCC). When you update a row, the old version is not removed immediately — it stays on disk so that concurrent transactions can still see it. Once no transaction needs the old version, it becomes a dead tuple.
Dead tuples cause two problems:
- Table bloat — The table file grows because dead tuples occupy space that cannot be reused until vacuumed.
- Index bloat — Every index entry still points to the dead tuple, increasing index scan time and memory usage.
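You can watch dead tuples accumulate directly. The sketch below assumes a hypothetical table my_table with an updated_at column; note that pg_stat_user_tables counts are updated asynchronously by the stats collector, so a short delay before the second query is normal:

```sql
-- Each updated row leaves its old version behind as a dead tuple
UPDATE my_table SET updated_at = now();

-- Inspect live vs. dead tuple counts for the table
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table';
```

After autovacuum (or a manual VACUUM) runs, n_dead_tup drops back toward zero.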
VACUUM Fundamentals
PostgreSQL provides two vacuum modes:
-- Standard VACUUM: reclaims dead tuple space for reuse within the table
VACUUM my_table;
-- VACUUM FULL: rewrites the entire table to reclaim space to the OS
-- Acquires an ACCESS EXCLUSIVE lock — avoid in production
VACUUM FULL my_table;
Standard VACUUM marks dead tuple space as reusable but does not shrink the file on disk. VACUUM FULL does shrink it, but locks the table for the entire duration.
VACUUM ANALYZE
Combine vacuuming with statistics collection:
VACUUM ANALYZE my_table;
Fresh statistics let the query planner choose better join strategies and index scans. Always run ANALYZE after bulk loads or large deletes.
Autovacuum: The Background Worker
PostgreSQL ships with autovacuum enabled by default. It launches worker processes that vacuum and analyze tables automatically based on thresholds.
Key Parameters
autovacuum_vacuum_threshold = 50 -- minimum dead tuples before vacuum
autovacuum_vacuum_scale_factor = 0.2 -- fraction of live tuples added to threshold
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
autovacuum_naptime = 1min -- how often the launcher checks tables
autovacuum_max_workers = 3
A table is vacuumed when:
dead_tuples > threshold + (scale_factor * live_tuples)
For a 10-million-row table with defaults, that means autovacuum fires after 2,000,050 dead tuples — far too late for write-heavy workloads.
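You can estimate each table's trigger point from live statistics. This is a sketch: it reads the global settings via current_setting() and ignores any per-table reloption overrides:

```sql
-- Approximate dead-tuple count at which autovacuum will fire, per table
SELECT relname,
       n_live_tup,
       n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * n_live_tup
         AS vacuum_trigger_point
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

Tables where n_dead_tup sits just below a large trigger point for long periods are the ones that benefit most from the per-table tuning below.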
Tuning for Large Tables
Set per-table overrides for hot tables:
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000,
autovacuum_vacuum_cost_delay = 2
);
Reducing cost_delay lets the worker consume more I/O budget, finishing faster at the cost of slightly higher background load.
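To confirm which per-table overrides are actually in effect, you can inspect the reloptions column in pg_class:

```sql
-- Shows the storage options set via ALTER TABLE ... SET (...)
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events';
```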
Detecting Bloat
pg_stat_user_tables
SELECT relname,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / nullif(n_live_tup, 0), 1) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
pgstattuple Extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('my_table');
-- Returns dead_tuple_count, dead_tuple_len, free_space, table_len
A dead_tuple_len / table_len ratio above 20% signals significant bloat.
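That ratio can be computed directly in SQL, a sketch assuming the extension is installed and my_table exists (note that pgstattuple reads the entire table, so avoid running it against very large tables during peak hours):

```sql
-- Dead space and free space as a percentage of total table size
SELECT round(dead_tuple_len * 100.0 / nullif(table_len, 0), 1) AS dead_pct,
       round(free_space * 100.0 / nullif(table_len, 0), 1)     AS free_pct
FROM pgstattuple('my_table');
```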
Index Bloat
SELECT indexrelname,
pg_relation_size(indexrelid) AS index_size,
idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
Large indexes with low scan counts are candidates for a REINDEX or drop.
REINDEX and pg_repack
REINDEX
-- Locks the table — use CONCURRENTLY in PostgreSQL 12+
REINDEX INDEX CONCURRENTLY idx_events_created_at;
-- Reindex every index on a table
REINDEX TABLE CONCURRENTLY events;
CONCURRENTLY builds a new index without holding an exclusive lock, then swaps it in.
pg_repack
pg_repack is an extension that repacks tables and indexes online, without long locks:
# Repack a single table (removes bloat, rewrites table)
pg_repack -d mydb -t events
# Repack only indexes
pg_repack -d mydb -t events --only-indexes
Unlike VACUUM FULL, pg_repack holds only a brief exclusive lock at the swap step, making it safe for production use.
MySQL / InnoDB Maintenance
OPTIMIZE TABLE
-- Rebuilds the table and reclaims fragmented space
OPTIMIZE TABLE orders;
For InnoDB, this is equivalent to ALTER TABLE ... ENGINE=InnoDB — it recreates the table. In MySQL 5.7+ with innodb_file_per_table, this reclaims disk space.
ANALYZE TABLE
ANALYZE TABLE orders;
Updates index cardinality statistics so the optimizer can pick better execution plans.
Monitoring Fragmentation
SELECT table_name,
data_length,
data_free,
round(data_free * 100.0 / nullif(data_length + data_free, 0), 1) AS frag_pct
FROM information_schema.tables
WHERE table_schema = 'mydb'
ORDER BY data_free DESC;
A frag_pct above 15-20 % suggests the table would benefit from OPTIMIZE.
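You can turn the fragmentation query into a worklist by generating the OPTIMIZE statements directly. A sketch (review the output before running it, since each OPTIMIZE rebuilds its table):

```sql
-- Emit an OPTIMIZE statement for every table over the 15% threshold
SELECT concat('OPTIMIZE TABLE `', table_schema, '`.`', table_name, '`;') AS stmt
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND data_free * 100.0 / nullif(data_length + data_free, 0) > 15;
```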
Maintenance Windows
Even "online" operations consume I/O and CPU. Structure maintenance in tiers:
| Frequency | Operation | Impact |
|---|---|---|
| Continuous | Autovacuum (PostgreSQL) | Low |
| Daily | ANALYZE on high-churn tables | Low |
| Weekly | Manual VACUUM on large tables | Medium |
| Monthly | REINDEX CONCURRENTLY / pg_repack | Medium-High |
| Quarterly | VACUUM FULL (if truly needed) | High |
Scheduling with pg_cron
-- Install pg_cron
CREATE EXTENSION pg_cron;
-- Weekly VACUUM ANALYZE on events table, Sunday 3 AM UTC
SELECT cron.schedule(
'vacuum-events',
'0 3 * * 0',
$$VACUUM ANALYZE events$$
);
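Scheduled jobs live in the cron schema, so you can audit and remove them with plain SQL (the by-name unschedule variant is available in recent pg_cron versions):

```sql
-- List every scheduled job
SELECT jobid, jobname, schedule, command FROM cron.job;

-- Remove the weekly vacuum job by name
SELECT cron.unschedule('vacuum-events');
```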
Monitoring and Alerting
Track these metrics continuously:
- Dead tuple ratio — Alert when any table exceeds 10 % dead tuples.
- Table and index size growth — Sudden jumps indicate bloat or missing vacuums.
- Autovacuum duration — Long-running autovacuums may need cost limit tuning.
- Transaction ID wraparound — PostgreSQL must vacuum (freeze) old tuples before the transaction ID counter wraps at roughly 2 billion transactions; as the limit approaches, the server refuses new transactions to prevent data loss. Monitor age(datfrozenxid).
-- Transaction ID age — alert if approaching 1 billion
SELECT datname,
age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
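If a database's XID age is climbing, you can narrow it down to the tables responsible via relfrozenxid in pg_class (a sketch; run it while connected to the affected database):

```sql
-- Tables with the oldest frozen transaction IDs drive the database's XID age
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```

A targeted VACUUM FREEZE on the worst offenders brings the database age back down without waiting for an aggressive anti-wraparound autovacuum.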
Grafana Dashboard Essentials
- pg_stat_user_tables.n_dead_tup per table
- pg_stat_activity count of running autovacuum workers
- pg_database_size() over time
- Autovacuum last-run timestamps
Common Pitfalls
- Long-running transactions — An open transaction prevents VACUUM from reclaiming tuples still visible to that transaction. Monitor idle in transaction sessions.
- Disabling autovacuum — Never disable it globally. If a table needs special treatment, tune per-table settings instead.
- Running VACUUM FULL during peak traffic — It acquires an exclusive lock. Use pg_repack or schedule during low-traffic windows.
- Ignoring index bloat — Tables can be well-vacuumed while indexes remain bloated. Track index sizes independently.
- Forgetting ANALYZE after schema changes — Adding a column or changing data distribution requires fresh statistics.
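The long-running-transaction pitfall can be spotted from pg_stat_activity. A sketch that lists sessions sitting idle inside an open transaction, oldest first:

```sql
-- Sessions holding transactions open without doing work
SELECT pid,
       usename,
       now() - xact_start AS xact_duration,
       left(query, 60)    AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_duration DESC;
```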
Quick Reference Checklist
- Autovacuum enabled and tuned per table for high-churn workloads
- Weekly REINDEX CONCURRENTLY on the largest indexes
- Monthly pg_repack on tables with persistent bloat
- Transaction ID wraparound alerts set below 500 million
- idle in transaction timeout configured (idle_in_transaction_session_timeout)
- MySQL tables with over 15 % fragmentation scheduled for OPTIMIZE
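The idle-in-transaction timeout from the checklist can be set cluster-wide without a restart (the 10-minute value here is an illustrative choice, not a recommendation from this article):

```sql
-- Kill sessions that sit idle inside a transaction for more than 10 minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
```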
That is article #351 on codelit.io — if you found it useful, explore the rest of the library for deep dives on databases, system design, and backend engineering.