
Vacuum, Autovacuum, and Analyze — Everything a PostgreSQL DBA Must Know


Introduction

PostgreSQL uses MVCC (Multi-Version Concurrency Control) to handle concurrent transactions.

  • When rows are updated or deleted, PostgreSQL doesn’t overwrite them.

  • Instead, it marks the old versions as dead tuples and keeps new versions.

  • Over time, dead tuples accumulate → bloat, wasted space, slower queries.

This is where VACUUM, AUTOVACUUM, and ANALYZE come in.
They are essential DBA tools to keep PostgreSQL lean, performant, and healthy.
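
A quick illustration (my_table here stands for any table with id and name columns): a single UPDATE leaves the old row version behind, and it shows up as a dead tuple in the statistics view.

-- The UPDATE does not overwrite the row; the old version becomes a dead tuple
UPDATE my_table SET name = 'updated' WHERE id = 1;

-- Check live vs. dead tuple counts (stats are updated asynchronously,
-- so allow a moment before checking)
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'my_table';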


Why Do We Need VACUUM?

  • Dead tuples waste disk space.

  • Queries scanning bloated tables run slower.

  • Without cleanup, PostgreSQL risks transaction ID wraparound, which can lead to downtime.
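
To gauge wraparound risk, a standard catalog query shows the age of each database's oldest unfrozen transaction ID (PostgreSQL refuses new writes as this approaches roughly 2 billion):

-- Databases with the oldest unfrozen transaction IDs are at highest risk
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;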


Types of VACUUM

1. VACUUM

  • Removes dead tuples.

  • Space is reusable inside the table (not returned to OS).

VACUUM my_table;
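
To see what a vacuum run actually did, add the VERBOSE option:

-- Prints per-table details: dead row versions removed, pages scanned, etc.
VACUUM (VERBOSE) my_table;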

2. VACUUM FULL

  • Rewrites table → physically shrinks it.

  • Space is returned to OS.

  • Requires exclusive lock.

VACUUM FULL my_table;

3. VACUUM FREEZE

  • Marks tuples as “frozen” → prevents transaction ID wraparound issues.

VACUUM FREEZE my_table;
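
To see which tables are closest to needing a freeze, check the age of relfrozenxid in the catalog (autovacuum forces a freeze once this exceeds autovacuum_freeze_max_age, 200 million by default):

-- Tables with the highest transaction ID age will be frozen first
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;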

What is AUTOVACUUM?

Autovacuum = PostgreSQL’s automatic background cleaner.

  • Runs automatically based on thresholds.

  • Prevents table bloat and transaction wraparound.

Key settings in postgresql.conf:

autovacuum = on
autovacuum_naptime = 60 # seconds between checks
autovacuum_vacuum_threshold = 50 # min dead tuples before vacuum
autovacuum_vacuum_scale_factor = 0.2 # % of table that must be dead

Example: For a 1,000,000-row table with the settings above:

  • Autovacuum triggers at 50 + (0.2 × 1,000,000) = 200,050 dead tuples.
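
For large or high-churn tables, the global scale factor is often too lax (a 100-million-row table would accumulate 20 million dead tuples first). Per-table storage parameters override the global settings; the values below are illustrative, not a recommendation:

-- Illustrative per-table override: vacuum after 1,000 + 1% dead tuples
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);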


What is ANALYZE?

ANALYZE updates statistics used by PostgreSQL’s query planner.

  • Without up-to-date stats → bad query plans → poor performance.

Examples:

ANALYZE my_table; -- all columns
ANALYZE my_table (column1, column2); -- specific columns

Always run ANALYZE after bulk loads, deletes, or schema changes.
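
To verify when a table's statistics were last refreshed, pg_stat_user_tables tracks both manual and automatic runs:

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'my_table';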


Step-by-Step Demo with Outputs

1. Create Test Table

CREATE TABLE vacuum_demo (
    id SERIAL PRIMARY KEY,
    name TEXT,
    created_at TIMESTAMP DEFAULT now()
);

2. Insert 100,000 Rows

INSERT INTO vacuum_demo (name)
SELECT 'User_' || g
FROM generate_series(1, 100000) g;

✅ Row count:

SELECT count(*) FROM vacuum_demo;
-- 100000

Table size:

SELECT pg_size_pretty(pg_total_relation_size('vacuum_demo'));
-- 16 MB

3. Delete 50,000 Rows

DELETE FROM vacuum_demo WHERE id <= 50000;

Check dead tuples:

SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'vacuum_demo';

Example result:

Table       | Live Rows | Dead Rows
vacuum_demo | 50,000    | 50,000

4. Run Plain VACUUM

VACUUM vacuum_demo;

Re-check stats:

SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'vacuum_demo';

Example result:

Table       | Live Rows | Dead Rows
vacuum_demo | 50,000    | 0

Dead tuples are gone, but the table size is unchanged (plain VACUUM keeps the freed space for reuse inside the table):

SELECT pg_size_pretty(pg_total_relation_size('vacuum_demo'));
-- 16 MB

5. Run VACUUM FULL

VACUUM FULL vacuum_demo;

Check table size:

SELECT pg_size_pretty(pg_total_relation_size('vacuum_demo'));
-- 8 MB

✅ Table shrank from 16 MB → 8 MB.


6. Run ANALYZE

ANALYZE vacuum_demo;

Check stats:

SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'vacuum_demo';

Query planner now has fresh statistics for optimal execution plans.
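
To confirm this, compare the planner's row estimate against the actual count; after ANALYZE the estimate in the plan should closely match reality:

-- The "rows=" estimate should now be close to the real row count
EXPLAIN SELECT * FROM vacuum_demo WHERE id > 90000;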


7. Watch Autovacuum in Action

Insert & delete again:

INSERT INTO vacuum_demo (name)
SELECT 'User_New_' || g
FROM generate_series(1, 300000) g;

DELETE FROM vacuum_demo WHERE id % 2 = 0;

Check autovacuum process:

SELECT datname, pid, state, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum%';

Check dead tuples trend over time:

SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'vacuum_demo';

Within a few minutes, autovacuum will reduce the dead tuple count automatically.
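
You can confirm when the last automatic run happened:

SELECT relname, last_vacuum, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'vacuum_demo';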


Visual Bloat Check

Using pgstattuple

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('vacuum_demo');

Example before cleanup (after delete):

table_len | 16 MB
tuple_count | 100000
dead_tuple_count | 50000
dead_tuple_percent | 49.8
free_space | 1 MB

After VACUUM:

table_len | 16 MB
tuple_count | 50000
dead_tuple_count | 0
dead_tuple_percent | 0.0
free_space | 8 MB

After VACUUM FULL:

table_len | 8 MB
tuple_count | 50000
dead_tuple_count | 0
dead_tuple_percent | 0.0
free_space | 0.1 MB

Quick Approximate Bloat Query

WITH table_stats AS (
    SELECT
        relname AS table_name,
        n_dead_tup,
        n_live_tup,
        ROUND(100 * n_dead_tup::numeric / (n_live_tup + 1), 2) AS dead_pct
    FROM pg_stat_user_tables
)
SELECT * FROM table_stats
ORDER BY dead_pct DESC;

Fast way to spot bloated tables.


DBA Best Practices

  • ✅ Keep autovacuum enabled (don’t disable unless you have a custom scheduler).

  • ✅ Tune autovacuum scale factors for high-update tables.

  • ✅ Monitor bloat regularly with pgstattuple or catalog queries.

  • ✅ Run VACUUM FULL only during maintenance windows.

  • ✅ Always ANALYZE after bulk inserts or deletes.

  • ✅ Watch for transaction ID wraparound — PostgreSQL will stop writes if ignored.


Final Thoughts

  • VACUUM keeps your tables clean.

  • AUTOVACUUM keeps your database running automatically.

  • ANALYZE keeps your queries fast and efficient.

Together, these tools are not optional — they are mission-critical maintenance tasks every PostgreSQL DBA must master.

By monitoring, tuning, and scheduling these correctly, your PostgreSQL database will stay lean, performant, and production-ready.
