
Replication, Pooling & Operations

The operational reality of running PostgreSQL in production — replication for HA, connection pooling as a requirement, VACUUM as the cost of MVCC, and backup strategies.

01

Streaming Replication

Streaming replication sends WAL records from the primary to standbys in real-time. Standbys replay the WAL to maintain an identical copy of the database. This is physical replication — byte-for-byte identical.

| Mode | Primary Waits For | Data Loss Risk | Latency Impact |
|---|---|---|---|
| Asynchronous | Nothing — sends WAL and continues | Possible (committed WAL not yet replicated is lost on failover) | None |
| Synchronous (remote_write) | Standby has written the WAL to its OS (not yet flushed) | Very low | Slight |
| Synchronous (remote_apply) | Standby has applied the WAL (queryable) | Zero | Noticeable |
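
Switching modes is a configuration change rather than a rebuild. A minimal sketch, assuming PostgreSQL 10+ and a standby that connects with application_name=standby1 (the standby name is illustrative):

-- Require at least one of the named standbys to confirm before COMMIT returns
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1)';
-- Pick the guarantee level: 'remote_write' or 'remote_apply'
ALTER SYSTEM SET synchronous_commit = 'remote_apply';
-- Both settings apply on reload, no restart needed
SELECT pg_reload_conf();
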
📡 Radio Broadcast

Streaming replication is like a radio broadcast. The primary continuously broadcasts WAL (the signal). Standbys tune in and replay what they hear. Async mode: the DJ doesn't wait for listeners to confirm they heard the song. Sync mode: the DJ pauses until at least one listener confirms receipt. If the broadcast tower goes down, a listener can become the new broadcaster (failover).

Monitoring replication (SQL)
-- On primary: check replication status
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On standby: check how far behind
SELECT
  now() - pg_last_xact_replay_timestamp() AS replay_lag;
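
A related check: pg_is_in_recovery() reports whether the node you are connected to is a standby still replaying WAL.

-- true on a standby, false on a primary
SELECT pg_is_in_recovery();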

Replication Slots — The Disk Bomb

Replication slots prevent the primary from deleting WAL segments that the standby hasn't consumed yet. If a standby goes offline for days, the primary accumulates WAL indefinitely — filling the disk. Always monitor slot lag and set max_slot_wal_keep_size to cap WAL retention.
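A monitoring query along these lines (assuming PostgreSQL 10+ for pg_current_wal_lsn) surfaces slots that are pinning WAL:

-- How much WAL is each slot forcing the primary to retain?
SELECT
  slot_name,
  active,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;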

02

Logical Replication

Logical replication decodes WAL into row-level changes (INSERT, UPDATE, DELETE) and sends them to subscribers. Unlike streaming replication, it's selective (specific tables), works across major versions, and can replicate to non-PostgreSQL targets.

| Aspect | Streaming (Physical) | Logical |
|---|---|---|
| Granularity | Entire cluster (all databases) | Per-table selection |
| Cross-version | ❌ Same major version required | ✅ Different major versions OK |
| Cross-platform | ❌ Same architecture required | ✅ Any platform |
| DDL replication | ✅ Automatic (byte-for-byte) | ❌ Not replicated — manual |
| Sequences | ✅ Replicated | ❌ Not replicated |
| Use case | HA failover, read replicas | Migration, CDC, selective sync |

Logical replication setup (SQL)
-- On publisher (source):
CREATE PUBLICATION my_pub FOR TABLE orders, users;
-- Or replicate all tables:
-- CREATE PUBLICATION my_pub FOR ALL TABLES;

-- On subscriber (target):
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=primary dbname=mydb user=replicator'
  PUBLICATION my_pub;

-- Monitor subscription status
SELECT * FROM pg_stat_subscription;
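
To double-check what a publication actually covers after adding tables, pg_publication_tables lists its membership:

-- Which tables does the publication include?
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'my_pub';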

Logical Replication Use Cases

  • Zero-downtime major version upgrades (replicate to new version, then switch)
  • Change Data Capture (CDC) — stream changes to Kafka, Elasticsearch, etc.
  • Selective replication — sync only specific tables to a reporting database
  • Multi-master patterns (with conflict resolution — use with caution)

03

Failover & HA (Patroni)

PostgreSQL doesn't have built-in automatic failover. You need an external tool to detect primary failure and promote a standby. Patroni is the industry standard — it uses a distributed consensus store (etcd, ZooKeeper, or Consul) to coordinate failover.

| Tool | Consensus Store | Approach |
|---|---|---|
| Patroni | etcd / ZooKeeper / Consul | Industry standard — automatic failover, switchover, configuration management |
| repmgr | None (uses PostgreSQL itself) | Simpler — manual or scripted failover |
| pg_auto_failover | Built-in monitor node | Simpler alternative to Patroni for small setups |

Patroni Failover Flow

(1) Patroni agents on each node heartbeat to etcd. (2) If primary stops heartbeating, etcd leader lock expires. (3) Patroni on the most up-to-date standby acquires the leader lock. (4) That standby is promoted to primary (pg_promote). (5) Other standbys repoint to the new primary. (6) Applications reconnect via HAProxy or DNS update. Typical failover time: 10-30 seconds.

Manual promotion without Patroni (SQL)
-- Promote a standby to primary
SELECT pg_promote();

-- Or from command line:
-- pg_ctl promote -D /var/lib/postgresql/data

-- After promotion, the old primary cannot simply rejoin:
-- resync it with pg_rewind or a fresh pg_basebackup before
-- restarting it as a standby
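
Promotion starts a new timeline, which is how PostgreSQL keeps the diverging WAL histories of the old and new primaries apart. You can confirm the switch on the promoted node:

-- The timeline increments on every promotion
SELECT timeline_id FROM pg_control_checkpoint();
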
04

Connection Pooling (PgBouncer)

Connection pooling is not optional in production PostgreSQL. Each connection is an OS process (~5-10 MB). Without pooling, a typical application with 20 pods × 10 threads = 200 connections burns 1-2 GB just on connection overhead — and performance degrades from context switching.

| Pool Mode | Connection Returned | Supports | Breaks |
|---|---|---|---|
| Session | When client disconnects | Everything | Nothing (but least efficient) |
| Transaction | After each COMMIT/ROLLBACK | Most queries | SET, prepared statements, advisory locks, LISTEN/NOTIFY |
| Statement | After each statement | Simple queries only | Multi-statement transactions |

Transaction Pooling — The Sweet Spot

Transaction pooling is the most common production mode. A server connection is assigned to a client only for the duration of a transaction, then returned to the pool. 1000 application threads can share 50 actual PostgreSQL connections. The trade-off: session-level features (SET, prepared statements, advisory locks) don't work because you get a different connection each transaction.

PgBouncer configuration (INI)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

pool_mode = transaction
default_pool_size = 50
max_client_conn = 5000
server_idle_timeout = 300

# Application connects to PgBouncer on port 6432
# PgBouncer maintains 50 actual connections to PostgreSQL on port 5432
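
PgBouncer exposes its runtime state through an admin console: connect with psql to the virtual pgbouncer database as a user listed in admin_users (the user name below is illustrative) and run its SHOW commands:

-- psql -h 127.0.0.1 -p 6432 -U admin pgbouncer
SHOW POOLS;   -- cl_active / cl_waiting vs sv_active / sv_idle per pool
SHOW STATS;   -- query and transaction counts, bytes in/out per database

If cl_waiting is consistently non-zero, clients are queueing for server connections and the pool size may be too small.
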
05

VACUUM & Autovacuum

MVCC leaves dead tuples — old row versions no longer visible to any transaction. VACUUM reclaims this space. Without VACUUM, tables bloat indefinitely, queries slow down, and eventually transaction ID wraparound stops the database.

| Operation | What It Does | Locks |
|---|---|---|
| VACUUM | Marks dead tuple space as reusable (does NOT return it to the OS) | No exclusive lock — runs concurrently |
| VACUUM FULL | Rewrites the entire table — compacts it and returns space to the OS | ACCESS EXCLUSIVE — blocks everything |
| ANALYZE | Updates table statistics for the query planner | No exclusive lock |
| VACUUM ANALYZE | Both in one command — the common combination | No exclusive lock |
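
As a usage sketch (the table name is illustrative), VACUUM's parenthesized option syntax combines these in one pass:

-- Reclaim dead-tuple space and refresh planner statistics together
VACUUM (VERBOSE, ANALYZE) orders;
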
Autovacuum tuning (SQL)
-- Check autovacuum status per table
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  last_vacuum,
  last_autovacuum,
  autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Autovacuum triggers when:
-- dead_tuples > autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * n_live_tup)
-- Default: 50 + (0.2 * n_live_tup) — 20% of table must be dead

-- For high-churn tables, lower the threshold:
ALTER TABLE hot_events SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- trigger at 1% dead
  autovacuum_vacuum_threshold = 1000
);

-- Check if autovacuum is keeping up
SELECT relname, n_dead_tup,
  n_dead_tup::float / nullif(n_live_tup, 0) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_ratio DESC;

VACUUM FULL — Use Sparingly

VACUUM FULL rewrites the entire table and requires an ACCESS EXCLUSIVE lock — blocking all reads and writes for the duration. On a 100 GB table, this could take hours. Use the pg_repack extension instead — it compacts tables online, taking only brief exclusive locks at the start and end.

06

Transaction ID Wraparound

PostgreSQL uses 32-bit transaction IDs — approximately 2 billion before wraparound. If the counter wraps, old committed rows would appear to be "in the future" and become invisible. To prevent this catastrophe, PostgreSQL stops accepting writes when only 3 million transactions remain.

The Odometer

Transaction IDs are like a car's odometer that only goes to 2 billion. VACUUM 'freezes' old rows — marking them as 'definitely visible to everyone' so they no longer need a transaction ID comparison. If you never freeze old rows (VACUUM never runs), the odometer eventually rolls over and the car refuses to start. This is the PostgreSQL doomsday scenario — and it's entirely preventable with proper VACUUM monitoring.

Monitoring wraparound risk (SQL)
-- Check transaction ID age per database
-- WARNING threshold: age > 500 million
-- CRITICAL threshold: age > 1 billion
SELECT
  datname,
  age(datfrozenxid) AS xid_age,
  CASE
    WHEN age(datfrozenxid) > 1000000000 THEN '🔴 CRITICAL'
    WHEN age(datfrozenxid) > 500000000 THEN '🟡 WARNING'
    ELSE '🟢 OK'
  END AS status
FROM pg_database
ORDER BY xid_age DESC;

-- Check per-table (which table is blocking freeze?)
SELECT
  relname,
  age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;

-- Anti-wraparound VACUUM is triggered automatically when age > autovacuum_freeze_max_age (default: 200M)
-- But if it can't complete (long transactions blocking it), age keeps growing

Prevention

Monitor age(datfrozenxid) in your alerting system. If it exceeds 500 million, investigate immediately. Common causes: long-running transactions preventing VACUUM from freezing rows, or autovacuum not keeping up on large tables. Never let it reach 2 billion — PostgreSQL will enter read-only mode to protect data integrity.
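
When xid age keeps climbing, the usual culprit is a transaction that has been open for hours. A query along these lines finds the oldest ones:

-- Oldest open transactions; anything hours old blocks freezing cluster-wide
SELECT pid, state, xact_start,
       now() - xact_start AS xact_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;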

07

Backup & Point-in-Time Recovery

| Method | Type | PITR | Speed | Use Case |
|---|---|---|---|---|
| pg_dump | Logical (SQL) | ❌ No | Slow for large DBs | Small DBs, schema migration, selective restore |
| pg_basebackup + WAL archiving | Physical | ✅ Yes | Fast backup, fast restore | Production — the standard approach |
| pgBackRest | Physical (incremental) | ✅ Yes | Fastest (incremental, parallel) | Production at scale — recommended |
| Barman | Physical | ✅ Yes | Fast | Alternative to pgBackRest |

Point-in-Time Recovery (PITR)

PITR lets you restore to any point in time — not just the backup moment. It works by: (1) restoring a base backup, then (2) replaying WAL segments up to the target timestamp. This means you can recover from "someone dropped the production table at 3:47 PM" by restoring to 3:46 PM. Requires continuous WAL archiving.
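
Continuous archiving is enabled through two settings; the archive_command below is the classic copy-to-a-mount example from the PostgreSQL docs (the /mnt/wal_archive path is illustrative, and tools like pgBackRest supply their own archive_command):

-- Enable continuous WAL archiving (archive_mode needs a restart)
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command =
  'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f';
-- archive_command itself can be changed with a reload
SELECT pg_reload_conf();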

Backup with pgBackRest (bash)
# Initial full backup
pgbackrest --stanza=main --type=full backup

# Daily incremental backup (only changed blocks)
pgbackrest --stanza=main --type=incr backup

# Restore to a specific point in time
pgbackrest --stanza=main --type=time \
  --target="2024-03-15 15:46:00+00" \
  --target-action=promote \
  restore

# List available backups
pgbackrest --stanza=main info

RTO & RPO

| Metric | Definition | How to Minimize |
|---|---|---|
| RPO (Recovery Point Objective) | Maximum acceptable data loss | Synchronous replication (RPO = 0) or frequent WAL archiving |
| RTO (Recovery Time Objective) | Maximum acceptable downtime | Hot standby + automatic failover (Patroni) — RTO ~30 seconds |

08

Interview Questions

Q: What's the difference between streaming and logical replication?

A: Streaming replication sends raw WAL bytes — the standby is a byte-for-byte copy of the primary. It replicates everything (all databases, all tables, DDL). Logical replication decodes WAL into row-level changes and sends them selectively (specific tables). Logical works across major versions and to non-PostgreSQL targets, but doesn't replicate DDL or sequences.

Q: Why is connection pooling mandatory for PostgreSQL?

A: Each PostgreSQL connection is a full OS process (~5-10 MB RAM). At 500 connections, that's 2.5-5 GB just for process overhead. Performance degrades from context switching. PgBouncer in transaction pooling mode lets thousands of application threads share a small pool (20-50) of actual database connections. The connections are assigned per-transaction and returned to the pool immediately after COMMIT.

Q: What is transaction ID wraparound and how do you prevent it?

A: PostgreSQL uses 32-bit transaction IDs (~2 billion). VACUUM freezes old rows so they no longer need transaction ID comparison. If VACUUM can't keep up, the transaction ID counter approaches wraparound — PostgreSQL stops accepting writes at 3M remaining to prevent data corruption. Prevention: monitor age(datfrozenxid), ensure autovacuum runs successfully, avoid long-running transactions that block VACUUM.

Q: How does Point-in-Time Recovery work?

A: PITR requires: (1) periodic base backups (pg_basebackup or pgBackRest), and (2) continuous WAL archiving (every completed WAL segment copied to safe storage). To recover: restore the most recent base backup before the target time, then replay archived WAL segments up to the exact target timestamp. This lets you recover to any second, not just backup times.

Q: What's the difference between VACUUM and VACUUM FULL?

A: VACUUM marks dead tuple space as reusable within the table file — it doesn't shrink the file or return space to the OS. It runs concurrently (no exclusive lock). VACUUM FULL rewrites the entire table into a new file, compacting it and returning space to the OS — but requires an ACCESS EXCLUSIVE lock (blocks all operations). For online compaction without locks, use pg_repack extension.

09

Common Mistakes

⚠️ No connection pooling in production

Connecting directly to PostgreSQL from application pods — each pod opens 10-20 connections, 50 pods = 500-1000 connections.

Deploy PgBouncer between your application and PostgreSQL. Use transaction pooling mode with a pool size of 20-50 actual connections. Application connects to PgBouncer on a different port.

⚠️ Not monitoring replication slot lag

Creating a replication slot for a standby, then the standby goes offline for days — WAL accumulates on primary until disk is full.

Monitor pg_replication_slots for inactive slots with growing lag. Set max_slot_wal_keep_size to cap WAL retention. Alert when slot lag exceeds a threshold. Drop slots for permanently offline standbys.

⚠️ Running VACUUM FULL on large tables in production

Table is bloated, so you run VACUUM FULL — which takes an ACCESS EXCLUSIVE lock for hours, blocking all queries.

Use pg_repack extension for online table compaction without exclusive locks. Or accept that regular VACUUM (which runs concurrently) will eventually reuse the space for new rows — the file won't shrink, but new inserts will fill the gaps.

⚠️ Taking backups but never testing restores

Running pg_dump nightly and assuming backups work — never actually restoring to verify.

Regularly test full restore procedures: restore from backup, verify data integrity, measure restore time (RTO). A backup you can't restore from is not a backup. Automate restore testing in a staging environment.