jsonbArraysRangesUUIDEnumsConstraintsNormalizationPartitioning

Data Types & Schema Design

PostgreSQL's type system is one of its most underused strengths. Master jsonb, arrays, ranges, and the schema design decisions that are expensive to change after launch.

40 min read9 sections

Numeric & Character Types

Choosing the right numeric type matters for storage, performance, and correctness. The most common mistake is using numeric wheninteger suffices, or double precision for money.

Type	Size	Range / Use Case
smallint	2 bytes	-32,768 to 32,767 — status codes, small counters
integer	4 bytes	-2.1B to 2.1B — the default choice for most IDs and counts
bigint	8 bytes	-9.2×10¹⁸ to 9.2×10¹⁸ — large tables, snowflake IDs
numeric(p,s)	variable	Exact decimal — money, financial calculations
real	4 bytes	6 decimal digits — scientific data where inexactness is OK
double precision	8 bytes	15 decimal digits — never use for money
serial / bigserial	4 / 8 bytes	Auto-increment shorthand (creates a sequence)

Never Use float for Money

0.1 + 0.2 = 0.30000000000000004 in floating point. For any financial calculation, use numeric(precision, scale)or store amounts as integers in the smallest unit (cents, paise).

Character Types

Type	Behavior	Recommendation
text	Variable length, no limit	✅ The default choice — use this
varchar(n)	Variable length with limit	Rarely better than text + CHECK
char(n)	Fixed length, space-padded	❌ Almost never the right choice

In PostgreSQL, text and varchar have identical performance. The length check in varchar(n) adds a constraint but no storage benefit. Prefer text with a CHECKconstraint if you need length validation — it's easier to modify later.

Date, Time & Boolean

Type	Stores	Use When
date	Date only (no time)	Birthdays, calendar dates
time	Time only (no date)	Rarely useful alone
timestamp	Date + time, NO timezone	❌ Avoid — ambiguous
timestamptz	Date + time, stored as UTC	✅ Always use this
interval	Duration: '3 hours', '2 days'	Date arithmetic, scheduling

Always Use timestamptz

timestamp (without timezone) stores the value as-is — if your server moves to a different timezone, your data means something different.timestamptz converts to UTC on storage and back to the session's timezone on retrieval. It's always unambiguous. There is no valid reason to use timestamp without timezone in a new system.

Timezone handlingsql

-- timestamptz stores as UTC, displays in session timezone
SET timezone = 'America/New_York';
SELECT now();  -- shows Eastern time

SET timezone = 'Asia/Kolkata';
SELECT now();  -- same instant, displayed as IST

-- Date arithmetic with intervals
SELECT now() + interval '30 days' AS expires_at;
SELECT age(now(), created_at) FROM users;  -- human-readable duration

JSON & jsonb

PostgreSQL has two JSON types. json stores text as-is and re-parses on every access. jsonb stores decomposed binary — supports indexing, is faster to query, and is almost always the correct choice.

Aspect	json	jsonb
Storage	Text (preserves whitespace, key order)	Binary (decomposed, no whitespace)
Parse cost	Re-parsed on every access	Parsed once on write
Indexing	❌ Cannot be indexed	✅ GIN index for containment queries
Operators	-> and ->> only	Full set: @>, <@, ?, ?\|, ?&, jsonb_path_query
Duplicate keys	Preserved	Last value wins
When to use	Audit logs where exact input matters	Everything else

jsonb operators and indexingsql

-- Create table with jsonb
CREATE TABLE events (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  payload jsonb NOT NULL,
  created_at timestamptz DEFAULT now()
);

-- GIN index for containment queries
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- Query: find events where payload contains {"type": "purchase"}
SELECT * FROM events WHERE payload @> '{"type": "purchase"}';

-- Extract nested value
SELECT payload -> 'user' ->> 'email' AS email FROM events;

-- jsonb_path_query (SQL/JSON path — PG 12+)
SELECT * FROM events
WHERE jsonb_path_exists(payload, '$.items[*] ? (@.price > 100)');

jsonb vs Proper Columns

Use jsonb for truly schema-less data — event payloads, user preferences, third-party webhook bodies. If you find yourself querying the same jsonb path in every query, that field should be a proper column with a proper index. jsonb is an escape hatch, not a replacement for schema design.

Arrays & Range Types

Arrays

Any PostgreSQL type can be an array: integer[], text[],jsonb[]. Arrays are useful for tags, permissions, and small collections that don't need their own table.

Array operationssql

-- Table with array column
CREATE TABLE articles (
  id serial PRIMARY KEY,
  title text NOT NULL,
  tags text[] DEFAULT '{}'
);

-- Insert with array
INSERT INTO articles (title, tags)
VALUES ('MVCC Explained', ARRAY['postgresql', 'internals', 'mvcc']);

-- Query: articles with 'postgresql' tag
SELECT * FROM articles WHERE 'postgresql' = ANY(tags);

-- Query: articles with ALL of these tags
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'internals'];

-- GIN index for fast array queries
CREATE INDEX idx_articles_tags ON articles USING GIN (tags);

Range Types

Range types represent a range of values: int4range,tstzrange, daterange. The killer feature is exclusion constraints — preventing overlapping bookings at the database level.

Range types and exclusion constraintssql

-- Prevent overlapping room bookings
CREATE TABLE bookings (
  id serial PRIMARY KEY,
  room_id integer NOT NULL,
  during tstzrange NOT NULL,
  EXCLUDE USING GIST (room_id WITH =, during WITH &&)
);

-- This succeeds
INSERT INTO bookings (room_id, during)
VALUES (1, tstzrange('2024-03-01 10:00', '2024-03-01 12:00'));

-- This fails — overlaps with existing booking
INSERT INTO bookings (room_id, during)
VALUES (1, tstzrange('2024-03-01 11:00', '2024-03-01 13:00'));
-- ERROR: conflicting key value violates exclusion constraint

Arrays vs Child Tables

Use arrays for small, fixed-size collections that you query as a whole (tags, permissions). Use a child table when: the collection is large, you need to query/update individual elements, or you need foreign keys on the elements. If you're doing unnest() in most queries, it should probably be a table.

UUID, Enum & Domain Types

UUID

Aspect	bigserial	UUID v4	UUID v7
Size	8 bytes	16 bytes	16 bytes
Generation	Sequential (database)	Random	Time-ordered + random
Index locality	✅ Excellent (sequential)	❌ Poor (random scatter)	✅ Good (time-ordered)
Globally unique	❌ No (per-table only)	✅ Yes	✅ Yes
Guessable	Yes (sequential)	No	Partially (time component)
Best for	Internal IDs, high-write tables	External-facing IDs	External IDs with good index perf

UUID usagesql

-- Built-in UUID generation (PG 13+)
CREATE TABLE users (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  email text UNIQUE NOT NULL
);

-- For UUIDv7 (time-ordered), use pg_uuidv7 extension
-- or generate in application layer

Enum Types

Enumssql

-- Create enum type
CREATE TYPE order_status AS ENUM (
  'pending', 'confirmed', 'shipped', 'delivered', 'cancelled'
);

-- Use in table
CREATE TABLE orders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  status order_status NOT NULL DEFAULT 'pending'
);

-- Enums are sorted by definition order, stored as 4 bytes
-- Adding values: only append (cannot reorder or remove)
ALTER TYPE order_status ADD VALUE 'refunded' AFTER 'cancelled';

Enum Trade-offs

Enums are compact (4 bytes) and type-safe, but inflexible. You can only append values — never remove or reorder. For frequently changing sets, prefer a lookup table with a foreign key, or a text column with a CHECK constraint (easier to modify with ALTER TABLE).

Domain Types

Domains — reusable constrained typessql

-- Email domain with validation
CREATE DOMAIN email_address AS text
  CHECK (value ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');

-- Use like any type
CREATE TABLE contacts (
  id serial PRIMARY KEY,
  email email_address NOT NULL
);

-- Positive money domain
CREATE DOMAIN positive_amount AS numeric(12, 2)
  CHECK (value > 0);

Constraints & Foreign Keys

Constraints are PostgreSQL's way of enforcing data integrity at the database level — not in application code where bugs can bypass them.

Constraint	Purpose	Example
NOT NULL	Column must have a value	email text NOT NULL
UNIQUE	No duplicate values (allows one NULL)	UNIQUE(email)
PRIMARY KEY	NOT NULL + UNIQUE, identifies the row	id bigint PRIMARY KEY
CHECK	Arbitrary boolean expression	CHECK (age >= 0 AND age < 150)
FOREIGN KEY	References another table's PK/unique	REFERENCES users(id)
EXCLUDE	No overlapping values (GiST)	EXCLUDE USING GIST (range WITH &&)
DEFAULT	Value when not specified	DEFAULT now()

Foreign key actionssql

CREATE TABLE order_items (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  order_id bigint NOT NULL
    REFERENCES orders(id) ON DELETE CASCADE,
  product_id bigint NOT NULL
    REFERENCES products(id) ON DELETE RESTRICT,
  quantity integer NOT NULL CHECK (quantity > 0)
);

-- ON DELETE actions:
-- RESTRICT  — block delete if children exist (safest)
-- CASCADE   — delete children too (use for owned entities)
-- SET NULL  — set FK to NULL (use for optional relationships)
-- NO ACTION — like RESTRICT but deferrable

-- CRITICAL: always index FK columns on the child table
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

Always Index Foreign Keys

PostgreSQL does NOT automatically create an index on foreign key columns. Without an index, every DELETE or UPDATE on the parent table triggers a sequential scan on the child table to check for references. This is the #1 cause of unexpected slow deletes.

Normalization & Partitioning

Normalization

Normal Forms

✅1NF — atomic values, no repeating groups (every column holds a single value)
✅2NF — no partial dependencies (every non-key column depends on the full PK)
✅3NF — no transitive dependencies (non-key columns don't depend on other non-key columns)
✅BCNF — every determinant is a candidate key (stricter than 3NF)

Normalize by default. Denormalize only when you have measured proof that joins are the bottleneck — and even then, consider materialized views first.

Declarative Partitioning

Range partitioning by datesql

-- Parent table (no data stored here directly)
CREATE TABLE events (
  id bigint GENERATED ALWAYS AS IDENTITY,
  event_type text NOT NULL,
  payload jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically prune irrelevant partitions
SELECT * FROM events
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-20';
-- Only scans events_2024_01

When Partitioning Helps

Partition when: tables exceed 100M+ rows, you have time-based data with retention policies (drop old partitions instead of DELETE), or you need partition-wise aggregation. Don't partition small tables — it adds planning overhead without benefit.

Interview Questions

Q:When would you use jsonb vs a proper relational column?

A: Use jsonb for truly schema-less data: event payloads, third-party webhook bodies, user preferences with unknown structure. Use proper columns when you query/filter on the field regularly, need foreign keys, or need type safety. The rule: if you're writing WHERE payload->>'status' = 'active' in most queries, 'status' should be a column.

Q:Why use timestamptz over timestamp?

A: timestamp stores the value as-is with no timezone context. If your server moves timezones, or clients are in different timezones, the data becomes ambiguous. timestamptz converts to UTC on storage and displays in the session's timezone — it's always unambiguous. There's no storage difference (both 8 bytes), so always use timestamptz.

Q:What are exclusion constraints and when would you use them?

A: Exclusion constraints prevent rows from having overlapping values using GiST indexes. Classic use case: preventing double-bookings. EXCLUDE USING GIST (room_id WITH =, time_range WITH &&) ensures no two bookings for the same room have overlapping time ranges. This is impossible to enforce reliably with application code due to race conditions.

Q:UUID v4 vs bigserial for primary keys — trade-offs?

A: bigserial: 8 bytes, sequential (great index locality), not globally unique, guessable. UUID v4: 16 bytes, random (poor index locality — random writes scatter across B-tree), globally unique, not guessable. UUID v7 is the best of both: time-ordered (good locality) + random suffix (globally unique). Use bigserial for internal high-write tables, UUID for external-facing IDs.

Common Mistakes

⚠️

Using timestamp instead of timestamptz

CREATE TABLE events (created_at timestamp DEFAULT now()) — stores without timezone context.

✅Always use timestamptz. It stores as UTC internally and converts to the session timezone on display. There's no performance difference, only correctness.

⚠️

Not indexing foreign key columns

Adding REFERENCES users(id) without creating an index on the FK column in the child table.

✅PostgreSQL doesn't auto-create FK indexes. Without them, parent table DELETEs trigger sequential scans on child tables. Always: CREATE INDEX idx_child_parent_id ON child(parent_id);

⚠️

Using varchar(255) everywhere

Cargo-culting varchar(255) from MySQL — every text column is varchar(255).

✅In PostgreSQL, text and varchar have identical performance. Use text as the default. Add a CHECK constraint if you need length validation — it's easier to modify than altering a varchar limit.

⚠️

Storing money as float

Using double precision or real for financial amounts.

✅Use numeric(12, 2) for exact decimal arithmetic, or store as integer cents (amount_cents integer). Floating point cannot represent 0.1 exactly — you will have rounding errors in financial calculations.