
Indexing

Master database indexing — B-tree indexes, LSM trees & SSTables, composite indexes, partial indexes, and covering indexes. Understand the trade-offs that make queries fast.

28 min read · 11 sections
01

The Big Picture — What Is an Index?

An index is a data structure that helps a database find rows without scanning every single row in the table. Without an index, finding one user in a table of 100 million rows means reading all 100 million rows. With an index, it means about 27 comparisons (log₂ of 100M) and just a few disk reads. That's the difference between a 10-second query and a 0.1ms query.

📖

The Book Index Analogy

A 500-page textbook without an index: to find 'B-tree', you flip through every page. With an index at the back: you look up 'B-tree → page 247' and jump directly there. The index is a small, sorted lookup structure that trades a bit of extra space (the index pages) for massive speed improvement. A database index works identically — it's a sorted structure that maps column values to the physical location of rows on disk.

🔥 Key Insight

Indexes make reads fast and writes slow. Every INSERT, UPDATE, or DELETE must also update every index on the table. A table with 10 indexes means every write does 11 operations (1 table + 10 indexes). The art of indexing is adding just enough indexes to make your critical queries fast without crippling write performance.

02

How Databases Store Data

Full Table Scan vs Indexed Lookup

The Performance Difference
Table: users (100,000,000 rows)
Query: SELECT * FROM users WHERE email = 'alice@mail.com'

Without index (full table scan):
Read all 100M rows from disk
Check each row's email column
Time: O(n) = ~100,000,000 comparisons
Duration: 5-30 seconds (depending on disk speed)

With B-tree index on email:
Navigate the tree: ~4 levels (branching factor ~100)
Each level = 1 disk read
Find the row pointer → read 1 row
Time: O(log n) = ~27 comparisons
Duration: 0.1-1ms

Speedup: ~100,000x
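The scan-vs-lookup difference is easy to observe in any SQL database. Here's a minimal sketch using Python's built-in sqlite3 module (the table and column names are illustrative, and SQLite stands in for a production database):

```python
import sqlite3

# In-memory database with a small users table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@mail.com",) for i in range(10_000)],
)

def plan(query):
    # EXPLAIN QUERY PLAN reveals whether SQLite scans the table or seeks an index.
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(r[-1] for r in rows)

q = "SELECT * FROM users WHERE email = 'user42@mail.com'"
print(plan(q))   # before the index: plan contains "SCAN" (full table scan)

conn.execute("CREATE INDEX idx_email ON users(email)")
print(plan(q))   # after: plan mentions idx_email (indexed SEARCH, not a scan)
```

The same experiment works in PostgreSQL or MySQL with EXPLAIN; the plan flips from a sequential scan to an index lookup the moment the index exists.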

Read vs Write Trade-off

📖 Reads (Benefit from Indexes)

  • SELECT with WHERE → index lookup instead of full scan
  • JOIN on indexed columns → fast matching
  • ORDER BY on indexed columns → already sorted
  • More indexes = faster reads (for indexed queries)

✏️ Writes (Hurt by Indexes)

  • INSERT → must add entry to every index
  • UPDATE on indexed column → update the index too
  • DELETE → must remove entry from every index
  • More indexes = slower writes (more maintenance)

💡 The Rule of Thumb

Index the columns you filter on (WHERE), join on (JOIN ON), and sort by (ORDER BY). Don't index columns you rarely query. For read-heavy systems (90% reads), add indexes aggressively. For write-heavy systems (90% writes), minimize indexes.

03

B-Tree Indexes

B-trees are the default index structure in virtually every relational database — PostgreSQL, MySQL, SQL Server, Oracle. They're a balanced, sorted tree where each node can hold multiple keys and pointers, optimized for disk-based storage.

🗂️

The Sorted Filing Cabinet

Imagine a filing cabinet where folders are sorted alphabetically. To find 'Martinez', you don't scan from A — you open the middle drawer (M-P), then the M section, then scan a few folders. Each step eliminates half the remaining options. A B-tree works the same way: each level of the tree narrows the search by a large factor. With a branching factor of 100 (typical), 3 levels can index 1 million rows. 4 levels can index 100 million rows.

B-Tree — How a Lookup Works
B-tree index on users.email (branching factor ~100)

Query: WHERE email = 'martinez@mail.com'

Level 0 (root):     [... garcia ... lee ... patel ... zhang ...]
'martinez' is between 'lee' and 'patel'
Follow pointer to child node

Level 1 (internal): [... kim ... lin ... martinez ... miller ...]
Found 'martinez'!
Follow pointer to leaf node

Level 2 (leaf):     [martinez@mail.com → row_id: 4,827,391]
Read row 4,827,391 from the table

Total: 3 disk reads to find 1 row out of millions.

For 100M rows with branching factor 100:
  Levels needed = log₁₀₀(100M) ≈ 4 levels
  4 disk reads to find any row. Always.
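The level count above follows directly from the branching factor. A quick back-of-the-envelope calculation (a sketch of the math, not tied to any particular database's node layout):

```python
def btree_levels(rows: int, branching: int) -> int:
    """Approximate B-tree height: levels needed until total capacity covers all rows.
    Each level multiplies capacity by the branching factor."""
    levels, capacity = 1, branching
    while capacity < rows:
        levels += 1
        capacity *= branching
    return levels

for rows in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{rows:>14,} rows -> {btree_levels(rows, 100)} levels at branching 100")
# 1,000,000 -> 3 levels; 100,000,000 -> 4 levels; 10 billion -> 5 levels
```

This is why the height barely grows with table size: each extra level multiplies capacity by ~100, so even ten billion rows need only 5 disk reads per lookup.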

Why B-Trees Excel

Strengths

  • Balanced: every lookup takes the same number of steps
  • Sorted: range queries (BETWEEN, >, <) are efficient
  • Disk-optimized: nodes are page-sized (4-16 KB), matching disk blocks
  • Supports equality AND range queries
  • In-place updates: modify values without rewriting the tree

Characteristics

  • Read-optimized: O(log n) for any lookup
  • Write cost: O(log n) for insert/update/delete
  • Space: ~10-30% of table size
  • Default in PostgreSQL, MySQL, SQL Server, Oracle
  • Best for: OLTP workloads (mixed read/write)

🎯 Interview Insight

When asked "how do indexes work?" — describe B-trees. They're the default in every major relational database. Key points: balanced tree, O(log n) lookups, sorted (supports range queries), disk-page-aligned nodes. Mention that the branching factor is high (~100-500), so even billions of rows need only 4-5 levels.

04

LSM Trees & SSTables

LSM (Log-Structured Merge) trees take the opposite approach from B-trees. Instead of updating data in place on disk, they buffer all writes in memory and periodically flush sorted batches to disk. This makes writes extremely fast at the cost of more complex reads.

📝

The Notebook Analogy

B-tree: you have a sorted filing cabinet. Every new document goes directly into its correct alphabetical position — you must find the right spot and insert it (slow writes, fast reads). LSM tree: you have a notebook. Every new document is written at the end of the current page (fast writes). When the page is full, you sort it and file it away as a batch. To find a document, you check the current page first, then the most recent filed batch, then older batches (slower reads, but writes are blazing fast).

How LSM Trees Work

1

Write to Memtable (in-memory)

All writes go to an in-memory sorted structure (usually a red-black tree or skip list) called the memtable. Writes are O(log n) in memory — extremely fast. A write-ahead log (WAL) on disk ensures durability if the process crashes.

2

Flush to SSTable (on disk)

When the memtable reaches a size threshold (e.g., 64 MB), it's flushed to disk as an SSTable (Sorted String Table) — an immutable, sorted file. The memtable is cleared and a new one starts. SSTables are never modified after creation.

3

Compaction (merge SSTables)

Over time, many SSTables accumulate. Compaction merges multiple SSTables into fewer, larger ones — removing duplicates and deleted entries. This keeps read performance manageable. Without compaction, reads would need to check hundreds of files.

4

Read path

To read a key: check the memtable first (newest data), then SSTables from newest to oldest. Bloom filters on each SSTable quickly tell you 'this key is definitely NOT here' — skipping most files. Worst case: check all levels.
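The four steps above can be sketched as a toy model in a few lines of Python. Everything here is a deliberate simplification: a dict stands in for the memtable (real engines use a skip list or red-black tree), a plain set stands in for the probabilistic bloom filter, and the WAL is omitted:

```python
class TinyLSM:
    """Toy LSM tree: memtable + immutable sorted SSTables + compaction."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}           # in-memory, holds the newest writes
        self.sstables = []           # list of (sorted_pairs, key_set), newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value   # fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Memtable full -> write out an immutable, sorted SSTable.
        pairs = sorted(self.memtable.items())
        self.sstables.insert(0, (pairs, set(self.memtable)))  # newest first
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # 1. check memtable (newest data)
            return self.memtable[key]
        for pairs, keys in self.sstables:   # 2. SSTables, newest -> oldest
            if key not in keys:             #    "bloom filter": definitely not here, skip
                continue
            for k, v in pairs:              #    real SSTables binary-search a sparse index
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all SSTables into one; newer values overwrite older ones.
        merged = {}
        for pairs, _ in reversed(self.sstables):   # oldest first, newest wins
            merged.update(pairs)
        self.sstables = [(sorted(merged.items()), set(merged))]

db = TinyLSM()
for i in range(10):
    db.put(f"user:{i}", f"name-{i}")
print(db.get("user:3"), "found across", len(db.sstables), "SSTables")
db.compact()
print("after compaction:", len(db.sstables), "SSTable")
```

With a memtable limit of 4, the ten writes produce two SSTables plus a partially filled memtable; a read for user:3 skips the newer SSTable via the key-set check and finds the value in the older one, and compaction collapses everything back to a single file.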

LSM Tree — Write & Read Flow
WRITE: SET user:42 = "Alice"
  1. Append to WAL (durability)           → ~0.01ms
  2. Insert into memtable (in-memory)     → ~0.001ms
  Done. Total: ~0.01ms

  Later (async): memtable full → flush to SSTable on disk

READ: GET user:42
  1. Check memtable → found? return.
  2. Check SSTable L0 (newest)            → bloom filter says NO → skip
  3. Check SSTable L1                     → bloom filter says MAYBE → read → found!
  Done. Total: 1-5 disk reads

COMPACTION (background):
  L0: [SST1] [SST2] [SST3]  → merge → L1: [SST_merged]
  Removes duplicates, deleted keys, reduces file count

Dimension            | B-Tree                             | LSM Tree
Write speed          | Moderate (in-place update on disk) | Very fast (append to memory)
Read speed           | Fast (single tree traversal)       | Moderate (check multiple levels)
Write amplification  | Low (update one page)              | High (data rewritten during compaction)
Read amplification   | Low (one path through tree)        | Higher (check memtable + multiple SSTables)
Space amplification  | Low                                | Higher (duplicates until compaction)
Best for             | Read-heavy, mixed OLTP             | Write-heavy (logs, analytics, time-series)
Used in              | PostgreSQL, MySQL, SQL Server      | Cassandra, RocksDB, LevelDB, HBase

🎯 Interview Insight

B-trees for read-heavy workloads (OLTP, web apps). LSM trees for write-heavy workloads (logging, analytics, IoT, time-series). Know the trade-off: LSM trees sacrifice read performance and increase write amplification (compaction) to achieve much faster writes. Most NoSQL databases (Cassandra, RocksDB) use LSM trees.

05

Composite Indexes

A composite index is an index on multiple columns. The column order is critical — it determines which queries can use the index. Think of it like a phone book sorted by last name, then first name.

Composite Index — Column Order Matters
CREATE INDEX idx_user_status_date ON orders(user_id, status, created_at);

-- ✅ Uses the index (leftmost prefix: user_id)
SELECT * FROM orders WHERE user_id = 42;

-- ✅ Uses the index (prefix: user_id + status)
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped';

-- ✅ Uses the full index (all 3 columns)
SELECT * FROM orders
WHERE user_id = 42 AND status = 'shipped' AND created_at > '2024-01-01';

-- ❌ CANNOT use this index (skips user_id)
SELECT * FROM orders WHERE status = 'shipped';

-- ❌ CANNOT use this index (skips user_id and status)
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- ⚠️ Uses only user_id part (skips status, jumps to created_at)
SELECT * FROM orders WHERE user_id = 42 AND created_at > '2024-01-01';
📞

The Phone Book Rule

A phone book is sorted by (last_name, first_name, city). You can look up: all 'Smiths' ✅, 'Smith, John' ✅, 'Smith, John, NYC' ✅. But you CANNOT efficiently look up: all 'Johns' ❌ (first name without last name), or all people in 'NYC' ❌ (city without last name). You must use the columns from left to right — you can stop early, but you can't skip.

The Prefix Rule

Which Queries Use the Index?
Index: (A, B, C)

Query filters on:    Uses index?    How much?
A                    Yes           Full prefix (A)
A, B                 Yes           Full prefix (A, B)
A, B, C              Yes           Full index (A, B, C)
B                    No            Skips A (leftmost column)
C                    No            Skips A and B
B, C                 No            Skips A
A, C                 ⚠️ Partial    Uses A only, skips B to reach C

Rule: the index is used left-to-right. You can stop early,
but you cannot skip columns in the middle.
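The left-to-right rule falls out of how a composite index is physically stored: entries are sorted lexicographically by (A, B, C). A pure-Python sketch with sorted tuples and binary search shows why a prefix is seekable but a middle column is not (column names and value ranges are illustrative):

```python
import bisect

# A composite index is conceptually a sorted list of (A, B, C) key tuples.
index = sorted((a, b, c) for a in range(100) for b in range(10) for c in range(10))

def seek_prefix(prefix):
    """Binary-search the sorted keys for all entries starting with `prefix`.
    Works because entries sharing a prefix are contiguous in sort order."""
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, prefix[:-1] + (prefix[-1] + 1,))
    return index[lo:hi]

# A = 42         -> seekable: all (42, *, *) entries sit together
print(len(seek_prefix((42,))))      # 100 entries, found in O(log n)

# A = 42, B = 7  -> seekable: (42, 7, *) is also contiguous
print(len(seek_prefix((42, 7))))    # 10 entries

# B = 7 alone    -> NOT contiguous: scattered throughout, must scan everything
matches = [k for k in index if k[1] == 7]   # full scan, O(n)
print(len(matches))                 # 1000 entries
```

This is exactly why a real database falls back to a full scan when the leftmost column is missing: the matching entries are spread across the whole index, so there is nothing to binary-search on.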

🎯 Interview Insight — Designing Composite Indexes

Put the most selective column first (the one that narrows results the most). Put equality columns before range columns: (user_id, status, created_at) — not (created_at, user_id, status). Equality filters (=) use the index fully; range filters (>, <, BETWEEN) stop the index from being used for subsequent columns.

06

Partial Indexes

A partial index only indexes rows that match a condition. Instead of indexing all 100M rows, you index only the 500K rows you actually query — making the index smaller, faster, and cheaper to maintain.

Partial Index — Examples
-- Full index: indexes ALL 100M users (including inactive)
CREATE INDEX idx_email ON users(email);
-- Size: ~2 GB, updated on every user insert/update

-- Partial index: indexes only active users (5% of total)
CREATE INDEX idx_active_email ON users(email) WHERE is_active = true;
-- Size: ~100 MB, updated only when active users change

-- Partial index: only unprocessed orders
CREATE INDEX idx_pending ON orders(created_at) WHERE status = 'pending';
-- Size: tiny (only pending orders, not the 99% that are completed)

-- Query that uses the partial index:
SELECT * FROM users WHERE email = 'alice@mail.com' AND is_active = true;
-- → Uses idx_active_email (small, fast)

-- Query that CANNOT use the partial index:
SELECT * FROM users WHERE email = 'alice@mail.com' AND is_active = false;
-- → Cannot use idx_active_email (condition doesn't match)
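SQLite also supports partial indexes, so the used/not-used behavior above can be verified locally. A hedged sketch (schema and names are illustrative; the planner only uses a partial index when the query's WHERE clause provably implies the index's condition):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (email, is_active) VALUES (?, ?)",
    [(f"user{i}@mail.com", 1 if i % 20 == 0 else 0) for i in range(5_000)],
)
# Partial index: covers only the ~5% of rows where is_active = 1.
conn.execute("CREATE INDEX idx_active_email ON users(email) WHERE is_active = 1")

def plan(query):
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(r[-1] for r in rows)

# Matching query: the WHERE clause implies the index condition -> index used.
print(plan("SELECT * FROM users WHERE email = 'user40@mail.com' AND is_active = 1"))
# plan mentions idx_active_email

# Non-matching query: condition contradicts the index -> falls back to a scan.
print(plan("SELECT * FROM users WHERE email = 'user41@mail.com' AND is_active = 0"))
# plan contains "SCAN" and no mention of idx_active_email
```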

Strengths

  • Much smaller than full indexes (index only what you query)
  • Faster lookups (smaller tree to traverse)
  • Less write overhead (fewer rows to maintain)
  • Perfect for: active users, pending orders, recent data

Limitations

  • Only works for queries matching the WHERE condition
  • Not all databases support them (PostgreSQL and SQLite yes, MySQL no)
  • Must match the partial condition exactly in your query
  • Can be confusing if developers don't know the condition

🎯 Interview Insight

Partial indexes are an optimization technique — mention them when discussing how to reduce index size and write overhead. Classic example: "We have 100M users but only 2M are active. A partial index on active users is 50x smaller and only maintained on active user changes."

07

Covering Indexes

A covering index contains all the columns a query needs. The database can answer the query entirely from the index without ever reading the actual table row. This eliminates the most expensive step — the random disk read to fetch the full row.

📋

The Summary Sheet

Imagine a library catalog card that shows: title, author, year, and shelf location. If you only need the title and author, the catalog card has everything — you don't need to walk to the shelf and pull out the book. That's a covering index: the index itself contains all the data the query needs. No 'table lookup' required.

Covering Index — Example
-- Query: get name and email for active users
SELECT name, email FROM users WHERE is_active = true;

-- Regular index (NOT covering):
CREATE INDEX idx_active ON users(is_active);
-- Step 1: Use index to find row IDs where is_active = true
-- Step 2: For EACH row ID, read the full row from the table (random I/O!)
-- Step 3: Extract name and email from the full row
-- → Slow: millions of random disk reads

-- Covering index (includes all needed columns):
CREATE INDEX idx_active_covering ON users(is_active) INCLUDE (name, email);
-- Step 1: Use index to find entries where is_active = true
-- Step 2: name and email are IN the index → return directly
-- → Fast: no table access needed, sequential index scan only

-- PostgreSQL EXPLAIN output:
-- "Index Only Scan using idx_active_covering" → covering index used!
-- vs
-- "Index Scan using idx_active" → table lookup needed
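You can reproduce the index-only behavior locally with SQLite. SQLite has no INCLUDE clause, but folding the selected columns into the index key makes it covering all the same, and EXPLAIN QUERY PLAN says so explicitly (schema and names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (name, email, is_active) VALUES (?, ?, ?)",
    [(f"user{i}", f"user{i}@mail.com", i % 2) for i in range(1_000)],
)
# No INCLUDE in SQLite: adding name and email as trailing key columns
# makes the index cover the query anyway.
conn.execute("CREATE INDEX idx_active_covering ON users(is_active, name, email)")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name, email FROM users WHERE is_active = 1"
).fetchall()
plan = " ".join(r[-1] for r in plan_rows)
print(plan)   # mentions "COVERING INDEX": the table itself is never touched
```

The PostgreSQL equivalent is the INCLUDE syntax shown above; the tell-tale in its EXPLAIN output is "Index Only Scan" instead of "Index Scan".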

Strengths

  • Eliminates table lookups (the slowest part of indexed reads)
  • Dramatically faster for queries that return many rows
  • Index-only scans: all data comes from the index
  • Perfect for: dashboards, reports, listing pages

Trade-offs

  • Larger index (stores extra columns)
  • More write overhead (extra columns updated in index)
  • Only helps queries that use exactly those columns
  • INCLUDE columns can't be used for filtering (only for retrieval)

🎯 Interview Insight

Covering indexes are the ultimate read optimization. When a query is slow despite having an index, check if it's doing table lookups for columns not in the index. Adding those columns with INCLUDE can turn an "Index Scan" into an "Index Only Scan" — often a 10-100x speedup for queries returning many rows.

08

End-to-End Scenario

Let's design the indexing strategy for an e-commerce platform — the most common interview scenario for indexing.

🛒 E-Commerce — Key Queries

Tables: products (5M rows), orders (200M rows), users (50M rows).

Read-heavy: 95% reads, 5% writes.

Query-Driven Index Design
-- Query 1: Product search by category + price sort
SELECT id, name, price, image_url
FROM products
WHERE category_id = 5 AND in_stock = true
ORDER BY price ASC
LIMIT 20;

Composite index: (category_id, in_stock, price)
  Equality columns first, range/sort column last.
Covering index: INCLUDE (name, image_url)
  Avoids table lookup for listing page.

CREATE INDEX idx_product_search
  ON products(category_id, in_stock, price)
  INCLUDE (name, image_url);


-- Query 2: User's order history
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 10;

Composite index: (user_id, created_at DESC)
  user_id for filtering, created_at for sorting.
Covering index: INCLUDE (total, status)

CREATE INDEX idx_user_orders
  ON orders(user_id, created_at DESC)
  INCLUDE (total, status);


-- Query 3: Admin dashboard: pending orders count
SELECT COUNT(*) FROM orders WHERE status = 'pending';

Partial index: only pending orders (0.5% of total)

CREATE INDEX idx_pending_orders
  ON orders(status)
  WHERE status = 'pending';
  -- Indexes 1M rows instead of 200M. 200x smaller.


-- Query 4: Login by email
SELECT id, name, password_hash FROM users WHERE email = 'alice@mail.com';

Unique index on email (also enforces uniqueness)
Covering: INCLUDE (name, password_hash)

CREATE UNIQUE INDEX idx_user_email
  ON users(email)
  INCLUDE (name, password_hash);

🔥 The Process

(1) List your most frequent queries. (2) For each query, identify the WHERE, ORDER BY, and SELECT columns. (3) Create a composite index with equality columns first, then range/sort columns. (4) Add INCLUDE for SELECT columns to make it covering. (5) Use partial indexes for queries that filter on a small subset.
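The whole process can be smoke-tested locally before touching production. A hedged sketch for Query 1 using SQLite (which lacks INCLUDE, so the covering columns are folded into the index key; the schema and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    id INTEGER PRIMARY KEY, category_id INTEGER, in_stock INTEGER,
    price REAL, name TEXT, image_url TEXT)""")
conn.executemany(
    "INSERT INTO products (category_id, in_stock, price, name, image_url) "
    "VALUES (?, ?, ?, ?, ?)",
    [(i % 20, i % 3 != 0, float(i % 500), f"p{i}", f"/img/{i}.png")
     for i in range(5_000)],
)
# Equality columns first, sort column next, then the selected columns
# appended so the index is covering (SQLite-style INCLUDE).
conn.execute("""CREATE INDEX idx_product_search
    ON products(category_id, in_stock, price, name, image_url)""")

plan = " ".join(r[-1] for r in conn.execute("""EXPLAIN QUERY PLAN
    SELECT name, price, image_url FROM products
    WHERE category_id = 5 AND in_stock = 1
    ORDER BY price ASC LIMIT 20""").fetchall())
print(plan)
# The covering index satisfies WHERE, ORDER BY, and SELECT in one pass:
# no table access, and no separate sort step (no temp B-tree in the plan).
```

Run the same EXPLAIN check for each of your critical queries; if a plan still shows a scan or a sort step, the index order or its covered columns need adjusting.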

09

Trade-offs & Decision Making

B-Tree vs LSM Tree

Dimension      | B-Tree                                | LSM Tree
Optimized for  | Reads (balanced tree traversal)       | Writes (sequential memory + disk)
Write speed    | Moderate (random I/O to update pages) | Very fast (sequential append to memory)
Read speed     | Fast (single tree path)               | Moderate (check memtable + multiple SSTables)
Space usage    | Efficient (in-place updates)          | Higher (duplicates until compaction)
Range queries  | Excellent (sorted tree)               | Good (sorted SSTables, but across levels)
Use case       | OLTP, web apps, mixed workloads       | Write-heavy: logs, analytics, time-series
Databases      | PostgreSQL, MySQL, SQL Server         | Cassandra, RocksDB, LevelDB, ScyllaDB

Index Types Comparison

Index Type     | What It Does                      | When to Use                             | Cost
Single column  | Index on one column               | Simple WHERE filters                    | Low
Composite      | Index on multiple columns         | Multi-column WHERE + ORDER BY           | Medium
Partial        | Index on subset of rows           | Queries filtering on rare conditions    | Low (small index)
Covering       | Index includes all query columns  | Read-heavy queries returning many rows  | Higher (larger index)
Unique         | Enforces uniqueness + fast lookup | Email, username, natural keys           | Low

🎯 Decision Framework

Start with no indexes (besides primary key). Run your queries. Use EXPLAIN ANALYZE to find slow ones. Add indexes for the slowest, most frequent queries first. Monitor write performance — if writes slow down, you have too many indexes. The goal: minimum indexes for maximum query coverage.

10

Interview Questions

Q: Why are indexes important?

A: Without indexes, every query requires a full table scan — reading every row to find matches. For a table with 100M rows, that's seconds to minutes per query. Indexes provide O(log n) lookups instead of O(n) — finding a row in ~27 steps instead of 100M. They're the single biggest performance lever in database optimization. The trade-off: indexes slow down writes (every insert/update must maintain the index) and use additional storage.

Q: When would you use a composite index?

A: When your queries filter on multiple columns. Example: WHERE user_id = 42 AND status = 'shipped' AND created_at > '2024-01-01'. A composite index on (user_id, status, created_at) serves this query with a single index scan. Column order matters: put equality columns first (user_id, status), then range columns (created_at). The index follows the leftmost prefix rule — it can be used for queries on (user_id), (user_id, status), or all three, but NOT for (status) alone.

Q: What's the difference between B-Tree and LSM tree?

A: B-trees update data in place on disk — fast reads (single tree traversal), moderate writes (random I/O to update pages). Used in PostgreSQL, MySQL. LSM trees buffer writes in memory and flush sorted batches to disk — very fast writes (sequential I/O), moderate reads (must check multiple levels). Used in Cassandra, RocksDB. Choose B-tree for read-heavy OLTP workloads. Choose LSM for write-heavy workloads (logging, analytics, time-series).

1

Your product listing page takes 3 seconds to load

The query filters by category, sorts by price, and returns name + image. How do you optimize?

Answer: Create a composite covering index: CREATE INDEX idx ON products(category_id, price) INCLUDE (name, image_url). The composite part (category_id, price) handles the WHERE and ORDER BY. The INCLUDE part (name, image_url) makes it a covering index — the database returns results entirely from the index without touching the table. This turns a 3-second query into a 5ms query. Verify with EXPLAIN ANALYZE — you should see 'Index Only Scan'.

2

Your orders table has 15 indexes and writes are slow

How do you reduce write overhead without breaking read performance?

Answer: Audit index usage. In PostgreSQL: SELECT * FROM pg_stat_user_indexes — check idx_scan (how often each index is used). Remove indexes with 0 or very low scans. Merge overlapping indexes: if you have indexes on (user_id) and (user_id, status), the composite one covers both — drop the single-column one. Replace full indexes with partial indexes where possible (index only pending orders instead of all orders). Target: 3-5 well-designed indexes instead of 15 redundant ones.

11

Common Mistakes

📊

Over-indexing

Adding an index for every column 'just in case.' A table with 15 indexes means every INSERT does 16 writes (1 table + 15 indexes). Write throughput can drop by a factor of 10-15. Most of those indexes are never used.

Index only columns used in frequent WHERE, JOIN, and ORDER BY clauses. Audit index usage regularly (pg_stat_user_indexes in PostgreSQL). Remove unused indexes. Aim for 3-5 well-designed indexes per table, not 15.

🔀

Wrong column order in composite indexes

Creating an index on (created_at, user_id, status) when your query filters on user_id and status first. The index can't be used because the leftmost column (created_at) isn't in the WHERE clause. The query falls back to a full table scan.

Put equality columns first, then range/sort columns. For WHERE user_id = 42 AND status = 'shipped' ORDER BY created_at: index should be (user_id, status, created_at) — not (created_at, user_id, status).

✏️

Ignoring write cost

Adding indexes to speed up reads without measuring the impact on writes. A write-heavy table (event logs, analytics) with 10 indexes becomes a bottleneck — every event insert updates 10 index structures.

For write-heavy tables, minimize indexes. Consider LSM-tree-based storage (RocksDB, Cassandra) which is optimized for writes. If you need fast reads on write-heavy data, use a read replica with indexes — keep the primary write-optimized.

📖

Not using covering indexes

Having an index on the WHERE columns but still doing millions of table lookups to fetch the SELECT columns. The index finds the row IDs fast, but reading each full row from the table is slow random I/O.

Add frequently selected columns to the index with INCLUDE. For listing pages that return name, price, image — include those in the index. Check EXPLAIN output: 'Index Scan' (table lookup needed) vs 'Index Only Scan' (covering, no table access).