Indexing
Master database indexing — B-tree indexes, LSM trees & SSTables, composite indexes, partial indexes, and covering indexes. Understand the trade-offs that make queries fast.
The Big Picture — What Is an Index?
An index is a data structure that lets a database find rows without scanning every single row in the table. Without an index, finding one user in a table of 100 million rows means reading all 100 million rows. With an index, it means ~27 comparisons (log₂ of 100M) — a handful of page reads. That's the difference between a 10-second query and a 0.1ms query.
The Book Index Analogy
A 500-page textbook without an index: to find 'B-tree', you flip through every page. With an index at the back: you look up 'B-tree → page 247' and jump directly there. The index is a small, sorted lookup structure that trades a bit of extra space (the index pages) for massive speed improvement. A database index works identically — it's a sorted structure that maps column values to the physical location of rows on disk.
🔥 Key Insight
Indexes make reads fast and writes slow. Every INSERT, UPDATE, or DELETE must also update every index on the table. A table with 10 indexes means every write does 11 operations (1 table + 10 indexes). The art of indexing is adding just enough indexes to make your critical queries fast without crippling write performance.
How Databases Store Data
Full Table Scan vs Indexed Lookup
```
Table: users (100,000,000 rows)
Query: SELECT * FROM users WHERE email = 'alice@mail.com'

Without index (full table scan):
  → Read all 100M rows from disk
  → Check each row's email column
  → Time: O(n) = ~100,000,000 comparisons
  → Duration: 5-30 seconds (depending on disk speed)

With B-tree index on email:
  → Navigate the tree: ~27 comparisons (log₂ 100M),
    packed into ~4 page-sized nodes → ~4 disk reads
  → Find the row pointer → read 1 row
  → Time: O(log n) = ~27 comparisons
  → Duration: 0.1-1ms

Speedup: ~100,000x
```
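The comparison counts above can be reproduced with a toy model (comparison counts only — it deliberately ignores caching, page layout, and real I/O):

```python
import math

# Toy model: how many key comparisons to find one row among n sorted keys?
def full_scan(n):
    # No index: worst case checks every row's key.
    return n

def binary_search(n):
    # Sorted index: each comparison halves the remaining candidates.
    return math.ceil(math.log2(n))

n = 100_000_000
print(full_scan(n))       # 100000000
print(binary_search(n))   # 27
```

A real B-tree packs ~100+ sorted keys into each disk page, so those ~27 comparisons cost only ~4 disk reads, which is where the actual speedup comes from.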
Read vs Write Trade-off
📖 Reads (Benefit from Indexes)
- SELECT with WHERE → index lookup instead of full scan
- JOIN on indexed columns → fast matching
- ORDER BY on indexed columns → already sorted
- More indexes = faster reads (for indexed queries)
✏️ Writes (Hurt by Indexes)
- INSERT → must add entry to every index
- UPDATE on indexed column → update the index too
- DELETE → must remove entry from every index
- More indexes = slower writes (more maintenance)
💡 The Rule of Thumb
Index the columns you filter on (WHERE), join on (JOIN ON), and sort by (ORDER BY). Don't index columns you rarely query. For read-heavy systems (90% reads), add indexes aggressively. For write-heavy systems (90% writes), minimize indexes.
B-Tree Indexes
B-trees are the default index structure in virtually every relational database — PostgreSQL, MySQL, SQL Server, Oracle. They're a balanced, sorted tree where each node can hold multiple keys and pointers, optimized for disk-based storage.
The Sorted Filing Cabinet
Imagine a filing cabinet where folders are sorted alphabetically. To find 'Martinez', you don't scan from A — you open the middle drawer (M-P), then the M section, then scan a few folders. Each step eliminates half the remaining options. A B-tree works the same way: each level of the tree narrows the search by a large factor. With a branching factor of 100 (typical), 3 levels can index 1 million rows. 4 levels can index 100 million rows.
```
B-tree index on users.email (branching factor ~100)
Query: WHERE email = 'martinez@mail.com'

Level 0 (root):     [... garcia ... lee ... patel ... zhang ...]
  → 'martinez' is between 'lee' and 'patel'
  → Follow pointer to child node

Level 1 (internal): [... kim ... lin ... martinez ... miller ...]
  → Found 'martinez'! → Follow pointer to leaf node

Level 2 (leaf):     [martinez@mail.com → row_id: 4,827,391]
  → Read row 4,827,391 from the table

Total: 3 disk reads to find 1 row out of millions.

For 100M rows with branching factor 100:
  Levels needed = log₁₀₀(100M) ≈ 4
  → 4 disk reads to find any row. Always.
```
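The height math generalizes: each level multiplies the number of reachable keys by the branching factor. A small sketch (integer arithmetic, to avoid floating-point edge cases):

```python
# How many B-tree levels are needed to index `rows` keys, if each
# node (disk page) fans out to `branching_factor` children?
def levels_needed(rows, branching_factor=100):
    levels, reachable = 0, 1
    while reachable < rows:
        reachable *= branching_factor   # one more level of pages
        levels += 1
    return levels

print(levels_needed(1_000_000))      # 3  (100^3 = 1M)
print(levels_needed(100_000_000))    # 4  (100^4 = 100M)
```

This is why lookups stay at 4-5 disk reads even for billions of rows: the height grows with log₁₀₀(n), not with n.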
Why B-Trees Excel
Strengths
- ✅Balanced: every lookup takes the same number of steps
- ✅Sorted: range queries (BETWEEN, >, <) are efficient
- ✅Disk-optimized: nodes are page-sized (4-16 KB), matching disk blocks
- ✅Supports equality AND range queries
- ✅In-place updates: modify values without rewriting the tree
Characteristics
- ✅Read-optimized: O(log n) for any lookup
- ✅Write cost: O(log n) for insert/update/delete
- ✅Space: ~10-30% of table size
- ✅Default in PostgreSQL, MySQL, SQL Server, Oracle
- ✅Best for: OLTP workloads (mixed read/write)
🎯 Interview Insight
When asked "how do indexes work?" — describe B-trees. They're the default in every major relational database. Key points: balanced tree, O(log n) lookups, sorted (supports range queries), disk-page-aligned nodes. Mention that the branching factor is high (~100-500), so even billions of rows need only 4-5 levels.
LSM Trees & SSTables
LSM (Log-Structured Merge) trees take the opposite approach from B-trees. Instead of updating data in place on disk, they buffer all writes in memory and periodically flush sorted batches to disk. This makes writes extremely fast at the cost of more complex reads.
The Notebook Analogy
B-tree: you have a sorted filing cabinet. Every new document goes directly into its correct alphabetical position — you must find the right spot and insert it (slow writes, fast reads). LSM tree: you have a notebook. Every new document is written at the end of the current page (fast writes). When the page is full, you sort it and file it away as a batch. To find a document, you check the current page first, then the most recent filed batch, then older batches (slower reads, but writes are blazing fast).
How LSM Trees Work
Write to Memtable (in-memory)
All writes go to an in-memory sorted structure (usually a red-black tree or skip list) called the memtable. Writes are O(log n) in memory — extremely fast. A write-ahead log (WAL) on disk ensures durability if the process crashes.
Flush to SSTable (on disk)
When the memtable reaches a size threshold (e.g., 64 MB), it's flushed to disk as an SSTable (Sorted String Table) — an immutable, sorted file. The memtable is cleared and a new one starts. SSTables are never modified after creation.
Compaction (merge SSTables)
Over time, many SSTables accumulate. Compaction merges multiple SSTables into fewer, larger ones — removing duplicates and deleted entries. This keeps read performance manageable. Without compaction, reads would need to check hundreds of files.
Read path
To read a key: check the memtable first (newest data), then SSTables from newest to oldest. Bloom filters on each SSTable quickly tell you 'this key is definitely NOT here' — skipping most files. Worst case: check all levels.
```
WRITE: SET user:42 = "Alice"
  1. Append to WAL (durability)        → ~0.01ms
  2. Insert into memtable (in-memory)  → ~0.001ms
  Done. Total: ~0.01ms
  Later (async): memtable full → flush to SSTable on disk

READ: GET user:42
  1. Check memtable → found? return.
  2. Check SSTable L0 (newest) → bloom filter says NO → skip
  3. Check SSTable L1 → bloom filter says MAYBE → read → found!
  Done. Total: 1-5 disk reads

COMPACTION (background):
  L0: [SST1] [SST2] [SST3] → merge → L1: [SST_merged]
  Removes duplicates, deleted keys, reduces file count
```
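The write and read paths above can be sketched as a toy LSM store (illustrative only — a real engine adds a WAL, bloom filters, a skip-list memtable, and background compaction):

```python
from bisect import bisect_left

class ToyLSM:
    """Minimal LSM sketch: memtable in memory, immutable sorted SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}             # newest data, mutable, in memory
        self.sstables = []             # sorted (key, value) runs, newest first
        self.memtable_limit = memtable_limit

    def set(self, key, value):
        self.memtable[key] = value     # fast: no disk structure touched
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into an immutable sorted run ("SSTable").
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:       # 1. newest data wins
            return self.memtable[key]
        for table in self.sstables:    # 2. then SSTables, newest to oldest
            i = bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None                    # checked every level: key absent

db = ToyLSM()
db.set("user:41", "Bob")
db.set("user:42", "Alice")    # memtable full → flushed to an SSTable
db.set("user:42", "Alicia")   # newer value shadows the flushed one
print(db.get("user:42"))      # Alicia — memtable beats the older SSTable
print(db.get("user:41"))      # Bob — found via binary search in an SSTable
```

Note how a key can exist in both the memtable and an SSTable: the read path resolves this by always checking newest-first, and compaction eventually discards the stale copy.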
| Dimension | B-Tree | LSM Tree |
|---|---|---|
| Write speed | Moderate (in-place update on disk) | Very fast (append to memory) |
| Read speed | Fast (single tree traversal) | Moderate (check multiple levels) |
| Write amplification | Low (update one page) | High (data rewritten during compaction) |
| Read amplification | Low (one path through tree) | Higher (check memtable + multiple SSTables) |
| Space amplification | Low | Higher (duplicates until compaction) |
| Best for | Read-heavy, mixed OLTP | Write-heavy (logs, analytics, time-series) |
| Used in | PostgreSQL, MySQL, SQL Server | Cassandra, RocksDB, LevelDB, HBase |
🎯 Interview Insight
B-trees for read-heavy workloads (OLTP, web apps). LSM trees for write-heavy workloads (logging, analytics, IoT, time-series). Know the trade-off: LSM trees sacrifice read performance and increase write amplification (compaction) to achieve much faster writes. Most NoSQL databases (Cassandra, RocksDB) use LSM trees.
Composite Indexes
A composite index is an index on multiple columns. The column order is critical — it determines which queries can use the index. Think of it like a phone book sorted by last name, then first name.
```sql
CREATE INDEX idx_user_status_date ON orders(user_id, status, created_at);

-- ✅ Uses the index (leftmost prefix: user_id)
SELECT * FROM orders WHERE user_id = 42;

-- ✅ Uses the index (prefix: user_id + status)
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped';

-- ✅ Uses the full index (all 3 columns)
SELECT * FROM orders
WHERE user_id = 42 AND status = 'shipped' AND created_at > '2024-01-01';

-- ❌ CANNOT use this index (skips user_id)
SELECT * FROM orders WHERE status = 'shipped';

-- ❌ CANNOT use this index (skips user_id and status)
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- ⚠️ Uses only the user_id part (status is skipped, so created_at
--    can only filter results, not narrow the index search)
SELECT * FROM orders WHERE user_id = 42 AND created_at > '2024-01-01';
```
The Phone Book Rule
A phone book is sorted by (last_name, first_name, city). You can look up: all 'Smiths' ✅, 'Smith, John' ✅, 'Smith, John, NYC' ✅. But you CANNOT efficiently look up: all 'Johns' ❌ (first name without last name), or all people in 'NYC' ❌ (city without last name). You must use the columns from left to right — you can stop early, but you can't skip.
The Prefix Rule
Index: (A, B, C)

| Query filters on | Uses index? | How much? |
|---|---|---|
| A | ✅ Yes | Prefix (A) |
| A, B | ✅ Yes | Prefix (A, B) |
| A, B, C | ✅ Yes | Full index (A, B, C) |
| B | ❌ No | Skips A (leftmost column) |
| C | ❌ No | Skips A and B |
| B, C | ❌ No | Skips A |
| A, C | ⚠️ Partial | Uses A only; skips B, so C can't narrow the search |

Rule: the index is used left-to-right. You can stop early, but you cannot skip columns in the middle.
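You can watch a real planner apply the prefix rule. This sketch uses SQLite (stdlib `sqlite3`) as a stand-in — the rule is the same in PostgreSQL and MySQL, though plan output differs; table and index names are made up for the demo:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# `total` is deliberately NOT in the index, so SELECT * can't be
# answered from the index alone and the planner's choice is clear.
con.execute("CREATE TABLE orders(user_id INT, status TEXT, created_at TEXT, total REAL)")
con.execute("CREATE INDEX idx ON orders(user_id, status, created_at)")

def plan(sql):
    # EXPLAIN QUERY PLAN: the last column of each row is the plan detail.
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

# Leftmost prefix present → index seek (SEARCH ... USING INDEX idx)
print(plan("SELECT * FROM orders WHERE user_id = 42"))
# Leftmost column missing → full table scan (SCAN orders)
print(plan("SELECT * FROM orders WHERE status = 'shipped'"))
```

The first plan mentions `idx`; the second falls back to a scan, exactly as the table above predicts.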
🎯 Interview Insight — Designing Composite Indexes
Put the most selective column first (the one that narrows results the most). Put equality columns before range columns: (user_id, status, created_at) — not (created_at, user_id, status). Equality filters (=) use the index fully; range filters (>, <, BETWEEN) stop the index from being used for subsequent columns.
Partial Indexes
A partial index only indexes rows that match a condition. Instead of indexing all 100M rows, you index only the 500K rows you actually query — making the index smaller, faster, and cheaper to maintain.
```sql
-- Full index: indexes ALL 100M users (including inactive)
CREATE INDEX idx_email ON users(email);
-- Size: ~2 GB, updated on every user insert/update

-- Partial index: indexes only active users (5% of total)
CREATE INDEX idx_active_email ON users(email) WHERE is_active = true;
-- Size: ~100 MB, updated only when active users change

-- Partial index: only unprocessed orders
CREATE INDEX idx_pending ON orders(created_at) WHERE status = 'pending';
-- Size: tiny (only pending orders, not the 99% that are completed)

-- Query that uses the partial index:
SELECT * FROM users WHERE email = 'alice@mail.com' AND is_active = true;
-- → Uses idx_active_email (small, fast)

-- Query that CANNOT use the partial index:
SELECT * FROM users WHERE email = 'alice@mail.com' AND is_active = false;
-- → Cannot use idx_active_email (condition doesn't match)
```
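The "condition must match" rule is checkable against a live planner. SQLite supports PostgreSQL-style partial indexes, so this sketch (hypothetical table, integer flag instead of boolean) demonstrates both cases:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users(email TEXT, is_active INT, name TEXT)")
# Partial index: only rows with is_active = 1 are indexed.
con.execute("""CREATE INDEX idx_active_email ON users(email)
               WHERE is_active = 1""")

def plan(sql):
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

# Query implies is_active = 1 → the partial index is usable
print(plan("SELECT * FROM users WHERE email = 'a@b.c' AND is_active = 1"))
# Query contradicts the index condition → falls back to a scan
print(plan("SELECT * FROM users WHERE email = 'a@b.c' AND is_active = 0"))
```

Only the first plan references `idx_active_email`: the planner proves the query's WHERE clause implies the index's WHERE clause before it will touch the index.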
Strengths
- ✅Much smaller than full indexes (index only what you query)
- ✅Faster lookups (smaller tree to traverse)
- ✅Less write overhead (fewer rows to maintain)
- ✅Perfect for: active users, pending orders, recent data
Limitations
- ❌Only works for queries matching the WHERE condition
- ❌Not all databases support them (PostgreSQL and SQLite yes; SQL Server calls them filtered indexes; MySQL no)
- ❌Must match the partial condition exactly in your query
- ❌Can be confusing if developers don't know the condition
🎯 Interview Insight
Partial indexes are an optimization technique — mention them when discussing how to reduce index size and write overhead. Classic example: "We have 100M users but only 2M are active. A partial index on active users is 50x smaller and only maintained on active user changes."
Covering Indexes
A covering index contains all the columns a query needs. The database can answer the query entirely from the index without ever reading the actual table row. This eliminates the most expensive step — the random disk read to fetch the full row.
The Summary Sheet
Imagine a library catalog card that shows: title, author, year, and shelf location. If you only need the title and author, the catalog card has everything — you don't need to walk to the shelf and pull out the book. That's a covering index: the index itself contains all the data the query needs. No 'table lookup' required.
```sql
-- Query: get name and email for active users
SELECT name, email FROM users WHERE is_active = true;

-- Regular index (NOT covering):
CREATE INDEX idx_active ON users(is_active);
-- Step 1: Use index to find row IDs where is_active = true
-- Step 2: For EACH row ID, read the full row from the table (random I/O!)
-- Step 3: Extract name and email from the full row
-- → Slow: millions of random disk reads

-- Covering index (includes all needed columns):
CREATE INDEX idx_active_covering ON users(is_active) INCLUDE (name, email);
-- Step 1: Use index to find entries where is_active = true
-- Step 2: name and email are IN the index → return directly
-- → Fast: no table access needed, sequential index scan only

-- PostgreSQL EXPLAIN output:
-- "Index Only Scan using idx_active_covering"  ← covering index used!
-- vs
-- "Index Scan using idx_active"                ← table lookup needed
```
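The same effect can be observed in SQLite, which has no INCLUDE clause — instead, the extra columns are simply appended to the index key (a common workaround, also used in MySQL). The planner then reports a COVERING INDEX, SQLite's equivalent of PostgreSQL's "Index Only Scan":

```python
import sqlite3

con = sqlite3.connect(":memory:")
# `bio` stays out of the index, so only the demo query is covered.
con.execute("CREATE TABLE users(is_active INT, name TEXT, email TEXT, bio TEXT)")
con.execute("CREATE INDEX idx_active_covering ON users(is_active, name, email)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT name, email FROM users WHERE is_active = 1"
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)   # ... USING COVERING INDEX idx_active_covering ...
```

Every column the query touches (is_active, name, email) lives in the index, so the table is never read.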
Strengths
- ✅Eliminates table lookups (the slowest part of indexed reads)
- ✅Dramatically faster for queries that return many rows
- ✅Index-only scans: all data comes from the index
- ✅Perfect for: dashboards, reports, listing pages
Trade-offs
- ❌Larger index (stores extra columns)
- ❌More write overhead (extra columns updated in index)
- ❌Only helps queries that use exactly those columns
- ❌INCLUDE columns can't be used for filtering (only for retrieval)
🎯 Interview Insight
Covering indexes are the ultimate read optimization. When a query is slow despite having an index, check if it's doing table lookups for columns not in the index. Adding those columns with INCLUDE can turn an "Index Scan" into an "Index Only Scan" — often a 10-100x speedup for queries returning many rows.
End-to-End Scenario
Let's design the indexing strategy for an e-commerce platform — the most common interview scenario for indexing.
🛒 E-Commerce — Key Queries
Tables: products (5M rows), orders (200M rows), users (50M rows).
Read-heavy: 95% reads, 5% writes.
```sql
-- Query 1: Product search by category + price sort
SELECT id, name, price, image_url
FROM products
WHERE category_id = 5 AND in_stock = true
ORDER BY price ASC
LIMIT 20;

-- → Composite index: (category_id, in_stock, price)
--   Equality columns first, range/sort column last.
-- → Covering: INCLUDE (name, image_url) avoids table lookups
--   for the listing page.
CREATE INDEX idx_product_search
  ON products(category_id, in_stock, price)
  INCLUDE (name, image_url);

-- Query 2: User's order history
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- → Composite index: (user_id, created_at DESC)
--   user_id for filtering, created_at for sorting.
-- → Covering: INCLUDE (total, status)
CREATE INDEX idx_user_orders
  ON orders(user_id, created_at DESC)
  INCLUDE (total, status);

-- Query 3: Admin dashboard — pending orders count
SELECT COUNT(*) FROM orders WHERE status = 'pending';

-- → Partial index: only pending orders (0.5% of total)
--   Indexes 1M rows instead of 200M. 200x smaller.
CREATE INDEX idx_pending_orders
  ON orders(status)
  WHERE status = 'pending';

-- Query 4: Login by email
SELECT id, name, password_hash
FROM users
WHERE email = 'alice@mail.com';

-- → Unique index on email (also enforces uniqueness)
-- → Covering: INCLUDE (name, password_hash)
CREATE UNIQUE INDEX idx_user_email
  ON users(email)
  INCLUDE (name, password_hash);
```
🔥 The Process
(1) List your most frequent queries. (2) For each query, identify the WHERE, ORDER BY, and SELECT columns. (3) Create a composite index with equality columns first, then range/sort columns. (4) Add INCLUDE for SELECT columns to make it covering. (5) Use partial indexes for queries that filter on a small subset.
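Steps (3)-(5) can be sanity-checked on a scratch database before touching production. This sketch verifies the Query 1 pattern in SQLite (no INCLUDE clause there, so the covering columns are appended to the index key; the schema is a minimal stand-in for the real one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE products(
    id INTEGER PRIMARY KEY, category_id INT, in_stock INT,
    price REAL, name TEXT, image_url TEXT, description TEXT)""")
# Equality columns first, sort column next, covering columns last.
con.execute("""CREATE INDEX idx_product_search
    ON products(category_id, in_stock, price, name, image_url)""")

plan = con.execute("""EXPLAIN QUERY PLAN
    SELECT id, name, price, image_url FROM products
    WHERE category_id = 5 AND in_stock = 1
    ORDER BY price LIMIT 20""").fetchall()
details = " ".join(row[-1] for row in plan)
print(details)
# Two things to look for:
#  - COVERING INDEX: the query never touches the table
#  - no 'USE TEMP B-TREE FOR ORDER BY': rows already arrive in price order
```

If the plan ever shows a temp B-tree for ORDER BY, the sort column is in the wrong position in the index.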
Trade-offs & Decision Making
B-Tree vs LSM Tree
| Dimension | B-Tree | LSM Tree |
|---|---|---|
| Optimized for | Reads (balanced tree traversal) | Writes (sequential memory + disk) |
| Write speed | Moderate (random I/O to update pages) | Very fast (sequential append to memory) |
| Read speed | Fast (single tree path) | Moderate (check memtable + multiple SSTables) |
| Space usage | Efficient (in-place updates) | Higher (duplicates until compaction) |
| Range queries | Excellent (sorted tree) | Good (sorted SSTables, but across levels) |
| Use case | OLTP, web apps, mixed workloads | Write-heavy: logs, analytics, time-series |
| Databases | PostgreSQL, MySQL, SQL Server | Cassandra, RocksDB, LevelDB, ScyllaDB |
Index Types Comparison
| Index Type | What It Does | When to Use | Cost |
|---|---|---|---|
| Single column | Index on one column | Simple WHERE filters | Low |
| Composite | Index on multiple columns | Multi-column WHERE + ORDER BY | Medium |
| Partial | Index on subset of rows | Queries filtering on rare conditions | Low (small index) |
| Covering | Index includes all query columns | Read-heavy queries returning many rows | Higher (larger index) |
| Unique | Enforces uniqueness + fast lookup | Email, username, natural keys | Low |
🎯 Decision Framework
Start with no indexes (besides primary key). Run your queries. Use EXPLAIN ANALYZE to find slow ones. Add indexes for the slowest, most frequent queries first. Monitor write performance — if writes slow down, you have too many indexes. The goal: minimum indexes for maximum query coverage.
Interview Questions
Q: Why are indexes important?
A: Without indexes, every query requires a full table scan — reading every row to find matches. For a table with 100M rows, that's seconds to minutes per query. Indexes provide O(log n) lookups instead of O(n) — finding a row in ~27 steps instead of 100M. They're the single biggest performance lever in database optimization. The trade-off: indexes slow down writes (every insert/update must maintain the index) and use additional storage.
Q: When would you use a composite index?
A: When your queries filter on multiple columns. Example: WHERE user_id = 42 AND status = 'shipped' AND created_at > '2024-01-01'. A composite index on (user_id, status, created_at) serves this query with a single index scan. Column order matters: put equality columns first (user_id, status), then range columns (created_at). The index follows the leftmost prefix rule — it can be used for queries on (user_id), (user_id, status), or all three, but NOT for (status) alone.
Q: What's the difference between a B-tree and an LSM tree?
A: B-trees update data in place on disk — fast reads (single tree traversal), moderate writes (random I/O to update pages). Used in PostgreSQL, MySQL. LSM trees buffer writes in memory and flush sorted batches to disk — very fast writes (sequential I/O), moderate reads (must check multiple levels). Used in Cassandra, RocksDB. Choose B-tree for read-heavy OLTP workloads. Choose LSM for write-heavy workloads (logging, analytics, time-series).
Your product listing page takes 3 seconds to load
The query filters by category, sorts by price, and returns name + image. How do you optimize?
Answer: Create a composite covering index: CREATE INDEX idx ON products(category_id, price) INCLUDE (name, image_url). The composite part (category_id, price) handles the WHERE and ORDER BY. The INCLUDE part (name, image_url) makes it a covering index — the database returns results entirely from the index without touching the table. This turns a 3-second query into a 5ms query. Verify with EXPLAIN ANALYZE — you should see 'Index Only Scan'.
Your orders table has 15 indexes and writes are slow
How do you reduce write overhead without breaking read performance?
Answer: Audit index usage. In PostgreSQL: SELECT * FROM pg_stat_user_indexes — check idx_scan (how often each index is used). Remove indexes with 0 or very low scans. Merge overlapping indexes: if you have indexes on (user_id) and (user_id, status), the composite one covers both — drop the single-column one. Replace full indexes with partial indexes where possible (index only pending orders instead of all orders). Target: 3-5 well-designed indexes instead of 15 redundant ones.
Common Mistakes
Over-indexing
Adding an index for every column 'just in case.' A table with 15 indexes means every INSERT does 16 writes (1 table + 15 indexes). Write throughput drops by 10-15x. Most of those indexes are never used.
✅Index only columns used in frequent WHERE, JOIN, and ORDER BY clauses. Audit index usage regularly (pg_stat_user_indexes in PostgreSQL). Remove unused indexes. Aim for 3-5 well-designed indexes per table, not 15.
Wrong column order in composite indexes
Creating an index on (created_at, user_id, status) when your query filters on user_id and status first. The index can't be used because the leftmost column (created_at) isn't in the WHERE clause. The query falls back to a full table scan.
✅Put equality columns first, then range/sort columns. For WHERE user_id = 42 AND status = 'shipped' ORDER BY created_at: index should be (user_id, status, created_at) — not (created_at, user_id, status).
Ignoring write cost
Adding indexes to speed up reads without measuring the impact on writes. A write-heavy table (event logs, analytics) with 10 indexes becomes a bottleneck — every event insert updates 10 index structures.
✅For write-heavy tables, minimize indexes. Consider LSM-tree-based storage (RocksDB, Cassandra) which is optimized for writes. If you need fast reads on write-heavy data, use a read replica with indexes — keep the primary write-optimized.
Not using covering indexes
Having an index on the WHERE columns but still doing millions of table lookups to fetch the SELECT columns. The index finds the row IDs fast, but reading each full row from the table is slow random I/O.
✅Add frequently selected columns to the index with INCLUDE. For listing pages that return name, price, image — include those in the index. Check EXPLAIN output: 'Index Scan' (table lookup needed) vs 'Index Only Scan' (covering, no table access).