
Query Optimization

Optimize read performance with denormalization, materialized views, CQRS pattern, and search indexes. Trade write complexity for blazing-fast reads at scale.

26 min read · 10 sections
01

The Big Picture — Why Reads Get Slow

Normalized databases are designed for correctness — no redundancy, no anomalies, clean relationships. But correctness has a cost: to display a product page, you JOIN products, categories, reviews, inventory, and images — 5 tables, every single page load. At 50,000 page views per second, those JOINs become the bottleneck.

🍳

The Pre-Cooked Meal Analogy

A restaurant can prepare every dish from scratch — chop vegetables, marinate meat, cook rice — for each order. That's a normalized database: correct, no waste, but slow. Or the kitchen can pre-cook popular meals during off-hours and keep them warm. When a customer orders, the waiter grabs a ready plate in 10 seconds instead of waiting 30 minutes. That's denormalization and precomputation: you do extra work during writes so reads are instant.

🔥 Key Insight

Query optimization for reads is about shifting work from read time to write time. Instead of computing results on every read (expensive JOINs, aggregations), you precompute and store the results. Reads become simple lookups. The trade-off: writes are more complex, and data might be briefly stale.
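The shift can be sketched in plain Python (hypothetical names, in-memory lists standing in for tables): the first function pays the cost on every read, the second pays it once per write.

```python
# Compute-on-read: every call scans all orders (a full aggregation per read).
def total_spent_on_read(orders, user_id):
    return sum(o["total"] for o in orders if o["user_id"] == user_id)

# Precompute-on-write: the write does extra work to maintain a summary,
# so the read becomes a single lookup.
user_totals = {}

def create_order(orders, user_id, total):
    orders.append({"user_id": user_id, "total": total})
    user_totals[user_id] = user_totals.get(user_id, 0) + total

def total_spent_precomputed(user_id):
    return user_totals.get(user_id, 0)  # O(1), no scan
```

Both paths return the same answer; the only difference is where the work happens.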

02

The Two Paths: Write vs Read

Write Path vs Read Path

WRITE PATH (optimize for correctness):
  User creates order → Normalize → Store in tables
  orders(id, user_id, total)
  order_items(order_id, product_id, qty)
→ Clean, no redundancy, easy to update

READ PATH (optimize for speed):
  User views dashboard → Precomputed view → Instant response
  user_dashboard(user_id, order_count, total_spent, last_order_date)
→ Redundant, but no JOINs needed, 1ms lookup

The pattern:
  Write → Normalized tables (source of truth)
  → Trigger/event → Update denormalized views, indexes, caches
  Read → Denormalized view / materialized view / search index
  → Fast response (no JOINs, no aggregations)
🔗

Reduce JOINs

Embed related data directly in the row. Instead of joining users + orders + products, store user_name and product_name in the order row. One table scan, no JOINs.

📊

Precompute Results

Calculate aggregates during writes, not reads. When an order is placed, increment user.order_count. The dashboard reads one column instead of COUNT(*) across millions of rows.

🔍

Use Specialized Indexes

LIKE-based full-text search doesn't scale in a relational database. Use Elasticsearch with inverted indexes. Product filtering doesn't need 5 JOINs — use a denormalized search index.

🔄

Separate Read/Write Models

CQRS: writes go to a normalized database. An event stream updates a read-optimized store. Reads never touch the write database. Each is optimized for its purpose.

03

Denormalization for Reads

Denormalization stores redundant data to eliminate JOINs at read time. Instead of looking up the user's name from the users table every time you display an order, you store the name directly in the orders table.

Normalized vs Denormalized
-- NORMALIZED (correct, slow reads):
-- To display "Alice ordered Laptop":
SELECT u.name, p.name, o.total
FROM orders o
JOIN users u ON u.id = o.user_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.id = 101;
-- 3 JOINs, touches 4 tables, ~20ms

-- DENORMALIZED (redundant, fast reads):
-- Store user_name and product_name directly in orders
SELECT user_name, product_name, total
FROM orders_denormalized
WHERE id = 101;
-- 0 JOINs, 1 table, ~1ms

-- The trade-off: when Alice changes her name,
-- you must update it in the users table AND in every order row.
-- Write complexity increases, but reads are 20x faster.
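The write-side cost described in the comment above can be sketched in Python, with in-memory dicts standing in for the tables (all names illustrative):

```python
# Source of truth plus denormalized copies of user_name.
users = {1: {"name": "Alice"}}
orders_denormalized = [
    {"id": 101, "user_id": 1, "user_name": "Alice", "total": 999},
    {"id": 102, "user_id": 1, "user_name": "Alice", "total": 49},
]

def rename_user(user_id, new_name):
    # 1. Update the source of truth.
    users[user_id]["name"] = new_name
    # 2. Propagate to every denormalized copy
    #    (in production this is typically a background job).
    for order in orders_denormalized:
        if order["user_id"] == user_id:
            order["user_name"] = new_name
```

One write becomes a fan-out of writes — the price paid for JOIN-free reads.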

Real-World Examples

📱 Social Media Feed

Each feed item stores: author_name, author_avatar, post_text, like_count, comment_count — all embedded. No JOINs to render a feed. When a user changes their avatar, a background job updates all their feed items.

🛒 Product Listing

Each product row stores: category_name, avg_rating, review_count, seller_name — all embedded. The listing page reads one table. When a new review is posted, update avg_rating and review_count on the product row.
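Updating avg_rating doesn't require re-scanning all reviews: a running average can be maintained incrementally from the embedded fields. A sketch in Python (hypothetical field names):

```python
def add_review(product, rating):
    """On a new review, update the embedded aggregates in place
    instead of recomputing AVG() over all reviews at read time."""
    n = product["review_count"]
    product["avg_rating"] = (product["avg_rating"] * n + rating) / (n + 1)
    product["review_count"] = n + 1
```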

Strengths

  • Dramatically faster reads (no JOINs)
  • Simpler read queries (single table)
  • Predictable read latency (no query plan surprises)
  • Works well with caching (simple key-value shape)
  • Standard practice in read-heavy systems

Costs

  • Data duplication (same data in multiple places)
  • Complex writes (must update all copies on change)
  • Risk of inconsistency (if an update is missed)
  • More storage (redundant columns)
  • Harder to change schema (data spread across tables)

🎯 Interview Insight

Denormalization is the most common read optimization in large-scale systems. In interviews, say: "I'd normalize the write path for correctness, then denormalize the read path for performance. When a user updates their name, a background job propagates the change to all denormalized copies."

04

Materialized Views

A materialized view is a precomputed query result stored as a table. Instead of running an expensive aggregation on every read, you run it once and store the result. Reads fetch the stored result instantly.

Materialized View — How It Works

Base Tables:
  orders (200M rows)
  order_items (1B rows)
  products (5M rows)

Expensive query (runs on every dashboard load):
  SELECT category, SUM(total) as revenue, COUNT(*) as order_count
  FROM orders o
  JOIN order_items oi ON oi.order_id = o.id
  JOIN products p ON p.id = oi.product_id
  GROUP BY category;
→ Scans 1.2B rows, takes 30 seconds

Materialized view:
  CREATE MATERIALIZED VIEW category_revenue AS
  SELECT category, SUM(total) as revenue, COUNT(*) as order_count
  FROM orders o
  JOIN order_items oi ON oi.order_id = o.id
  JOIN products p ON p.id = oi.product_id
  GROUP BY category;

Dashboard query:
  SELECT * FROM category_revenue;
→ Reads precomputed table, takes 2ms

Refresh:
  REFRESH MATERIALIZED VIEW category_revenue;
→ Re-runs the expensive query, updates the stored result
  Schedule: every 5 minutes, or trigger on write
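The refresh mechanics can be sketched in Python: the "view" stores the result of an expensive callable and only re-runs it on refresh (toy example, all names hypothetical).

```python
import time

class MaterializedView:
    """Toy materialized view: run the expensive query once, serve the
    stored result on every read, recompute only on refresh."""
    def __init__(self, query):
        self.query = query            # callable: the expensive aggregation
        self.result = query()         # initial population
        self.refreshed_at = time.time()

    def read(self):
        return self.result            # cheap: no recomputation

    def refresh(self):
        self.result = self.query()
        self.refreshed_at = time.time()

# Example "base table" and aggregation.
orders = [("books", 10), ("toys", 5)]

def revenue_by_category():
    out = {}
    for category, total in orders:
        out[category] = out.get(category, 0) + total
    return out
```

Usage shows the staleness trade-off directly: a write to the base table is invisible until refresh.

```python
view = MaterializedView(revenue_by_category)
orders.append(("books", 20))
view.read()      # still the old totals — stale until refresh
view.refresh()   # now reflects the new order
```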

Refresh Strategies

Strategy                     | Freshness              | Cost                                 | Best For
On every write               | Always fresh           | High (refresh on each INSERT/UPDATE) | Low-write, high-read data
Scheduled (every N minutes)  | Stale up to N minutes  | Low (periodic batch)                 | Dashboards, reports, analytics
Incremental refresh          | Near real-time         | Medium (only process new data)       | Large datasets with append-only writes
Manual / on-demand           | Stale until triggered  | Lowest                               | Ad-hoc reports, monthly summaries

Strengths

  • Turns 30-second queries into 2ms lookups
  • No application code changes (it's a database feature)
  • Perfect for predictable, expensive queries
  • Supported by PostgreSQL, Oracle, SQL Server

Limitations

  • Data is stale between refreshes
  • Refresh itself is expensive (re-runs the full query)
  • Storage overhead (stores the full result set)
  • Not suitable for user-specific or highly dynamic data

🎯 Interview Insight

Materialized views are ideal when queries are expensive but predictable — dashboards, leaderboards, analytics summaries. Say: "The admin dashboard runs a 30-second aggregation. I'd create a materialized view refreshed every 5 minutes. Dashboard reads are instant, and 5-minute staleness is acceptable for analytics."

05

CQRS Pattern

CQRS (Command Query Responsibility Segregation) takes denormalization to its logical extreme: completely separate systems for writes and reads. The write database is normalized for correctness. The read database is denormalized for speed. An event stream keeps them in sync.

CQRS — How It Works

                  WRITE PATH                           READ PATH
                  ──────────                           ─────────
  User places order                         User views order history
        │                                           │
        ▼                                           ▼
  ┌──────────────┐                        ┌──────────────────┐
  │   Write DB   │                        │     Read DB      │
  │ (normalized) │                        │ (denormalized)   │
  │              │                        │                  │
  │ orders       │   ──── Event ────→     │ order_history    │
  │ order_items  │        Stream          │ (user_name,      │
  │ products     │     (Kafka/SQS)        │  product_name,   │
  │ users        │                        │  total, status)  │
  └──────────────┘                        └──────────────────┘

Write: INSERT into normalized tables → publish event
Event consumer: reads event → updates denormalized read model
Read: SELECT from denormalized read model → instant response

Write DB: PostgreSQL (ACID, normalized, source of truth)
Read DB: MongoDB / Elasticsearch / Redis (optimized for queries)
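A toy Python sketch of this flow, with a deque standing in for the event stream and dicts for both databases (all names hypothetical):

```python
from collections import deque

# Write model: normalized "tables" (source of truth).
users = {1: {"name": "Alice"}}
products = {10: {"name": "Laptop"}}
orders = []

events = deque()  # stands in for Kafka/SQS

def place_order(order_id, user_id, product_id, total):
    # Write path: insert into normalized tables, then publish an event.
    orders.append({"id": order_id, "user_id": user_id,
                   "product_id": product_id, "total": total})
    events.append({"type": "OrderPlaced", "order_id": order_id})

# Read model: denormalized order history (stands in for MongoDB).
order_history = {}

def consume_events():
    # Event consumer: project each order into the read model,
    # embedding user_name and product_name so reads need no JOINs.
    while events:
        event = events.popleft()
        order = next(o for o in orders if o["id"] == event["order_id"])
        order_history[order["id"]] = {
            "user_name": users[order["user_id"]]["name"],
            "product_name": products[order["product_id"]]["name"],
            "total": order["total"],
        }
```

Until `consume_events` runs, the read model lags the write model — that gap is the eventual consistency CQRS accepts.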

Strengths

  • Read and write models independently optimized
  • Read DB can use a completely different technology
  • Scales reads and writes independently
  • Write DB stays clean and normalized
  • Read DB can have multiple projections for different queries

Costs

  • Significant complexity (two databases, event stream, consumers)
  • Eventual consistency (read model lags behind writes)
  • More infrastructure to operate and monitor
  • Debugging is harder (data flows through multiple systems)
  • Overkill for simple CRUD applications

🎯 Interview Insight

CQRS is a powerful pattern but don't reach for it by default. Use it when: (1) read and write patterns are fundamentally different, (2) you need different data models for reads vs writes, (3) you need to scale reads and writes independently. For a simple CRUD app, CQRS is over-engineering. For a social media feed with 100:1 read-to-write ratio, it's the right call.

06

Search Indexes (Elasticsearch)

Traditional databases use B-tree indexes — great for exact matches and ranges, terrible for full-text search. Searching "comfortable running shoes" across 5 million products with SQL LIKE is painfully slow. Elasticsearch uses inverted indexes — the same structure that powers Google Search.

Inverted Index — How It Works
Documents:
  doc1: "comfortable running shoes for men"
  doc2: "lightweight running sneakers"
  doc3: "comfortable leather shoes"

Inverted Index (built during indexing):
  comfortable → [doc1, doc3]
  running     → [doc1, doc2]
  shoes       → [doc1, doc3]
  men         → [doc1]
  lightweight → [doc2]
  sneakers    → [doc2]
  leather     → [doc3]

Search: "comfortable running shoes"
  comfortable → [doc1, doc3]
  running     → [doc1, doc2]
  shoes       → [doc1, doc3]
  Intersection: doc1 (matches all 3 terms) → highest relevance

SQL equivalent:
  WHERE name LIKE '%comfortable%' AND name LIKE '%running%'
→ Full table scan on 5M rows, no index used, 10+ seconds

Elasticsearch:
→ Inverted index lookup, 3 set intersections, 5ms
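The index structure above can be sketched in a few lines of Python (hypothetical function names; real engines add tokenization, stemming, and scoring on top):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-search: intersect the posting sets of all query terms."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()
```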
🛒

Product Search

'Red Nike shoes size 10' — filters by color, brand, category, size with full-text matching on the name. Elasticsearch handles this in milliseconds across millions of products.

📋

Log Search

'Find all ERROR logs from payment-service in the last hour.' Elasticsearch indexes log lines and enables instant search across billions of log entries.

📰

Content Discovery

'Articles about machine learning for beginners.' Full-text search with relevance scoring, synonyms, and fuzzy matching. SQL can't do this efficiently.

Elasticsearch in Architecture

Architecture:
  PostgreSQL (source of truth) → Change Data Capture → Elasticsearch

Write path:
  1. Product created/updated in PostgreSQL
  2. CDC (Debezium) captures the change
  3. Change event sent to Kafka
  4. Consumer indexes the product in Elasticsearch

Read path (search):
  1. User searches "comfortable running shoes"
  2. App queries Elasticsearch (not PostgreSQL)
  3. Elasticsearch returns matching product IDs + relevance scores
  4. App fetches full product details from PostgreSQL/cache

Key insight:
  PostgreSQL = source of truth (writes, transactions, consistency)
  Elasticsearch = search index (reads, full-text, relevance)
  They serve different purposes. Don't replace one with the other.
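The two-step read path can be sketched in Python, with dicts standing in for the search index and the source of truth (toy data, hypothetical names):

```python
# Toy stand-ins: the search index knows terms, the database knows details.
search_index = {
    "comfortable": {"p1", "p3"},
    "running": {"p1", "p2"},
    "shoes": {"p1", "p3"},
}
database = {
    "p1": {"name": "comfortable running shoes", "price": 89},
    "p2": {"name": "lightweight running sneakers", "price": 59},
    "p3": {"name": "comfortable leather shoes", "price": 120},
}

def search_products(query):
    # Step 1: rank product ids in the search index by matched-term count
    # (a toy relevance score; real engines use TF-IDF/BM25).
    scores = {}
    for term in query.split():
        for pid in search_index.get(term, set()):
            scores[pid] = scores.get(pid, 0) + 1
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    # Step 2: fetch full rows from the source of truth.
    return [database[pid] for pid in ranked]
```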

Strengths

  • Full-text search with relevance scoring
  • Inverted indexes: millisecond search across millions of docs
  • Advanced features: fuzzy matching, synonyms, autocomplete
  • Faceted search (filter by category, price range, brand)
  • Horizontal scaling (distributed across nodes)

Costs

  • Extra system to maintain (cluster, monitoring, upgrades)
  • Data sync: must keep ES in sync with primary DB
  • Not a database: no ACID, no transactions, no JOINs
  • Memory-hungry (inverted indexes live in RAM)
  • Eventual consistency with the source of truth

🎯 Interview Insight

When a system needs search — product search, log search, content discovery — always mention Elasticsearch. SQL LIKE is not a search solution at scale. Say: "I'd use PostgreSQL as the source of truth and sync to Elasticsearch via CDC for search queries. This gives us ACID for writes and millisecond full-text search for reads."

07

End-to-End Scenario

Let's design the read optimization layer for an e-commerce platform — using all four techniques together.

🛒 E-Commerce — 100K Product Views/sec

5M products, 200M orders, 50M users. Read:Write = 100:1.

Pages: product listing, product detail, search, admin dashboard.

1

Product Listing → Denormalization

The listing page shows: name, price, image, avg_rating, review_count, category_name. Instead of JOINing 4 tables, store all fields in a denormalized products_listing table. One table scan, no JOINs. When a review is posted, a background job updates avg_rating and review_count.

2

Admin Dashboard → Materialized Views

Dashboard shows: revenue by category, top sellers, order trends. These aggregations scan 200M+ rows. Create materialized views refreshed every 5 minutes. Dashboard reads precomputed results in 2ms instead of running 30-second queries.

3

Product Search → Elasticsearch

'Red Nike shoes size 10 under $100' — full-text search with filters. PostgreSQL can't do this efficiently. Sync products to Elasticsearch via CDC. Search queries hit ES (5ms), product detail pages hit PostgreSQL/cache.

4

Order History → CQRS

Write path: orders are created in normalized PostgreSQL (ACID for payments). Read path: an event consumer builds a denormalized order_history in MongoDB, optimized for 'show me my orders with product names and images.' Reads never touch the transactional database.

Architecture — All Techniques Combined

WRITE PATH:
  Order placed → PostgreSQL (normalized, ACID)
  → Kafka event → Update denormalized views
  → Kafka event → Update Elasticsearch index
  → Kafka event → Update MongoDB read model

READ PATH:
  Product listing → Denormalized table (PostgreSQL) → Redis cache
  Product search → Elasticsearch (inverted index)
  Admin dashboard → Materialized views (PostgreSQL)
  Order history → MongoDB read model (CQRS)
  Product detail → Redis cache → PostgreSQL fallback

Result:
  Write DB (PostgreSQL): handles 1K writes/sec (orders, updates)
  Read layer: handles 100K reads/sec across specialized stores
  Each read path is optimized for its specific query pattern
08

Trade-offs & Decision Making

Technique           | Read Speed             | Write Complexity                | Consistency             | Best For
Denormalization     | Very fast (no JOINs)   | Medium (update copies)          | Eventual                | Listings, feeds, embedded data
Materialized Views  | Instant (precomputed)  | Low (DB handles refresh)        | Stale between refreshes | Dashboards, reports, aggregations
CQRS                | Optimized per query    | High (event stream, consumers)  | Eventual                | Complex systems, different read/write models
Elasticsearch       | Millisecond search     | Medium (sync via CDC)           | Eventual                | Full-text search, filtering, discovery

When to Use What

Scenario                          | Technique          | Why
Product listing with ratings      | Denormalization    | Embed avg_rating in product row, avoid JOIN on every listing
Revenue dashboard                 | Materialized View  | Expensive aggregation, predictable query, 5-min staleness OK
Order history (100:1 read:write)  | CQRS               | Write model (ACID) and read model (fast) need different shapes
Product search                    | Elasticsearch      | Full-text + filters, SQL can't do this at scale
Simple CRUD app                   | None of the above  | Normalized DB + indexes is sufficient. Don't over-engineer.

🎯 Decision Framework

Start with a normalized database + proper indexes. When specific queries become slow, apply the right technique: JOINs too slow → denormalize. Aggregations too slow → materialized views. Search too slow → Elasticsearch. Read/write models diverging → CQRS. Don't apply all four on day one — add complexity only when the simpler approach fails.

09

Interview Questions

Q: Why denormalize data?

A: To eliminate JOINs at read time. In a normalized schema, displaying a product listing requires joining products, categories, reviews, and inventory — 4 tables on every page load. Denormalization embeds category_name, avg_rating, and stock_count directly in the product row. Reads become single-table lookups. The trade-off: when a category name changes, you must update it in every product row. This is acceptable when reads outnumber writes 100:1.

Q: What is a materialized view?

A: A precomputed query result stored as a table. Instead of running an expensive aggregation (SUM, COUNT, GROUP BY across millions of rows) on every dashboard load, you run it once and store the result. Dashboard reads fetch the stored result in milliseconds. The view is refreshed periodically (every 5 minutes) or on demand. Trade-off: data is stale between refreshes. Perfect for analytics dashboards, leaderboards, and reporting.

Q: When would you use CQRS?

A: When read and write patterns are fundamentally different. Example: an e-commerce system where writes are normalized (orders, items, payments — ACID transactions) but reads need a completely different shape (order history with product names, images, tracking info — denormalized). CQRS separates them: writes go to PostgreSQL, events flow to a read-optimized MongoDB. Each database is optimized for its purpose. Don't use CQRS for simple CRUD — the complexity isn't justified.

Q: Why use Elasticsearch over SQL for search?

A: SQL databases use B-tree indexes — great for exact matches (WHERE id = 42) and ranges (WHERE price > 100), terrible for full-text search. 'Comfortable running shoes' with SQL LIKE '%comfortable%' requires a full table scan — no index helps. Elasticsearch uses inverted indexes: each word maps to the documents containing it. Searching 3 words = 3 hash lookups + set intersection = 5ms across millions of documents. Plus: relevance scoring, fuzzy matching, synonyms, autocomplete — none of which SQL supports natively.

10

Pitfalls

📦

Over-denormalizing

Embedding everything everywhere. User name stored in orders, reviews, comments, messages, notifications — 5 copies. When the user changes their name, you must update all 5 tables. Miss one and the data is inconsistent. The more copies, the harder it is to keep them in sync.

Denormalize only the fields that are read frequently and change rarely. User name in orders? Yes (orders are read often, names change rarely). Real-time stock count in product listing? No (changes every second — use a short TTL cache instead).

⏳

Ignoring data consistency in views

Creating a materialized view refreshed every hour for a dashboard that shows 'current inventory.' Users see inventory from an hour ago and make decisions on stale data. Or: CQRS read model is 30 seconds behind, and a user doesn't see their own order.

Match refresh frequency to the data's staleness tolerance. Inventory: refresh every 10 seconds or use a live query. Analytics: every 5 minutes is fine. For CQRS: implement read-your-own-writes — after a write, route that user's reads to the write DB briefly.
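Read-your-own-writes routing can be sketched in Python (the 30-second pin window and all names are illustrative, not a standard):

```python
import time

PIN_SECONDS = 30  # how long after a write this user reads the write DB

last_write_at = {}  # user_id -> timestamp of that user's most recent write

def record_write(user_id, now=None):
    last_write_at[user_id] = now if now is not None else time.time()

def choose_store(user_id, now=None):
    """Route a read: recent writers hit the write DB (so they always see
    their own changes); everyone else hits the eventually consistent
    read model."""
    now = now if now is not None else time.time()
    if now - last_write_at.get(user_id, 0) < PIN_SECONDS:
        return "write-db"
    return "read-model"
```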

🔧

Overengineering with CQRS

Using CQRS for a simple blog or CRUD admin panel. Two databases, an event stream, event consumers, eventual consistency — all for a system that handles 100 reads/sec. The complexity far outweighs the benefit.

CQRS is for systems where read and write patterns are fundamentally different AND scale is significant. If a normalized PostgreSQL with proper indexes handles your reads fine, you don't need CQRS. Start simple, add complexity when the simple approach fails.

🔍

Not keeping Elasticsearch in sync

Indexing products in Elasticsearch on day one, then forgetting to update the index when products change. Search results show deleted products, wrong prices, and outdated descriptions. Users lose trust in search.

Use Change Data Capture (CDC) with Debezium or a similar tool. Every change in PostgreSQL automatically flows to Elasticsearch via Kafka. No manual sync, no missed updates. Monitor the sync lag — if ES falls behind, search results are stale.