
Indexes (GSI & LSI)

The second most important design decision in DynamoDB. Indexes let you query data beyond the primary key — but every index has a cost.

40 min read · 9 sections
01

Why Indexes Exist in DynamoDB

Without indexes, DynamoDB can only serve two access patterns: GetItem (exact primary key lookup) and Query (items sharing a partition key, filtered by sort key). Every other access pattern requires either a Scan (reads entire table) or an index.

📚

The Library Card Catalog

Your base table is like a library organized by author (partition key) and publication date (sort key). If someone asks 'find all books about physics' — you'd have to check every shelf (Scan). A GSI is like adding a second card catalog organized by subject. Now 'physics' queries are instant. But maintaining two catalogs means every new book must be filed in both — that's the write amplification cost.

The Index Decision Framework

For every access pattern that cannot be served by the base table primary key: (1) Can you redesign the sort key? (2) If not, do you need a GSI or LSI? (3) What attributes should be projected? (4) Is the write amplification cost acceptable?

02

Local Secondary Index (LSI)

An LSI provides an alternative sort key on the same partition key. It shares the partition space with the base table — queries are still limited to one partition.

LSI Characteristics

  • Same partition key as base table, different sort key
  • Can use strongly consistent reads (shares capacity with base table)
  • Must be defined at table creation — cannot add later
  • Maximum 5 LSIs per table
  • Shares the 10 GB item collection limit (per partition key value) with base table items
  • Projected attributes: ALL, KEYS_ONLY, or specific attributes
lsi-example.txt (text)
Example: Orders table
  Base table:  PK = userId, SK = orderId
  LSI:         PK = userId, SK = orderDate

Base table query: "Get order ORD-123 for user U-456"
PK = U-456, SK = ORD-123

LSI query: "Get all orders for user U-456 in January 2024"
PK = U-456, SK BETWEEN '2024-01-01' AND '2024-01-31'
  (uses the LSI's orderDate sort key)

LSI Limitations

LSIs can only be defined when the table is created. If you realize you need one later, you must create a new table and migrate the data. LSIs also impose a 10 GB item collection limit: for any single partition key value, the base table items plus their LSI entries must fit in 10 GB.
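As a minimal sketch of the Orders example above, the following builds the request parameters for creating the table together with its LSI, and for querying that LSI. The table name Orders and index name OrdersByDate are hypothetical. The dictionaries follow the shape of the DynamoDB CreateTable and Query requests, so they could be passed to a boto3 client as client.create_table(**params) or client.query(**params). Nothing is sent to AWS here.

```python
def orders_table_params():
    """CreateTable parameters: base key (userId, orderId) plus an LSI on orderDate."""
    return {
        "TableName": "Orders",  # hypothetical table name
        "AttributeDefinitions": [
            {"AttributeName": "userId", "AttributeType": "S"},
            {"AttributeName": "orderId", "AttributeType": "S"},
            {"AttributeName": "orderDate", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userId", "KeyType": "HASH"},
            {"AttributeName": "orderId", "KeyType": "RANGE"},
        ],
        # The LSI must be declared here; it cannot be added to an existing table.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "OrdersByDate",  # hypothetical index name
                "KeySchema": [
                    {"AttributeName": "userId", "KeyType": "HASH"},
                    {"AttributeName": "orderDate", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def orders_in_range_params(user_id, start, end, consistent=True):
    """Query parameters for one user's orders between two dates, via the LSI."""
    return {
        "TableName": "Orders",
        "IndexName": "OrdersByDate",
        "KeyConditionExpression": "userId = :u AND orderDate BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":u": {"S": user_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
        # Strongly consistent reads are allowed on an LSI.
        "ConsistentRead": consistent,
    }
```

Calling orders_in_range_params("U-456", "2024-01-01", "2024-01-31") produces the January 2024 query from the example.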

03

Global Secondary Index (GSI)

A GSI has its own partition key and optional sort key, both of which can differ from the base table's. It creates an independent partition space, enabling queries across all items in the table.

Feature         | LSI                      | GSI
Partition key   | Same as base table       | Different from base table
Sort key        | Alternative sort key     | Optional, any attribute
Consistency     | Strong or eventual       | Eventually consistent ONLY
Capacity        | Shares with base table   | Separate RCU/WCU provisioning
Creation time   | Table creation only      | Can add/delete anytime
Limit           | 5 per table              | 20 per table (soft limit)
Partition limit | Shares 10 GB with base   | Independent partition space
Query scope     | Within one partition key | Across entire table
gsi-example.txt (text)
Example: Find all orders by status (across all users)

Base table: PK = userId, SK = orderId
Cannot query by status without scanning entire table

GSI: PK = status, SK = createdAt
Query: PK = "PENDING", SK > "2024-01-01"
Returns all pending orders created after Jan 1, sorted by date

⚠️ "status" as a GSI PK has low cardinality; consider a sparse index instead
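A minimal sketch of the query in this example, expressed as DynamoDB Query request parameters. The table name Orders and index name StatusCreatedAtIndex are hypothetical. Because status is on DynamoDB's reserved word list, the sketch refers to it through an expression attribute name placeholder (#s). ConsistentRead is left out on purpose, since a GSI does not support strongly consistent reads.

```python
def pending_orders_params(since, index_name="StatusCreatedAtIndex"):
    """Query parameters for pending orders created after `since`, via the GSI."""
    return {
        "TableName": "Orders",    # hypothetical table name
        "IndexName": index_name,  # hypothetical index name
        "KeyConditionExpression": "#s = :status AND createdAt > :since",
        # "status" is a reserved word, so it is aliased as #s.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":status": {"S": "PENDING"},
            ":since": {"S": since},
        },
    }
```

With boto3 this could be passed as client.query(**pending_orders_params("2024-01-01")).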

GSI Features

  • Eventually consistent only — no strong consistency option on GSI
  • Separate capacity from base table — provision RCU/WCU independently
  • Can be added or deleted after table creation (unlike LSI)
  • Maximum 20 GSIs per table (soft limit, can request increase)
  • Projected attributes: ALL, KEYS_ONLY, or specific attributes
  • Sparse: only items with the GSI key attribute appear in the index
04

GSI as the Primary Access Pattern Tool

In practice, every access pattern beyond GetItem and base table Query needs either a sort key redesign or a GSI. GSIs are the primary tool for serving additional access patterns.

Write Amplification Cost

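Every write to the base table is also propagated to each GSI that contains the item. With 3 GSIs, a PutItem of a new item results in 4 writes (1 base + 3 GSI). The GSI writes are charged against each GSI's own write capacity, and an update that changes a GSI key attribute costs two writes on that index (one to delete the old entry, one to put the new one). As a rule of thumb for WCU planning: total write cost ≈ base write cost × (1 + number of GSIs that include the item).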
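The arithmetic can be sketched as follows. This is a rough estimate under stated assumptions: standard (non-transactional) writes at 1 write unit per 1 KB, rounded up; a brand-new item rather than an update that changes a GSI key; and projected entry sizes that you supply yourself. The function names are hypothetical.

```python
import math


def write_units(size_bytes):
    """Standard writes cost 1 write unit per 1 KB of item size, rounded up."""
    return max(1, math.ceil(size_bytes / 1024))


def put_item_cost(item_bytes, gsi_projected_bytes):
    """Estimate write units for inserting one new item.

    gsi_projected_bytes holds the size of the entry written to each GSI
    that includes the item (smaller for KEYS_ONLY, full size for ALL).
    """
    base = write_units(item_bytes)
    gsis = [write_units(size) for size in gsi_projected_bytes]
    return {"base": base, "gsis": gsis, "total": base + sum(gsis)}


# A 900-byte item with three ALL-projected GSIs: 1 base + 3 GSI writes.
small = put_item_cost(900, [900, 900, 900])

# A 3,500-byte item with one ALL GSI and two KEYS_ONLY GSIs (about 200 bytes each).
large = put_item_cost(3500, [3500, 200, 200])
```

Here small["total"] is 4, matching the 1 + 3 example, while large["total"] is 10 (4 for the base table, then 4 + 1 + 1 across the indexes), which illustrates why narrow projections reduce write cost for large items.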

Projection Types

Projection | What's Stored                  | Storage Cost | When to Use
KEYS_ONLY  | Base table keys + GSI keys     | Minimal      | When you only need IDs, then fetch full item
INCLUDE    | Keys + specified attributes    | Medium       | When queries need specific fields
ALL        | All attributes from base table | Maximum      | When queries need full item (avoid fetches back to base)

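A Query on a GSI can return only the attributes projected into that index; DynamoDB does not fetch missing attributes from the base table on a GSI's behalf. If you need them, your application must issue a follow-up GetItem or BatchGetItem against the base table, which costs extra RCU and adds latency. (For an LSI, DynamoDB does fetch non-projected attributes automatically, at extra read cost.) Project the attributes your queries actually read.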
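A minimal sketch of both sides of this trade-off: a helper that builds a GSI definition with a chosen projection, and a helper that turns KEYS_ONLY query results into BatchGetItem requests against the base table. The table name, key names, and function names are hypothetical; the dictionary shapes follow the CreateTable (GlobalSecondaryIndexes) and BatchGetItem request formats, and BatchGetItem accepts at most 100 keys per call.

```python
def gsi_with_projection(index_name, partition_attr, sort_attr,
                        projection_type, include=()):
    """Build one GlobalSecondaryIndexes entry with the chosen projection."""
    projection = {"ProjectionType": projection_type}
    if projection_type == "INCLUDE":
        # Only INCLUDE takes a list of extra non-key attributes.
        projection["NonKeyAttributes"] = list(include)
    return {
        "IndexName": index_name,
        "KeySchema": [
            {"AttributeName": partition_attr, "KeyType": "HASH"},
            {"AttributeName": sort_attr, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


def fetch_back_requests(index_items, table="Orders",
                        key_names=("userId", "orderId")):
    """Turn KEYS_ONLY index results into BatchGetItem requests (100 keys each)."""
    keys = [{name: item[name] for name in key_names} for item in index_items]
    return [
        {"RequestItems": {table: {"Keys": keys[i:i + 100]}}}
        for i in range(0, len(keys), 100)
    ]
```

Each request returned by fetch_back_requests is an extra round trip, which is the cost a wider projection avoids.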

05

Sparse Index Pattern

A GSI only indexes items that have the GSI key attribute. Items without that attribute simply don't appear in the index. This is the sparse index pattern — one of the most powerful DynamoDB techniques.

sparse-index.txt (text)
Problem: "Find all pending orders" without scanning millions of completed orders

Solution: Sparse GSI
Add attribute "pendingAt" ONLY to pending orders
GSI: PK = pendingAt (or a shard key), SK = orderId
When order completes: REMOVE pendingAt attribute
Completed orders fall out of the GSI automatically

Result:
GSI only contains pending orders (tiny subset)
Query is fast and cheap: no scanning completed orders
No background job needed to clean up the index
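The membership rule can be illustrated with a small in-memory model. This is only a sketch of the behavior, not of DynamoDB's implementation. The helper names and the Orders table name are hypothetical, and the key attributes userId and orderId are assumed from the earlier Orders example. complete_order_params shows the shape of an UpdateItem request that drops the attribute.

```python
def sparse_index(items, key_attr="pendingAt"):
    """Model of GSI membership: only items carrying key_attr are indexed."""
    return [item for item in items if key_attr in item]


def complete_order_params(user_id, order_id, table="Orders"):
    """UpdateItem parameters that remove pendingAt when an order completes."""
    return {
        "TableName": table,  # hypothetical table name
        "Key": {"userId": {"S": user_id}, "orderId": {"S": order_id}},
        "UpdateExpression": "REMOVE pendingAt",
    }


orders = [
    {"orderId": "ORD-1", "pendingAt": "2024-01-15T10:00:00Z"},
    {"orderId": "ORD-2"},  # already completed, so it has no pendingAt
    {"orderId": "ORD-3", "pendingAt": "2024-01-16T09:30:00Z"},
]
pending_ids = [o["orderId"] for o in sparse_index(orders)]

# Completing ORD-1 removes the attribute, so it falls out of the index.
del orders[0]["pendingAt"]
remaining_ids = [o["orderId"] for o in sparse_index(orders)]
```

Before the update the modeled index holds ORD-1 and ORD-3; afterwards only ORD-3 remains, with no separate cleanup step.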
🏷️

The Sticky Note Pattern

Think of the sparse index as a sticky note on items that need attention. Only items with the sticky note appear in the index. When you're done with an item, remove the sticky note — it disappears from the index. The index stays small and focused, containing only the items you actually need to query.

Common Sparse Index Use Cases

  • Pending/active items: only items needing processing have the index attribute
  • Flagged content: only flagged items appear in moderation queue index
  • Expiring items: only items with TTL attribute appear in expiry index
  • Featured items: only featured products appear in featured index
  • Error states: only failed jobs appear in retry queue index
06

Inverted Index Pattern

An inverted index is a GSI where the partition key and sort key are swapped from the base table. This enables querying relationships in the reverse direction.

inverted-index.txt (text)
Base table: PK = userId, SK = productId
Query: "What products did user 123 buy?"
PK = USER#123, SK begins_with "PRODUCT#"

Inverted GSI: PK = productId (base SK), SK = userId (base PK)
Query: "Which users bought product ABC?"
GSI PK = PRODUCT#ABC, SK begins_with "USER#"

One table, two query directions, no data duplication in application code.

When to Use Inverted Index

Use when your base table models a relationship in one direction (user → products) but you also need the reverse (product → users). The GSI automatically maintains the inverted view as items are written to the base table.
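A minimal sketch of the inverted index as request parameters, assuming the base table stores its keys in generic attributes named PK and SK (as the USER# and PRODUCT# prefixes in the example suggest). The table name Purchases and index name InvertedIndex are hypothetical.

```python
# GSI definition with the base table's keys swapped: SK becomes the
# partition key and PK becomes the sort key.
INVERTED_GSI = {
    "IndexName": "InvertedIndex",  # hypothetical index name
    "KeySchema": [
        {"AttributeName": "SK", "KeyType": "HASH"},
        {"AttributeName": "PK", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
}


def products_bought_params(user_id, table="Purchases"):
    """Forward direction on the base table: products bought by one user."""
    return {
        "TableName": table,
        "KeyConditionExpression": "PK = :user AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":user": {"S": f"USER#{user_id}"},
            ":prefix": {"S": "PRODUCT#"},
        },
    }


def users_who_bought_params(product_id, table="Purchases"):
    """Reverse direction on the inverted GSI: users who bought one product."""
    return {
        "TableName": table,
        "IndexName": INVERTED_GSI["IndexName"],
        "KeyConditionExpression": "SK = :product AND begins_with(PK, :prefix)",
        "ExpressionAttributeValues": {
            ":product": {"S": f"PRODUCT#{product_id}"},
            ":prefix": {"S": "USER#"},
        },
    }
```

The two functions differ only in which attribute plays the partition-key role and in the presence of IndexName, which is the whole idea of the pattern.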

07

GSI Overloading

GSI overloading reuses the same GSI for multiple access patterns by storing different types of values in the GSI key attributes depending on the item type.

gsi-overloading.txt (text)
Single GSI serving multiple access patterns:

Base Table:
  PK          | SK              | GSI1PK        | GSI1SK
  USER#123    | USER#123        | user@email.com| USER#123
  USER#123    | ORDER#abc       | PENDING       | 2024-01-15
  PRODUCT#xyz | PRODUCT#xyz     | Electronics   | PRODUCT#xyz

GSI1 serves THREE different queries:
  1. "Find user by email" → GSI1PK = "user@email.com"
  2. "Find pending orders" → GSI1PK = "PENDING", GSI1SK > "2024-01"
  3. "Find products by category" → GSI1PK = "Electronics"

One GSI, three access patterns. Maximum 20 GSIs means up to 60+ 
access patterns with overloading.

Complexity Trade-off

GSI overloading is powerful but makes the table harder to understand. The GSI key attributes have no consistent meaning — they hold emails for users, statuses for orders, and categories for products. Document your access patterns thoroughly.
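To make the three queries concrete, here is an in-memory model of the overloaded GSI1 using the three items from the table above. It only imitates Query semantics (exact match on the partition key, optional range condition on the sort key, results ordered by sort key); query_gsi1 is a hypothetical helper, not a DynamoDB API.

```python
ITEMS = [
    {"PK": "USER#123", "SK": "USER#123",
     "GSI1PK": "user@email.com", "GSI1SK": "USER#123"},
    {"PK": "USER#123", "SK": "ORDER#abc",
     "GSI1PK": "PENDING", "GSI1SK": "2024-01-15"},
    {"PK": "PRODUCT#xyz", "SK": "PRODUCT#xyz",
     "GSI1PK": "Electronics", "GSI1SK": "PRODUCT#xyz"},
]


def query_gsi1(items, pk, sk_greater_than=None):
    """Imitate a Query on GSI1: match GSI1PK exactly, optionally filter GSI1SK."""
    matches = [
        item for item in items
        if item.get("GSI1PK") == pk
        and (sk_greater_than is None or item["GSI1SK"] > sk_greater_than)
    ]
    return sorted(matches, key=lambda item: item["GSI1SK"])


user = query_gsi1(ITEMS, "user@email.com")                      # find user by email
pending = query_gsi1(ITEMS, "PENDING", sk_greater_than="2024-01")  # pending orders
electronics = query_gsi1(ITEMS, "Electronics")                  # products by category
```

Each call returns a different item type from the same index, which is why the meaning of GSI1PK and GSI1SK has to be documented per item type.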

08

Interview Questions

Q: What is the difference between a GSI and an LSI?

A: LSI: same partition key, different sort key, strongly consistent reads possible, must be created at table creation, shares 10 GB partition limit. GSI: completely different partition key, eventually consistent only, can be added anytime, independent partition space and capacity. Use LSI when you need an alternative sort within the same partition. Use GSI when you need to query across all items by a different attribute.

Q: What is a sparse index and when would you use it?

A: A GSI that only contains items with the GSI key attribute present. Items without the attribute don't appear in the index. Use case: 'find all pending orders' — only pending orders have a 'pendingAt' attribute, so the GSI only contains pending items. When an order completes, remove the attribute and it falls out of the index. The index stays small and queries are cheap.

Q: How does write amplification work with GSIs?

A: Every write to the base table is replicated to each GSI that includes the item. With 3 GSIs, a single PutItem results in 4 writes (1 base + 3 GSI), so roughly 4× the write cost of the base table alone, with each GSI's share consumed from that GSI's own capacity. A GSI with KEYS_ONLY projection stores smaller entries than one with ALL, so its writes can be cheaper for large items. Factor this into capacity planning.

Q: Why can't you use strongly consistent reads on a GSI?

A: GSIs are stored on separate partitions from the base table and replicated asynchronously. There's no leader-follower relationship between base table and GSI — the GSI is eventually consistent by design. If you need strong consistency, use an LSI (same partition) or read from the base table directly.

Q: What is GSI overloading?

A: Storing different types of values in the same GSI key attributes depending on item type. One GSI serves multiple access patterns: GSI1PK might hold an email (for user lookup), a status (for order filtering), or a category (for product browsing). Maximizes the 20-GSI limit by serving 3-4 patterns per GSI.

09

Common Mistakes

📊

Creating a GSI for every access pattern

Each GSI adds write amplification and cost. Before adding a GSI, check if you can serve the pattern with sort key redesign, composite sort keys, or GSI overloading.

Aim for 2-3 well-designed GSIs with overloading, not 10 single-purpose ones.

💾

Using ALL projection on every GSI

ALL projection copies every attribute to the GSI — maximum storage and write cost. If your GSI query only needs 3 attributes, use INCLUDE projection.

Use KEYS_ONLY if you just need IDs, INCLUDE for specific fields, ALL only when queries need the full item.

⏱️

Forgetting that GSIs are eventually consistent

Writing an item then immediately querying a GSI may not return the new item. If your application requires read-after-write consistency, this causes bugs.

Read from the base table with ConsistentRead=true, or use an LSI for strong consistency within a partition.
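A minimal sketch of the base-table read, as GetItem request parameters. The table name Orders and the key attributes userId and orderId are assumptions carried over from the earlier Orders example.

```python
def consistent_get_params(user_id, order_id, table="Orders"):
    """GetItem parameters for a strongly consistent read from the base table."""
    return {
        "TableName": table,  # hypothetical table name
        "Key": {"userId": {"S": user_id}, "orderId": {"S": order_id}},
        # Valid on the base table and on LSIs; a GSI query rejects this flag.
        "ConsistentRead": True,
    }
```

This only works when the full primary key is known. If the item can be found only through the GSI, the application has to tolerate the replication delay or retry.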

🔥

Low-cardinality GSI partition keys

Using 'status' (3 values) as a GSI partition key creates hot GSI partitions. All pending orders hit one partition.

Use composite GSI keys or write sharding: GSI PK = 'PENDING#' + random(1,10) to distribute load.
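Write sharding can be sketched as two small helpers: one picks a random shard suffix when writing, the other lists every shard key so a reader can query each one and merge the results. The shard count of 10 follows the example above; the function names are hypothetical.

```python
import random

SHARD_COUNT = 10


def sharded_status_key(status, shard_count=SHARD_COUNT, rng=random):
    """GSI partition key for a write: status plus a random shard suffix."""
    return f"{status}#{rng.randint(1, shard_count)}"


def all_shard_keys(status, shard_count=SHARD_COUNT):
    """Every GSI partition key a reader must query to see all items."""
    return [f"{status}#{n}" for n in range(1, shard_count + 1)]


write_key = sharded_status_key("PENDING")  # e.g. one of PENDING#1 .. PENDING#10
read_keys = all_shard_keys("PENDING")      # query each key, then merge results
```

The trade-off is on the read side: one logical query becomes shard_count queries, so the shard count should be just large enough to spread the write load.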

📉

Not monitoring GSI throttling separately

GSIs have their own capacity. If a GSI is under-provisioned, writes to the BASE TABLE get throttled because DynamoDB can't replicate to the GSI.

Monitor each GSI's consumed capacity independently. Set CloudWatch alarms on each GSI's WriteThrottleEvents and ReadThrottleEvents metrics, which are published with a GlobalSecondaryIndexName dimension.
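A minimal sketch of a per-index alarm, expressed as CloudWatch PutMetricAlarm parameters. It uses the WriteThrottleEvents metric, which CloudWatch publishes per index through the GlobalSecondaryIndexName dimension. The alarm name, threshold, and period are illustrative assumptions, and the function name is hypothetical.

```python
def gsi_write_throttle_alarm(table, index, threshold=1):
    """PutMetricAlarm parameters for write throttling on one GSI."""
    return {
        "AlarmName": f"{table}-{index}-write-throttle",  # illustrative name
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        # Both dimensions are needed to scope the metric to a single index.
        "Dimensions": [
            {"Name": "TableName", "Value": table},
            {"Name": "GlobalSecondaryIndexName", "Value": index},
        ],
        "Statistic": "Sum",
        "Period": 60,            # seconds; illustrative
        "EvaluationPeriods": 1,
        "Threshold": threshold,  # illustrative
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No throttle events means no datapoints, which should not alarm.
        "TreatMissingData": "notBreaching",
    }
```

With boto3 this could be passed as cloudwatch.put_metric_alarm(**gsi_write_throttle_alarm("Orders", "StatusCreatedAtIndex")), once per GSI.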