Indexes (GSI & LSI)
The second most important design decision in DynamoDB. Indexes let you query data beyond the primary key — but every index has a cost.
Why Indexes Exist in DynamoDB
Without indexes, DynamoDB can only serve two access patterns: GetItem (exact primary key lookup) and Query (items sharing a partition key, filtered by sort key). Every other access pattern requires either a Scan (reads entire table) or an index.
The Library Card Catalog
Your base table is like a library organized by author (partition key) and publication date (sort key). If someone asks 'find all books about physics' — you'd have to check every shelf (Scan). A GSI is like adding a second card catalog organized by subject. Now 'physics' queries are instant. But maintaining two catalogs means every new book must be filed in both — that's the write amplification cost.
The Index Decision Framework
For every access pattern that cannot be served by the base table primary key: (1) Can you redesign the sort key? (2) If not, do you need a GSI or LSI? (3) What attributes should be projected? (4) Is the write amplification cost acceptable?
Local Secondary Index (LSI)
An LSI provides an alternative sort key on the same partition key. It shares the partition space with the base table, so each query is still limited to a single partition key value.
LSI Characteristics
- ✅Same partition key as base table, different sort key
- ✅Can use strongly consistent reads (shares capacity with base table)
- ✅Must be defined at table creation — cannot add later
- ✅Maximum 5 LSIs per table
- ✅Shares the 10 GB item collection limit (per partition key value) with base table items
- ✅Projected attributes: ALL, KEYS_ONLY, or specific attributes
Example: Orders table

- Base table: PK = userId, SK = orderId
- LSI: PK = userId, SK = orderDate
- Base table query: "Get order ORD-123 for user U-456" → PK = U-456, SK = ORD-123
- LSI query: "Get all orders for user U-456 in January 2024" → PK = U-456, SK BETWEEN '2024-01-01' AND '2024-01-31' (uses the LSI's orderDate sort key)
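As a rough sketch, the orders example could be expressed as request parameters for boto3's `create_table` and `query` client calls. The table name `Orders`, the index name `OrdersByDate`, the on-demand billing mode, and both helper function names are assumptions made for illustration; the code only builds dictionaries and never calls AWS.

```python
# Hypothetical names: "Orders" table, "OrdersByDate" LSI.
def orders_table_params():
    """Parameters for create_table: the LSI reuses userId and sorts by orderDate."""
    return {
        "TableName": "Orders",
        "AttributeDefinitions": [
            {"AttributeName": "userId", "AttributeType": "S"},
            {"AttributeName": "orderId", "AttributeType": "S"},
            {"AttributeName": "orderDate", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userId", "KeyType": "HASH"},
            {"AttributeName": "orderId", "KeyType": "RANGE"},
        ],
        # LSIs can only be declared here, at table creation.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "OrdersByDate",
                "KeySchema": [
                    {"AttributeName": "userId", "KeyType": "HASH"},
                    {"AttributeName": "orderDate", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand table
    }


def orders_in_range_params(user_id, start, end):
    """Parameters for query: one user's orders between two ISO dates, via the LSI."""
    return {
        "TableName": "Orders",
        "IndexName": "OrdersByDate",
        "KeyConditionExpression": "userId = :u AND orderDate BETWEEN :s AND :e",
        "ExpressionAttributeValues": {
            ":u": {"S": user_id},
            ":s": {"S": start},
            ":e": {"S": end},
        },
        "ConsistentRead": True,  # permitted on an LSI
    }
```

With a boto3 client, these would be passed as `client.create_table(**orders_table_params())` and `client.query(**orders_in_range_params("U-456", "2024-01-01", "2024-01-31"))`.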
LSI Limitations
LSIs must be created at table creation time. If you realize you need one later, you must recreate the table. Also, a table with LSIs is subject to the 10 GB item collection limit: all base table items plus LSI entries for a single partition key value must fit in 10 GB.
Global Secondary Index (GSI)
A GSI has a completely different partition key and optional sort key from the base table. It creates an independent partition space, enabling queries across all items in the table.
| Feature | LSI | GSI |
|---|---|---|
| Partition key | Same as base table | Different from base table |
| Sort key | Alternative sort key | Optional, any attribute |
| Consistency | Strong or eventual | Eventually consistent ONLY |
| Capacity | Shares with base table | Separate RCU/WCU provisioning |
| Creation time | Table creation only | Can add/delete anytime |
| Limit | 5 per table | 20 per table (soft limit) |
| Size limit | 10 GB per partition key value, shared with base | Independent partition space |
| Query scope | Within one partition key | Across entire table |
Example: Find all orders by status (across all users)

- Base table: PK = userId, SK = orderId → cannot query by status without scanning the entire table
- GSI: PK = status, SK = createdAt
- Query: PK = "PENDING", SK > "2024-01-01" → returns all pending orders created after Jan 1, sorted by date

⚠️ "status" as a GSI PK has low cardinality. Consider a sparse index instead.
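The status example might look like the following as parameters for boto3's `update_table` (to add the GSI to an existing table) and `query`. This is a sketch under assumptions: the table name `Orders`, the index name `OrdersByStatus`, the KEYS_ONLY projection, and an on-demand table (a provisioned table would also need `ProvisionedThroughput` inside `Create`) are all illustrative choices.

```python
# Hypothetical names: "Orders" table, "OrdersByStatus" GSI.
def add_status_gsi_params():
    """Parameters for update_table: add a GSI keyed by status, sorted by createdAt."""
    return {
        "TableName": "Orders",
        # Must include the key attributes of the new index.
        "AttributeDefinitions": [
            {"AttributeName": "status", "AttributeType": "S"},
            {"AttributeName": "createdAt", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "OrdersByStatus",
                    "KeySchema": [
                        {"AttributeName": "status", "KeyType": "HASH"},
                        {"AttributeName": "createdAt", "KeyType": "RANGE"},
                    ],
                    "Projection": {"ProjectionType": "KEYS_ONLY"},
                }
            }
        ],
    }


def orders_by_status_params(status, created_after):
    """Parameters for query: orders with a given status created after a date."""
    return {
        "TableName": "Orders",
        "IndexName": "OrdersByStatus",
        # "status" is a DynamoDB reserved word, so it is aliased as #s.
        "KeyConditionExpression": "#s = :s AND createdAt > :d",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":s": {"S": status},
            ":d": {"S": created_after},
        },
    }
```

Note that the query parameters omit `ConsistentRead`, since GSI reads are eventually consistent.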
GSI Features
- ✅Eventually consistent only — no strong consistency option on GSI
- ✅Separate capacity from base table — provision RCU/WCU independently
- ✅Can be added or deleted after table creation (unlike LSI)
- ✅Maximum 20 GSIs per table (soft limit, can request increase)
- ✅Projected attributes: ALL, KEYS_ONLY, or specific attributes
- ✅Sparse: only items with the GSI key attribute appear in the index
GSI as the Primary Access Pattern Tool
In practice, every access pattern beyond GetItem and base table Query needs either a sort key redesign or a GSI. GSIs are the primary tool for serving additional access patterns.
Write Amplification Cost
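Every write to the base table is also propagated to each GSI that includes the item. If you have 3 GSIs, a PutItem of a new item results in 4 writes (1 base + 3 GSI). An update that changes a GSI key attribute costs two writes in that index: one to delete the old entry and one to put the new one. As a rough planning estimate: effective WCU ≈ base WCU × (1 + number of GSIs that include the item).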
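The planning formula can be worked through in a few lines. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes standard (non-transactional) writes where one WCU covers up to 1 KB and every GSI entry is about as large as the base item, which overestimates the cost for narrow projections.

```python
import math


def wcu_per_write(item_size_bytes):
    """Standard write cost: one WCU per 1 KB of item size, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))


def effective_wcu(base_wcu, gsis_including_item):
    """Estimate from the text: base WCU x (1 + number of GSIs holding the item)."""
    return base_wcu * (1 + gsis_including_item)


# A 2,500-byte item rounds up to 3 WCU on the base table.
base = wcu_per_write(2500)
# With 3 GSIs that include the item, the estimate is 3 x (1 + 3) = 12 WCU.
total = effective_wcu(base, 3)
```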
Projection Types
| Projection | What's Stored | Storage Cost | When to Use |
|---|---|---|---|
| KEYS_ONLY | Base table keys + GSI keys | Minimal | When you only need IDs, then fetch full item |
| INCLUDE | Keys + specified attributes | Medium | When queries need specific fields |
| ALL | All attributes from base table | Maximum | When queries need full item (avoid fetches back to base) |
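A Query on a GSI can return only attributes that are projected into the index. DynamoDB does not fetch missing attributes from the base table for a GSI, so your application must issue a follow-up GetItem or BatchGetItem, which costs extra RCU and adds latency. (With an LSI, DynamoDB does fetch non-projected attributes from the base table automatically, at extra read cost.) Project the attributes your queries actually read.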
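For a KEYS_ONLY index, the "fetch full item" step in the table above could be sketched as turning index results into `batch_get_item` requests against the base table. The helper names, the `Orders` table, and the `userId`/`orderId` key names are assumptions; the chunk size of 100 reflects the per-request key limit of BatchGetItem.

```python
def base_table_keys(index_items, key_names=("userId", "orderId")):
    """Keep only the base table key attributes from each index result."""
    return [{name: item[name] for name in key_names} for item in index_items]


def batch_get_requests(table_name, keys, chunk_size=100):
    """Split keys into batch_get_item parameter dicts of at most chunk_size keys."""
    return [
        {"RequestItems": {table_name: {"Keys": keys[i:i + chunk_size]}}}
        for i in range(0, len(keys), chunk_size)
    ]


# Example: 250 index results become three requests (100 + 100 + 50 keys).
index_results = [
    {"userId": {"S": "U-1"}, "orderId": {"S": f"ORD-{n}"}, "status": {"S": "PENDING"}}
    for n in range(250)
]
requests = batch_get_requests("Orders", base_table_keys(index_results))
```

Each dictionary would be passed as `client.batch_get_item(**request)`; a real caller would also need to retry any `UnprocessedKeys` returned in the response.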
Sparse Index Pattern
A GSI only indexes items that have the GSI key attribute. Items without that attribute simply don't appear in the index. This is the sparse index pattern — one of the most powerful DynamoDB techniques.
Problem: "Find all pending orders" without scanning millions of completed orders

Solution: Sparse GSI

- Add attribute "pendingAt" ONLY to pending orders
- GSI: PK = pendingAt (or a shard key), SK = orderId
- When order completes: REMOVE pendingAt attribute
- Completed orders fall out of the GSI automatically

Result:

- GSI only contains pending orders (tiny subset)
- Reading the index is fast and cheap (a Query per key or shard, or a Scan of the small index), and completed orders are never touched
- No background job needed to clean up the index
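The add-then-remove lifecycle of the `pendingAt` attribute might be sketched as two sets of parameters for boto3's `update_item`. The table name `Orders`, the `userId`/`orderId` key names, the `completedAt` attribute, and the function names are assumptions for illustration.

```python
def _order_key(user_id, order_id):
    """Base table key for an order (hypothetical key names)."""
    return {"userId": {"S": user_id}, "orderId": {"S": order_id}}


def mark_pending_params(user_id, order_id, pending_at):
    """Setting pendingAt makes the item appear in the sparse GSI."""
    return {
        "TableName": "Orders",
        "Key": _order_key(user_id, order_id),
        "UpdateExpression": "SET pendingAt = :t",
        "ExpressionAttributeValues": {":t": {"S": pending_at}},
    }


def complete_order_params(user_id, order_id, completed_at):
    """Removing pendingAt drops the item from the sparse GSI."""
    return {
        "TableName": "Orders",
        "Key": _order_key(user_id, order_id),
        "UpdateExpression": "SET completedAt = :t REMOVE pendingAt",
        "ExpressionAttributeValues": {":t": {"S": completed_at}},
    }
```

No separate delete against the index is needed; DynamoDB maintains the GSI from the base table writes.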
The Sticky Note Pattern
Think of the sparse index as a sticky note on items that need attention. Only items with the sticky note appear in the index. When you're done with an item, remove the sticky note — it disappears from the index. The index stays small and focused, containing only the items you actually need to query.
Common Sparse Index Use Cases
- ✅Pending/active items: only items needing processing have the index attribute
- ✅Flagged content: only flagged items appear in moderation queue index
- ✅Expiring items: only items with TTL attribute appear in expiry index
- ✅Featured items: only featured products appear in featured index
- ✅Error states: only failed jobs appear in retry queue index
Inverted Index Pattern
An inverted index is a GSI where the partition key and sort key are swapped from the base table. This enables querying relationships in the reverse direction.
Base table: PK = userId, SK = productId

- Query: "What products did user 123 buy?" → PK = USER#123, SK begins_with "PRODUCT#"

Inverted GSI: PK = productId (base SK), SK = userId (base PK)

- Query: "Which users bought product ABC?" → GSI PK = PRODUCT#ABC, SK begins_with "USER#"

One table, two query directions, no data duplication in application code.
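A possible shape for the inverted index, assuming the base table stores its keys in generic attributes named `PK` and `SK` (as the `USER#`/`PRODUCT#` prefixes suggest). The table name `Purchases`, the index name `InvertedIndex`, and the helper name are hypothetical.

```python
# GSI definition fragment: the base sort key becomes the index partition key.
INVERTED_GSI = {
    "IndexName": "InvertedIndex",
    "KeySchema": [
        {"AttributeName": "SK", "KeyType": "HASH"},
        {"AttributeName": "PK", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
}


def buyers_of_product_params(product_id):
    """Parameters for query: which users bought the given product."""
    return {
        "TableName": "Purchases",
        "IndexName": "InvertedIndex",
        "KeyConditionExpression": "SK = :p AND begins_with(PK, :prefix)",
        "ExpressionAttributeValues": {
            ":p": {"S": f"PRODUCT#{product_id}"},
            ":prefix": {"S": "USER#"},
        },
    }
```

`INVERTED_GSI` would go into the `GlobalSecondaryIndexes` list of `create_table`, or into a `Create` entry of `update_table`.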
When to Use Inverted Index
Use when your base table models a relationship in one direction (user → products) but you also need the reverse (product → users). The GSI automatically maintains the inverted view as items are written to the base table.
GSI Overloading
GSI overloading reuses the same GSI for multiple access patterns by storing different types of values in the GSI key attributes depending on the item type.
Single GSI serving multiple access patterns:

Base table items:

| PK | SK | GSI1PK | GSI1SK |
|---|---|---|---|
| USER#123 | USER#123 | user@email.com | USER#123 |
| USER#123 | ORDER#abc | PENDING | 2024-01-15 |
| PRODUCT#xyz | PRODUCT#xyz | Electronics | PRODUCT#xyz |

GSI1 serves THREE different queries:

1. "Find user by email" → GSI1PK = "user@email.com"
2. "Find pending orders" → GSI1PK = "PENDING", GSI1SK > "2024-01"
3. "Find products by category" → GSI1PK = "Electronics"

One GSI, three access patterns. Maximum 20 GSIs means up to 60+ access patterns with overloading.
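One way to keep an overloaded index manageable is to centralize how each item type fills `GSI1PK` and `GSI1SK`. The builder functions below are a hypothetical sketch that reproduces the three rows above as plain dictionaries; the function names and arguments are illustrative.

```python
def user_item(user_id, email):
    """User items: GSI1PK holds the email, enabling lookup by email."""
    key = f"USER#{user_id}"
    return {"PK": key, "SK": key, "GSI1PK": email, "GSI1SK": key}


def order_item(user_id, order_id, status, order_date):
    """Order items: GSI1PK holds the status, GSI1SK the order date."""
    return {
        "PK": f"USER#{user_id}",
        "SK": f"ORDER#{order_id}",
        "GSI1PK": status,
        "GSI1SK": order_date,
    }


def product_item(product_id, category):
    """Product items: GSI1PK holds the category, enabling browse by category."""
    key = f"PRODUCT#{product_id}"
    return {"PK": key, "SK": key, "GSI1PK": category, "GSI1SK": key}


rows = [
    user_item("123", "user@email.com"),
    order_item("123", "abc", "PENDING", "2024-01-15"),
    product_item("xyz", "Electronics"),
]
```

Keeping these mappings in one module also gives the access-pattern documentation a single place to point to.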
Complexity Trade-off
GSI overloading is powerful but makes the table harder to understand. The GSI key attributes have no consistent meaning — they hold emails for users, statuses for orders, and categories for products. Document your access patterns thoroughly.
Interview Questions
Q: What is the difference between a GSI and an LSI?
A: LSI: same partition key, different sort key, strongly consistent reads possible, must be created at table creation, shares 10 GB partition limit. GSI: completely different partition key, eventually consistent only, can be added anytime, independent partition space and capacity. Use LSI when you need an alternative sort within the same partition. Use GSI when you need to query across all items by a different attribute.
Q: What is a sparse index and when would you use it?
A: A GSI that only contains items with the GSI key attribute present. Items without the attribute don't appear in the index. Use case: 'find all pending orders' — only pending orders have a 'pendingAt' attribute, so the GSI only contains pending items. When an order completes, remove the attribute and it falls out of the index. The index stays small and queries are cheap.
Q: How does write amplification work with GSIs?
A: Every write to the base table is replicated to each GSI that includes the item. With 3 GSIs, a single PutItem results in 4 writes (1 base + 3 GSI), so up to roughly 4× WCU consumption. Index write cost depends on the size of the projected data, so a GSI with KEYS_ONLY projection usually costs less to write than one with ALL projection. Factor this into capacity planning.
Q: Why can't you use strongly consistent reads on a GSI?
A: GSIs are stored on separate partitions from the base table and replicated asynchronously. There's no leader-follower relationship between base table and GSI — the GSI is eventually consistent by design. If you need strong consistency, use an LSI (same partition) or read from the base table directly.
Q: What is GSI overloading?
A: Storing different types of values in the same GSI key attributes depending on item type. One GSI serves multiple access patterns: GSI1PK might hold an email (for user lookup), a status (for order filtering), or a category (for product browsing). This stretches the 20-GSI limit by serving 3-4 patterns per GSI.
Common Mistakes
Creating a GSI for every access pattern
Each GSI adds write amplification and cost. Before adding a GSI, check if you can serve the pattern with sort key redesign, composite sort keys, or GSI overloading.
✅Aim for 2-3 well-designed GSIs with overloading, not 10 single-purpose ones.
Using ALL projection on every GSI
ALL projection copies every attribute to the GSI — maximum storage and write cost. If your GSI query only needs 3 attributes, use INCLUDE projection.
✅Use KEYS_ONLY if you just need IDs, INCLUDE for specific fields, ALL only when queries need the full item.
Forgetting that GSIs are eventually consistent
Writing an item then immediately querying a GSI may not return the new item. If your application requires read-after-write consistency, this causes bugs.
✅Read from the base table with ConsistentRead=true, or use an LSI for strong consistency within a partition.
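A minimal sketch of the read-after-write fix, expressed as parameters for boto3's `get_item`. The table name `Orders`, the key names, and the function name are assumptions.

```python
def consistent_get_params(user_id, order_id):
    """Parameters for get_item: strongly consistent read from the base table."""
    return {
        "TableName": "Orders",
        "Key": {"userId": {"S": user_id}, "orderId": {"S": order_id}},
        "ConsistentRead": True,  # not available when reading from a GSI
    }
```

A strongly consistent read consumes twice the read capacity of an eventually consistent one, so it is worth reserving for the paths that need it.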
Low-cardinality GSI partition keys
Using 'status' (3 values) as a GSI partition key creates hot GSI partitions. All pending orders hit one partition.
✅Use composite GSI keys or write sharding: GSI PK = 'PENDING#' + random(1,10) to distribute load.
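Write sharding could be sketched as follows. The function names and the shard count of 10 are illustrative. Writers pick a random shard suffix, and readers must query every shard and merge the results (scatter-gather), which is the price of spreading the load.

```python
import random


def sharded_gsi_pk(status, shard_count=10):
    """Key for a write: append a random shard number between 1 and shard_count."""
    return f"{status}#{random.randint(1, shard_count)}"


def all_shard_pks(status, shard_count=10):
    """Keys for a read: every shard must be queried and the results merged."""
    return [f"{status}#{n}" for n in range(1, shard_count + 1)]


write_key = sharded_gsi_pk("PENDING")   # e.g. "PENDING#7"
read_keys = all_shard_pks("PENDING")    # "PENDING#1" through "PENDING#10"
```

If a deterministic shard is preferred, a hash of the item's own key modulo the shard count can replace the random suffix.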
Not monitoring GSI throttling separately
GSIs have their own capacity. If a GSI is under-provisioned, writes to the BASE TABLE get throttled because DynamoDB can't replicate to the GSI.
✅Monitor GSI consumed capacity independently. Set CloudWatch alarms on each GSI's WriteThrottleEvents and ReadThrottleEvents metrics (filtered by the GlobalSecondaryIndexName dimension).
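A per-index alarm might be defined with parameters like these for the boto3 CloudWatch client's `put_metric_alarm`. This sketch uses the `WriteThrottleEvents` metric, which DynamoDB publishes per index through the `GlobalSecondaryIndexName` dimension. The alarm naming scheme, the one-minute period, and the zero threshold are assumptions to adjust for your workload.

```python
def gsi_write_throttle_alarm_params(table_name, index_name):
    """Parameters for put_metric_alarm: fire on any write throttling of one GSI."""
    return {
        "AlarmName": f"{table_name}-{index_name}-write-throttles",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "GlobalSecondaryIndexName", "Value": index_name},
        ],
        "Statistic": "Sum",
        "Period": 60,                 # seconds
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no throttling
    }


# One alarm definition per index (index names here are hypothetical).
alarms = [
    gsi_write_throttle_alarm_params("Orders", name)
    for name in ("OrdersByStatus", "InvertedIndex")
]
```

Each dictionary would be passed as `cloudwatch.put_metric_alarm(**params)`; adding an `AlarmActions` list is what routes the alarm to a notification target.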