Primary Key & Single-Table Design
The most important skill in DynamoDB. Every other decision ā indexes, capacity, performance ā flows from your primary key design.
Table of Contents
Access Pattern Analysis
In relational databases, you model entities first and figure out queries later. In DynamoDB, you do the opposite: list every query your application needs first, then design your table to serve them.
š The Required First Step
Before creating a DynamoDB table, write down every access pattern. For each: what are the inputs? What is returned? How frequently is it called? If a pattern cannot be served efficiently by your key design, you must either add a GSI or reconsider the model.
Example: E-Commerce Application
Access Pattern Analysis: āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā Pattern | Input | Frequency āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā Get user profile | userId | High Get all orders for a user | userId | High Get specific order | orderId | Medium Get orders by date range | userId+dates | Medium Get all items in an order | orderId | Medium Get user's recent 10 orders | userId | High Find orders by status (pending) | status | Low (admin) āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā Key Design Decision: PK = USER#<userId> SK = USER#<userId> ā user profile SK = ORDER#<timestamp>#<orderId> ā orders (sorted by time) GSI1: PK = ORDER#<orderId>, SK = ITEM#<itemId> ā order items
The Architect's Blueprint
You wouldn't build a house and then decide where the plumbing goes. In DynamoDB, your access patterns are the blueprint. The table design is the house built to serve those patterns. Changing the blueprint after construction means demolishing and rebuilding ā that's why access pattern analysis comes first, always.
Choosing a Partition Key
The partition key determines how data is distributed across physical partitions. A good partition key has three properties: high cardinality, even access distribution, and natural alignment with your access patterns.
| Partition Key | Cardinality | Distribution | Verdict |
|---|---|---|---|
| userId | High (millions) | Even (each user ~equal traffic) | ā Excellent |
| orderId | High (millions) | Even (each order accessed once) | ā Excellent |
| deviceId | High (thousands+) | Even (IoT sensors) | ā Good |
| status | Low (3-5 values) | Skewed (most items are 'active') | ā Terrible |
| date | Medium (365/year) | Skewed (today is hot) | ā Bad |
| country | Low (200) | Skewed (US gets 80% traffic) | ā Bad |
| boolean (isActive) | 2 values | Extremely skewed | ā Never use |
Good Partition Key Properties
- ā High cardinality ā many distinct values distributes load evenly
- ā Even access distribution ā not all requests going to same value
- ā Natural to your access patterns ā not forced or artificial
- ā Stable ā doesn't change over time for the same entity
- ā Available at query time ā you must know the PK to Query
Choosing a Sort Key
The sort key enables range queries within a partition. Items sharing a partition key are stored together, sorted by sort key value. This is where DynamoDB's query power lives.
Sort Key Conditions Available in Query
| Operator | Use Case | Example |
|---|---|---|
| = | Exact match | SK = 'PROFILE' |
| <, <=, >, >= | Range queries | SK > '2024-01-01' |
| BETWEEN | Date/time ranges | SK BETWEEN '2024-01' AND '2024-06' |
| begins_with | Prefix matching (hierarchical) | SK begins_with 'ORDER#' |
Composite Sort Keys for Hierarchical Queries
Sort Key: COUNTRY#US#STATE#CA#CITY#SF Query all US: SK begins_with "COUNTRY#US" Query California: SK begins_with "COUNTRY#US#STATE#CA" Query SF: SK begins_with "COUNTRY#US#STATE#CA#CITY#SF" One sort key design serves multiple levels of hierarchy.
Sort Key as Version
Use sort key for versioning: SK = v0000001, v0000002, etc. Query all versions with begins_with, get latest with ScanIndexForward=false and Limit=1. Or maintain a separate item with SK=LATEST that duplicates the current version.
Single-Table Design Philosophy
Single-table design stores multiple entity types in one table. Users, orders, products, and reviews all live together, differentiated by key prefixes and an entity type attribute.
Why Single-Table?
DynamoDB has no joins. If you need data from two tables, that means two network round trips. Single-table design co-locates related items in the same partition, enabling a single Query to fetch a user AND their orders in one request.
Single-Table Design Example: āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā PK | SK | Type | Data āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā USER#123 | USER#123 | User | {name, email, ...} USER#123 | ORDER#2024-01-15#abc | Order | {total, status, ...} USER#123 | ORDER#2024-01-20#def | Order | {total, status, ...} USER#123 | FOLLOWER#456 | Follow | {followedAt, ...} USER#456 | USER#456 | User | {name, email, ...} USER#456 | FOLLOWED_BY#123 | Follow | {followedAt, ...} ORDER#abc | ITEM#001 | Item | {product, qty, ...} ORDER#abc | ITEM#002 | Item | {product, qty, ...} āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā Queries enabled: ⢠Get user profile: PK = USER#123, SK = USER#123 ⢠Get user's orders: PK = USER#123, SK begins_with "ORDER#" ⢠Get recent orders: PK = USER#123, SK begins_with "ORDER#", ScanIndexForward=false, Limit=5 ⢠Get order items: PK = ORDER#abc, SK begins_with "ITEM#" ⢠Get user's followers: PK = USER#123, SK begins_with "FOLLOWER#"
Item Collections
All items sharing a partition key form an item collection. They are physically co-located on the same partition, which means querying them is fast ā a single sequential read from disk. This is the performance advantage of single-table design.
Key Overloading & Naming
In single-table design, the same key attributes hold different types of values depending on the item type. This is called key overloading.
| Approach | Key Names | Pros | Cons |
|---|---|---|---|
| Generic | PK, SK | Flexible for single-table | Less readable in console |
| Semantic | userId, createdAt | Clear intent | Less flexible for multi-entity |
| Prefixed values | USER#123, ORDER#abc | Namespace items, enable begins_with | Slightly more storage |
Key Overloading Example: Same PK attribute holds different entity references: PK = USER#123, SK = USER#123 ā User profile PK = USER#123, SK = ORDER#timestamp ā User's order PK = ORDER#abc, SK = ORDER#abc ā Order details PK = ORDER#abc, SK = ITEM#001 ā Order line item The PK and SK columns are generic ā their meaning depends on the item type. An "entityType" attribute disambiguates.
Why Prefixes?
Prefixes like USER#, ORDER#, ITEM# serve two purposes: (1) they namespace items so different entity types don't collide, and (2) they enable begins_with queries to fetch only items of a specific type within a partition.
Advanced Key Patterns
Adjacency List Pattern (Many-to-Many)
Modeling many-to-many relationships: PK = USER#123, SK = USER#123 ā user item PK = USER#123, SK = FOLLOWS#456 ā user 123 follows user 456 PK = USER#456, SK = FOLLOWED_BY#123 ā inverted relationship Query who user follows: PK = USER#123, SK begins_with "FOLLOWS#" Query followers of user: PK = USER#456, SK begins_with "FOLLOWED_BY#" Both directions queryable without a join.
Hierarchical Data Pattern
Parent-child relationships in one partition: PK = ORG#acme, SK = ORG#acme ā org item PK = ORG#acme, SK = DEPT#engineering ā department PK = ORG#acme, SK = DEPT#engineering#USER#123 ā user in dept Query entire org: PK = ORG#acme Query specific department: PK = ORG#acme, SK begins_with "DEPT#engineering" Query users in department: PK = ORG#acme, SK begins_with "DEPT#engineering#USER#"
Time-Series Pattern
Append-only writes, queries by time range: PK = DEVICE#sensor1, SK = 2024-01-15T10:30:00Z PK = DEVICE#sensor1, SK = 2024-01-15T10:31:00Z Query last hour: PK = DEVICE#sensor1 AND SK BETWEEN '2024-01-15T09:30' AND '2024-01-15T10:30' ā ļø Hot partition risk: all writes go to current time. Solution: shard by time bucket or add random suffix.
Optimistic Locking Pattern
// Add version attribute to items // UpdateItem condition: version must match expected const params = { TableName: "Products", Key: { PK: "PRODUCT#123", SK: "PRODUCT#123" }, UpdateExpression: "SET #name = :name, #version = #version + :one", ConditionExpression: "#version = :expectedVersion", ExpressionAttributeNames: { "#name": "name", "#version": "version" }, ExpressionAttributeValues: { ":name": "Updated Name", ":one": 1, ":expectedVersion": 5 // current known version } }; // ConditionalCheckFailedException ā retry with fresh read
When Single-Table is Wrong
Single-table design is powerful but not always appropriate. It adds complexity that may not be justified for every use case.
| Factor | Single-Table | Multi-Table |
|---|---|---|
| Team experience | Requires DynamoDB expertise | Easier to understand |
| Access patterns | Must be known upfront | More flexible to evolve |
| Query complexity | One request for related data | Multiple requests needed |
| Analytics/reporting | Very difficult | Easier with separate tables |
| Migration | All-or-nothing schema changes | Independent table evolution |
| Debugging | Complex ā items look different | Clear ā each table is one entity |
The Honest Assessment
Single-table design is the "expert mode" of DynamoDB. If your team is new to DynamoDB, start with one table per entity type. You can always consolidate later once access patterns stabilize. A poorly designed single table is worse than multiple well-designed tables.
Interview Questions
Q:What is single-table design and why would you use it?
A: Storing multiple entity types (users, orders, products) in one DynamoDB table using generic key names (PK, SK) with prefixed values. The benefit: related items share a partition key, enabling a single Query to fetch multiple entity types without joins (which DynamoDB doesn't support). Trade-off: increased complexity and harder ad-hoc querying.
Q:How do you model a many-to-many relationship in DynamoDB?
A: Use the adjacency list pattern. Store both directions of the relationship as separate items. PK=USER#123, SK=FOLLOWS#456 and PK=USER#456, SK=FOLLOWED_BY#123. Each direction is independently queryable with begins_with. For the reverse lookup without duplicating data, use a GSI with inverted PK/SK.
Q:What makes a good partition key?
A: Three properties: (1) High cardinality ā many distinct values. (2) Even access distribution ā no single value gets disproportionate traffic. (3) Natural to access patterns ā you must know the PK to query. Examples: userId, orderId, deviceId. Anti-examples: status, date, country.
Q:How would you handle a hot partition key?
A: Write sharding: append a random suffix (1-N) to the partition key to spread writes across N partitions. Reads require scatter-gather across all N shards. Calculated sharding: suffix = hash(userId) % N for deterministic routing. Also consider: caching hot reads with DAX, or switching to on-demand capacity mode.
Q:What is the difference between begins_with and contains in DynamoDB?
A: begins_with is a key condition ā it can be used in KeyConditionExpression and leverages the sort key index for efficient queries. contains is a filter expression ā it's applied AFTER data is read and does NOT reduce RCU consumption. Always prefer begins_with on sort keys over contains in filter expressions.
Common Mistakes
Designing keys without listing access patterns first
Creating a table with 'obvious' keys (id as PK) then discovering you can't serve your queries. Always list every access pattern before designing keys.
ā Write down every query with inputs and outputs before creating the table. The table structure IS the query plan.
Using single-table design without understanding it
Copying single-table patterns from blog posts without understanding why. If you can't explain why items are co-located and what queries benefit, you're adding complexity without value.
ā Start simple with one table per entity. Consolidate into single-table only when access patterns are clear and co-location provides measurable benefit.
Forgetting that sort key comparisons are lexicographic
Storing numbers as sort keys without zero-padding. '9' sorts after '10' lexicographically.
ā Use zero-padded strings (v0000001) or ISO 8601 timestamps (2024-01-15T10:30:00Z) which sort correctly as strings.
Unbounded item collections
Designing a partition key where items accumulate without limit (all events for a device, all messages in a chat). Eventually hits the 10 GB partition limit.
ā Use time-bucketed partition keys: DEVICE#sensor1#2024-01. Rotate partitions by time period.
Not considering write amplification from GSIs
Every write to the base table also writes to each GSI that includes the item. 3 GSIs means 4Ć write cost.
ā Factor GSI write amplification into capacity planning. Minimize GSI count and use KEYS_ONLY projection where possible.