DynamoDB Core Concepts & Data Model
DynamoDB is not a flexible NoSQL database — it's a purpose-built engine that trades query flexibility for guaranteed performance at any scale. Every concept is interconnected.
What is DynamoDB & Why It Exists
DynamoDB is widely misused because people treat it like a flexible NoSQL database. It is not. DynamoDB is a purpose-built engine for predictable, single-digit millisecond latency at any scale. That guarantee comes with a cost: you must define your access patterns upfront.
Amazon built DynamoDB after the 2004 holiday season outage that cost millions in revenue. The internal Dynamo paper (2007) described a system that would never go down, never slow down, and never require manual intervention. DynamoDB is the managed evolution of that vision — fully serverless, multi-AZ by default, with data replicated across three Availability Zones automatically.
The Race Track vs The Open Road
PostgreSQL is an open road — you can drive anywhere, take any turn, explore any route. DynamoDB is a race track — the lanes are fixed, but within those lanes you get guaranteed speed that no open road can match. If you try to leave the track (query outside your access patterns), you crash into a wall. The entire skill of DynamoDB is designing the right track before you start driving.
DynamoDB's Position in AWS
AWS Ecosystem Integration
- ✅ Fully managed — no servers, no patching, no cluster management
- ✅ Multi-AZ by default — data replicated across 3 Availability Zones
- ✅ Global Tables — multi-region active-active replication
- ✅ Integrates natively with Lambda, API Gateway, AppSync, Kinesis, S3
- ✅ On-demand or provisioned capacity — pay per request or reserve throughput
What DynamoDB is NOT
DynamoDB Limitations
- ❌ Not a relational database — no joins, no foreign keys, no SQL
- ❌ Not a flexible document store — items are schemaless, but your key design still dictates every query you can run
- ❌ Not the right tool for ad-hoc querying or analytics
- ❌ Not a replacement for PostgreSQL when you need ACID across entities
- ❌ Not easy to learn — steep learning curve, especially data modeling
The Fundamental Trade
🔑 The Core Mental Model
DynamoDB makes one deliberate trade: you define access patterns upfront, and DynamoDB guarantees performance for those patterns forever. Queries outside your access patterns are expensive or impossible. This is not a bug — it is the entire design philosophy.
When DynamoDB is the Right Choice
✅ Use DynamoDB When
- Access patterns are known and stable
- Massive scale with no operational overhead
- Single-digit ms latency at any RPS
- Serverless architectures (pay per request)
- Simple key-value or key-document lookups
❌ Avoid DynamoDB When
- Complex queries, ad-hoc reporting
- Relational data with many relationships
- Small scale where simplicity matters more
- Team lacks DynamoDB experience
- Need full-text search or analytics
| Scenario | DynamoDB | Better Alternative |
|---|---|---|
| Known access patterns, massive scale | ✅ Ideal | — |
| Ad-hoc queries, complex filtering | ❌ | PostgreSQL, MongoDB |
| Strong consistency across items | Limited (transactions expensive) | PostgreSQL |
| Full-text search | ❌ | Elasticsearch + DynamoDB Streams |
| Multi-region active-active | ✅ Global Tables | CockroachDB, Cassandra |
| Serverless, zero ops | ✅ | FaunaDB, PlanetScale |
| Relational with joins | ❌ | PostgreSQL, Aurora |
Tables
A table is the top-level container in DynamoDB. There are no databases, no schemas, no namespaces — just tables. Table names are scoped to your AWS account and region.
Table Characteristics
- ✅ No fixed schema — only the primary key attributes are required
- ✅ Every other attribute can vary per item (schemaless beyond the key)
- ✅ Table-level settings: capacity mode, encryption, TTL, streams, backups
- ✅ Table classes: Standard vs Standard-Infrequent Access (lower storage cost)
- ✅ No limit on items per table — tables can grow to petabytes
No Schema ≠ No Design
DynamoDB being "schemaless" is misleading. While items can have different attributes, your primary key design locks you into specific access patterns. The schema lives in your key design, not in a DDL statement. Poor key design cannot be fixed without a full data migration.
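To make the point concrete, here is a small sketch of two items coexisting in one hypothetical Users table: only the key attribute is required on both, and everything else varies per item. The attribute names are illustrative, not from any real schema.

```javascript
// Two items in the same hypothetical "Users" table: only the key
// attribute (userId) is required; every other attribute can vary.
const adminUser = {
  userId: "USER#1",                         // primary key -- required on every item
  email: "alice@example.com",
  role: "admin",
  permissions: ["read", "write", "delete"], // only admins carry this attribute
};

const basicUser = {
  userId: "USER#2",                         // same key schema, different attribute set
  email: "bob@example.com",
  lastLoginAt: "2024-01-15T10:00:00Z",      // absent on adminUser -- a sparse attribute
};

// The only "schema" both items are forced to share is the key design.
const sharedAttributes = Object.keys(adminUser).filter((k) => k in basicUser);
```

Note that the flexibility is one-directional: either item can gain or drop attributes freely, but neither can change `userId` without a delete and reinsert.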
Items
An item is the unit of data in DynamoDB — equivalent to a row in a relational database. Each item is uniquely identified by its primary key.
Item Properties
- ✅ Maximum item size: 400 KB — the most important limit to internalize
- ✅ Items can have different attributes — DynamoDB is schemaless beyond the key
- ✅ Sparse attributes: an item simply omits an attribute that doesn't apply
- ✅ Each item is stored as a single unit — reads and writes are atomic per item
- ✅ No concept of NULL taking space — absent attributes cost nothing
The 400 KB Limit
This limit includes attribute names, values, and overhead. It means you cannot store large blobs directly in DynamoDB. Design patterns: store metadata in DynamoDB, large objects in S3 with a pointer. Compress large attribute values. Keep attribute names short (they count toward the limit on every item).
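A minimal sketch of the metadata-plus-pointer pattern described above. The function name, size threshold, and S3 key format are all illustrative assumptions, not an AWS API; the caller would separately upload the oversized body to S3.

```javascript
// Sketch of the "pointer to S3" pattern for the 400 KB item limit.
// buildDocumentItem, the threshold, and the s3Key format are assumptions.
const ITEM_LIMIT_BYTES = 400 * 1024; // names, values, and overhead all count

function buildDocumentItem(docId, body) {
  const bodyBytes = Buffer.byteLength(body, "utf8");
  // Leave generous headroom for attribute names and other metadata.
  if (bodyBytes < 300 * 1024) {
    return { docId, body }; // small enough: inline the body
  }
  // Too large: store only a pointer; the caller uploads `body` to S3.
  return { docId, s3Key: `documents/${docId}.json`, size: bodyBytes };
}

const small = buildDocumentItem("DOC#1", "short text");
const large = buildDocumentItem("DOC#2", "x".repeat(350 * 1024));
```

The same shape works for any unbounded payload: DynamoDB stays the fast index, S3 holds the bytes.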
Attributes & Data Types
Attributes are name-value pairs within an item. Attribute names are UTF-8, case-sensitive, with a maximum length of 64 KB.
Scalar Types
| Type | Code | Description | Example |
|---|---|---|---|
| String | S | UTF-8 encoded text | "hello" |
| Number | N | Up to 38 digits of precision (sent as a string) | 42, 3.14 |
| Binary | B | Base64 encoded bytes | Compressed/encrypted data |
| Boolean | BOOL | true or false | true |
| Null | NULL | Represents unknown/undefined | true (the value is literally 'true') |
Set Types (Unordered, Unique Values)
| Type | Code | Use Case |
|---|---|---|
| String Set | SS | Tags, categories, permissions |
| Number Set | NS | Scores, IDs, quantities |
| Binary Set | BS | Hashes, fingerprints |
Document Types
| Type | Code | Description |
|---|---|---|
| List | L | Ordered collection, any types (like JSON array) |
| Map | M | Unordered key-value pairs (like JSON object) |
Nesting Limit
Maps and Lists can be nested up to 32 levels deep. Numbers are transmitted as strings with up to 38 digits of precision — no float/int distinction, and no binary floating-point rounding errors within that precision.
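The type codes in the tables above are exactly what the low-level DynamoDB API sends on the wire (the Document Client hides this layer). The sketch below shows a raw item using those tagged shapes; the attribute names are made up for illustration.

```javascript
// Low-level DynamoDB wire format: every value is tagged with its type code.
// Attribute names here are illustrative.
const rawItem = {
  username:   { S: "alice" },               // String
  loginCount: { N: "42" },                  // Number -- always sent as a string
  isActive:   { BOOL: true },               // Boolean
  nickname:   { NULL: true },               // Null -- the value is literally true
  tags:       { SS: ["admin", "beta"] },    // String Set -- unique, unordered
  address:    { M: {                        // Map -- like a JSON object
    city: { S: "Seattle" },
    zip:  { N: "98101" },
  } },
  history:    { L: [{ N: "1" }, { S: "a" }] }, // List -- mixed types allowed
};
```

Seeing this shape once makes the scalar/set/document taxonomy much less abstract, and explains oddities like `{ NULL: true }` and numbers-as-strings.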
Primary Keys
The primary key is the single most important design decision in DynamoDB. It determines how data is distributed, what queries are possible, and what performance you get. Keys are immutable — you cannot update a primary key, only delete and reinsert.
Two Types of Primary Keys
Simple Primary Key (Partition Key Only)
- Single attribute uniquely identifies each item
- Use when: natural unique ID exists, no range queries needed
- Example: userId for a users table
- Lookups are exact-match only (GetItem) — no range queries on the base table
Composite Primary Key (PK + SK)
- Combination uniquely identifies each item
- Items with same PK are co-located, sorted by SK
- Enables range queries within a partition
- Example: userId (PK) + createdAt (SK) for orders
Key Attribute Types
Only String, Number, or Binary are allowed as key attributes. No Booleans, no Lists, no Maps. This constraint is permanent and cannot be changed after table creation.
```javascript
// Simple primary key — partition key only
const usersTable = {
  TableName: "Users",
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" }
  ],
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" } // Partition key
  ]
};

// Composite primary key — partition key + sort key
const ordersTable = {
  TableName: "Orders",
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" },
    { AttributeName: "createdAt", AttributeType: "S" }
  ],
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" },    // Partition key
    { AttributeName: "createdAt", KeyType: "RANGE" } // Sort key
  ]
};
```
Partitions
DynamoDB distributes data across partitions internally. You never see or manage partitions directly, but understanding them is critical for performance.
The Post Office Analogy
Think of partitions as post office sorting bins. The partition key is the zip code — it determines which bin your letter goes into. All letters with the same zip code end up in the same bin (co-located). The sort key is like the street address within that zip code — it determines the order within the bin. If everyone in the city sends mail to the same zip code, that bin overflows (hot partition).
Partition Internals
- ✅ Partition key hash determines which partition an item lives in
- ✅ Items with the same partition key are always in the same partition
- ✅ Each partition: up to 10 GB storage, 3000 RCU, 1000 WCU
- ✅ Partition splits are automatic when limits are exceeded
- ✅ Hot partition key = hot partition = throttling (the #1 DynamoDB problem)
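DynamoDB's actual hash function is internal and undocumented; the sketch below uses FNV-1a purely to illustrate the mechanism: hashing makes routing deterministic, so the same key always lands on the same partition.

```javascript
// Illustrative only: DynamoDB's real hash function is internal.
// FNV-1a here just demonstrates deterministic key-to-partition routing.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

function routeToPartition(partitionKey, partitionCount) {
  return fnv1a(partitionKey) % partitionCount;
}

// The same key always routes to the same partition...
const p1 = routeToPartition("USER#123", 8);
const p2 = routeToPartition("USER#123", 8);
// ...which is exactly why a single hot key cannot be spread out by DynamoDB.
```

This determinism is the root of the hot partition problem discussed below: no amount of table-level capacity changes where one key hashes to.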
1. Write Request — the application sends PutItem with partition key 'USER#123'
2. Hash Computation — DynamoDB hashes the partition key to determine the target partition
3. Partition Routing — the request is routed to that specific partition (one of potentially thousands)
4. Storage — the item is stored within the partition, sorted by sort key if the key is composite
The Hot Partition Problem
If all requests target the same partition key, a single partition handles all traffic. Even with 10,000 WCU provisioned at the table level, one partition can only handle 1,000 WCU. Adaptive capacity helps but does not eliminate this. Design your partition key for even distribution.
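One common mitigation is write sharding: append a suffix to the hot key so writes spread across N partition keys, at the cost of fan-out reads. The `#shard` suffix convention and the shard count below are application design choices, not DynamoDB features.

```javascript
// Write-sharding sketch: spread a hot partition key across N suffixed shards.
// The key format ("DATE#2024-01-15#3") is an assumed convention.
const SHARD_COUNT = 10;

// Writes pick a shard at random (or by hashing a stable attribute
// when the same logical item must always land on the same shard).
function shardedKey(hotKey) {
  const shard = Math.floor(Math.random() * SHARD_COUNT);
  return `${hotKey}#${shard}`;
}

// Reads must fan out: query every shard and merge the results.
function allShardKeys(hotKey) {
  return Array.from({ length: SHARD_COUNT }, (_, i) => `${hotKey}#${i}`);
}

const writeKey = shardedKey("DATE#2024-01-15");
const readKeys = allShardKeys("DATE#2024-01-15");
```

The trade is explicit: write throughput scales by roughly N, while reads now cost N Queries plus a client-side merge.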
Interview Questions
Q: What is the maximum item size in DynamoDB and why does it matter?
A: 400 KB including attribute names and values. This forces you to keep items lean, store large objects in S3, and use short attribute names. It also means you can't embed unlimited nested data — you must design for bounded item sizes.
Q: Why can't you change a primary key after table creation?
A: The primary key determines how data is physically distributed across partitions. Changing it would require redistributing all data — effectively creating a new table. This is why access pattern analysis before table creation is critical.
Q: What's the difference between a partition key and a sort key?
A: The partition key determines WHICH partition stores the item (distribution). The sort key determines the ORDER within that partition. Together they uniquely identify an item. You can Query all items sharing a partition key and filter/sort by sort key.
Q: When would you choose a simple primary key vs a composite primary key?
A: Simple key: when each item is independently accessed by a unique ID (user profiles by userId). Composite key: when you need to query collections of related items (all orders for a user, sorted by date). Most real-world tables use composite keys.
Q: How does DynamoDB differ from the original Dynamo paper?
A: The 2007 Dynamo paper described a peer-to-peer system with vector clocks for conflict resolution. DynamoDB uses a leader-based architecture, last-writer-wins for Global Tables, and is fully managed. It kept the key ideas (consistent hashing, partition-based distribution) but simplified operations.
Common Mistakes
Treating DynamoDB like a relational database
Trying to normalize data, create foreign keys, or perform joins. DynamoDB has no joins — you must denormalize and design for your access patterns. If you need relational queries, use PostgreSQL.
✅ List access patterns first, then design keys and denormalized items to serve them directly.
Choosing low-cardinality partition keys
Using 'status' (active/inactive), 'date' (today is hot), or 'country' (US gets 80% of traffic) as partition keys. These create hot partitions.
✅ Choose high-cardinality keys like userId, orderId, or deviceId that distribute traffic evenly.
Designing the table before listing access patterns
In relational databases, you model entities first. In DynamoDB, you list every query your application needs FIRST, then design keys and indexes to serve them.
✅ Write down every access pattern with inputs and outputs before creating the table.
Ignoring the 400 KB item size limit
Embedding unbounded lists (all comments on a post, all items in an order) inside a single item. Eventually the item exceeds 400 KB and writes fail.
✅ Design for bounded collections or use separate items with the same partition key.
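The "separate items, same partition key" fix can be sketched as follows; the `POST#`/`COMMENT#` key prefixes are an assumed single-table convention, not a DynamoDB requirement.

```javascript
// Instead of embedding all comments inside one post item (which would
// eventually exceed 400 KB), model each comment as its own item that
// shares the post's partition key. Key prefixes are an assumed convention.
function postItem(postId, title) {
  return { PK: `POST#${postId}`, SK: "METADATA", title };
}

function commentItem(postId, createdAt, author, text) {
  return { PK: `POST#${postId}`, SK: `COMMENT#${createdAt}`, author, text };
}

const items = [
  postItem("42", "Hello DynamoDB"),
  commentItem("42", "2024-01-15T10:00:00Z", "alice", "Great post"),
  commentItem("42", "2024-01-16T08:30:00Z", "bob", "Agreed"),
];
// One Query on PK = "POST#42" now returns the post and all its comments,
// sorted by SK, and no single item ever grows unbounded.
```

Because comments use an ISO-8601 timestamp in the sort key, they come back in chronological order for free.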
Using Scan in production code paths
Scan reads every item in the table and charges you for all of it. A Scan on a 10 GB table costs the same whether you return 1 item or 1 million.
✅ Use Query with proper key design. If you need to scan, add a GSI that serves the access pattern.
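The cost difference is visible in the request shapes themselves. The parameters below follow the low-level DynamoDB API; the table and attribute names are illustrative.

```javascript
// Request shapes for Query vs Scan (low-level API parameters).
// Table and attribute names are illustrative.
const queryParams = {
  TableName: "Orders",
  // Query touches a single partition: you pay only for what it returns.
  KeyConditionExpression: "userId = :uid AND createdAt >= :since",
  ExpressionAttributeValues: {
    ":uid":   { S: "USER#123" },
    ":since": { S: "2024-01-01" },
  },
};

const scanParams = {
  TableName: "Orders",
  // Scan reads EVERY item and bills for all of them; a FilterExpression
  // only trims the response AFTER the read cost has been incurred.
  FilterExpression: "userId = :uid",
  ExpressionAttributeValues: { ":uid": { S: "USER#123" } },
};
```

Both requests can return the exact same items, but the Scan is billed for the whole table, which is why the mistake hides in small dev tables and only surfaces at scale.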