DynamoDB Core Concepts & Data Model
DynamoDB is not a flexible NoSQL database — it's a purpose-built engine that trades query flexibility for guaranteed performance at any scale. Every concept is interconnected.
What is DynamoDB & Why It Exists
DynamoDB is widely misused because people treat it like a flexible NoSQL database. It is not. DynamoDB is a purpose-built engine for predictable, single-digit millisecond latency at any scale. That guarantee comes with a cost: you must define your access patterns upfront.
Amazon built DynamoDB after the 2004 holiday season outage that cost millions in revenue. The internal Dynamo paper (2007) described a system that would never go down, never slow down, and never require manual intervention. DynamoDB is the managed evolution of that vision — fully serverless, multi-AZ by default, with data replicated across three Availability Zones automatically.
The Race Track vs The Open Road
PostgreSQL is an open road — you can drive anywhere, take any turn, explore any route. DynamoDB is a race track — the lanes are fixed, but within those lanes you get guaranteed speed that no open road can match. If you try to leave the track (query outside your access patterns), you crash into a wall. The entire skill of DynamoDB is designing the right track before you start driving.
DynamoDB's Position in AWS
AWS Ecosystem Integration
- ✅ Fully managed — no servers, no patching, no cluster management
- ✅ Multi-AZ by default — data replicated across 3 Availability Zones
- ✅ Global Tables — multi-region active-active replication
- ✅ Integrates natively with Lambda, API Gateway, AppSync, Kinesis, S3
- ✅ On-demand or provisioned capacity — pay per request or reserve throughput
What DynamoDB is NOT
DynamoDB Limitations
- ❌ Not a relational database — no joins, no foreign keys, no SQL
- ❌ Not a flexible document store — items are schemaless, but your key design still dictates every query you can run
- ❌ Not the right tool for ad-hoc querying or analytics
- ❌ Not a replacement for PostgreSQL when you need ACID across entities
- ❌ Not easy to learn — steep learning curve, especially data modeling
The Fundamental Trade
🔑 The Core Mental Model
DynamoDB makes one deliberate trade: you define access patterns upfront, and DynamoDB guarantees performance for those patterns forever. Queries outside your access patterns are expensive or impossible. This is not a bug — it is the entire design philosophy.
When DynamoDB is the Right Choice
✅ Use DynamoDB When
- Access patterns are known and stable
- Massive scale with no operational overhead
- Single-digit ms latency at any RPS
- Serverless architectures (pay per request)
- Simple key-value or key-document lookups
❌ Avoid DynamoDB When
- Complex queries, ad-hoc reporting
- Relational data with many relationships
- Small scale where simplicity matters more
- Team lacks DynamoDB experience
- Need full-text search or analytics
| Scenario | DynamoDB | Better Alternative |
|---|---|---|
| Known access patterns, massive scale | ✅ Ideal | — |
| Ad-hoc queries, complex filtering | ❌ | PostgreSQL, MongoDB |
| Strong consistency across items | Limited (transactions expensive) | PostgreSQL |
| Full-text search | ❌ | Elasticsearch + DynamoDB Streams |
| Multi-region active-active | ✅ Global Tables | CockroachDB, Cassandra |
| Serverless, zero ops | ✅ | FaunaDB, PlanetScale |
| Relational with joins | ❌ | PostgreSQL, Aurora |
Tables
A table is the top-level container in DynamoDB. There are no databases, no schemas, no namespaces — just tables. Table names are scoped to your AWS account and region.
Table Characteristics
- ✅ No fixed schema — only the primary key attributes are required
- ✅ Every other attribute can vary per item (schemaless beyond the key)
- ✅ Table-level settings: capacity mode, encryption, TTL, streams, backups
- ✅ Table classes: Standard vs Standard-Infrequent Access (lower storage cost)
- ✅ No limit on items per table — tables can grow to petabytes
No Schema ≠ No Design
DynamoDB being "schemaless" is misleading. While items can have different attributes, your primary key design locks you into specific access patterns. The schema lives in your key design, not in a DDL statement. Poor key design cannot be fixed without a full data migration.
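To make the point concrete, here is a small sketch of two items coexisting in one hypothetical Users table: only the key attribute is required on both, and everything else varies per item. The attribute names are illustrative, not from any real schema.

```javascript
// Two items in the same hypothetical "Users" table: only the key
// attribute (userId) is required; every other attribute can vary.
const adminUser = {
  userId: "USER#1",                         // primary key -- required on every item
  email: "alice@example.com",
  role: "admin",
  permissions: ["read", "write", "delete"], // only admins carry this attribute
};

const basicUser = {
  userId: "USER#2",                         // same key schema, different attribute set
  email: "bob@example.com",
  lastLoginAt: "2024-01-15T10:00:00Z",      // absent on adminUser -- a sparse attribute
};

// The only "schema" both items are forced to share is the key design.
const sharedAttributes = Object.keys(adminUser).filter((k) => k in basicUser);
```

Note that the flexibility is one-directional: either item can gain or drop attributes freely, but neither can change `userId` without a delete and reinsert.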
Items
An item is the unit of data in DynamoDB — equivalent to a row in a relational database. Each item is uniquely identified by its primary key.
Item Properties
- ✅ Maximum item size: 400 KB — the most important limit to internalize
- ✅ Items can have different attributes — DynamoDB is schemaless beyond the key
- ✅ Sparse attributes: an item simply omits an attribute that doesn't apply
- ✅ Each item is stored as a single unit — reads and writes are atomic per item
- ✅ No concept of NULL taking space — absent attributes cost nothing
The 400 KB Limit
This limit includes attribute names, values, and overhead. It means you cannot store large blobs directly in DynamoDB. Design patterns: store metadata in DynamoDB, large objects in S3 with a pointer. Compress large attribute values. Keep attribute names short (they count toward the limit on every item).
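A minimal sketch of the metadata-plus-pointer pattern described above. The function name, size threshold, and S3 key format are all illustrative assumptions, not an AWS API; the caller would separately upload the oversized body to S3.

```javascript
// Sketch of the "pointer to S3" pattern for the 400 KB item limit.
// buildDocumentItem, the threshold, and the s3Key format are assumptions.
const ITEM_LIMIT_BYTES = 400 * 1024; // names, values, and overhead all count

function buildDocumentItem(docId, body) {
  const bodyBytes = Buffer.byteLength(body, "utf8");
  // Leave generous headroom for attribute names and other metadata.
  if (bodyBytes < 300 * 1024) {
    return { docId, body }; // small enough: inline the body
  }
  // Too large: store only a pointer; the caller uploads `body` to S3.
  return { docId, s3Key: `documents/${docId}.json`, size: bodyBytes };
}

const small = buildDocumentItem("DOC#1", "short text");
const large = buildDocumentItem("DOC#2", "x".repeat(350 * 1024));
```

The same shape works for any unbounded payload: DynamoDB stays the fast index, S3 holds the bytes.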
Attributes & Data Types
Attributes are name-value pairs within an item. Attribute names are UTF-8, case-sensitive, with a maximum length of 64 KB.
Scalar Types
| Type | Code | Description | Example |
|---|---|---|---|
| String | S | UTF-8 encoded text | "hello" |
| Number | N | Up to 38 digits of precision (sent as a string) | 42, 3.14 |
| Binary | B | Base64 encoded bytes | Compressed/encrypted data |
| Boolean | BOOL | true or false | true |
| Null | NULL | Represents unknown/undefined | true (the value is literally 'true') |
Set Types (Unordered, Unique Values)
| Type | Code | Use Case |
|---|---|---|
| String Set | SS | Tags, categories, permissions |
| Number Set | NS | Scores, IDs, quantities |
| Binary Set | BS | Hashes, fingerprints |
Document Types
| Type | Code | Description |
|---|---|---|
| List | L | Ordered collection, any types (like JSON array) |
| Map | M | Unordered key-value pairs (like JSON object) |
Nesting Limit
Maps and Lists can be nested up to 32 levels deep. Numbers are transmitted as strings with up to 38 digits of precision — no float/int distinction, and no binary floating-point rounding errors within that precision.
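The type codes in the tables above are exactly what the low-level DynamoDB API sends on the wire (the Document Client hides this layer). The sketch below shows a raw item using those tagged shapes; the attribute names are made up for illustration.

```javascript
// Low-level DynamoDB wire format: every value is tagged with its type code.
// Attribute names here are illustrative.
const rawItem = {
  username:   { S: "alice" },               // String
  loginCount: { N: "42" },                  // Number -- always sent as a string
  isActive:   { BOOL: true },               // Boolean
  nickname:   { NULL: true },               // Null -- the value is literally true
  tags:       { SS: ["admin", "beta"] },    // String Set -- unique, unordered
  address:    { M: {                        // Map -- like a JSON object
    city: { S: "Seattle" },
    zip:  { N: "98101" },
  } },
  history:    { L: [{ N: "1" }, { S: "a" }] }, // List -- mixed types allowed
};
```

Seeing this shape once makes the scalar/set/document taxonomy much less abstract, and explains oddities like `{ NULL: true }` and numbers-as-strings.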
Primary Keys
The primary key is the single most important design decision in DynamoDB. It determines how data is distributed, what queries are possible, and what performance you get. Keys are immutable — you cannot update a primary key, only delete and reinsert.
Two Types of Primary Keys
Simple Primary Key (Partition Key Only)
- Single attribute uniquely identifies each item
- Use when: natural unique ID exists, no range queries needed
- Example: userId for a users table
- Lookups are exact-match only (GetItem) — no range queries on the base table
Composite Primary Key (PK + SK)
- Combination uniquely identifies each item
- Items with same PK are co-located, sorted by SK
- Enables range queries within a partition
- Example: userId (PK) + createdAt (SK) for orders
Key Attribute Types
Only String, Number, or Binary are allowed as key attributes. No Booleans, no Lists, no Maps. This constraint is permanent and cannot be changed after table creation.
```javascript
// Simple primary key — partition key only
const usersTable = {
  TableName: "Users",
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" }
  ],
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" } // Partition key
  ]
};

// Composite primary key — partition key + sort key
const ordersTable = {
  TableName: "Orders",
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" },
    { AttributeName: "createdAt", AttributeType: "S" }
  ],
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" },    // Partition key
    { AttributeName: "createdAt", KeyType: "RANGE" } // Sort key
  ]
};
```
Partitions
DynamoDB distributes data across partitions internally. You never see or manage partitions directly, but understanding them is critical for performance.
The Post Office Analogy
Think of partitions as post office sorting bins. The partition key is the zip code — it determines which bin your letter goes into. All letters with the same zip code end up in the same bin (co-located). The sort key is like the street address within that zip code — it determines the order within the bin. If everyone in the city sends mail to the same zip code, that bin overflows (hot partition).
Partition Internals
- ✅ Partition key hash determines which partition an item lives in
- ✅ Items with the same partition key are always in the same partition
- ✅ Each partition: up to 10 GB storage, 3000 RCU, 1000 WCU
- ✅ Partition splits are automatic when limits are exceeded
- ✅ Hot partition key = hot partition = throttling (the #1 DynamoDB problem)
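DynamoDB's actual hash function is internal and undocumented; the sketch below uses FNV-1a purely to illustrate the mechanism: hashing makes routing deterministic, so the same key always lands on the same partition.

```javascript
// Illustrative only: DynamoDB's real hash function is internal.
// FNV-1a here just demonstrates deterministic key-to-partition routing.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

function routeToPartition(partitionKey, partitionCount) {
  return fnv1a(partitionKey) % partitionCount;
}

// The same key always routes to the same partition...
const p1 = routeToPartition("USER#123", 8);
const p2 = routeToPartition("USER#123", 8);
// ...which is exactly why a single hot key cannot be spread out by DynamoDB.
```

This determinism is the root of the hot partition problem discussed below: no amount of table-level capacity changes where one key hashes to.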
1. Write Request — the application sends PutItem with partition key 'USER#123'
2. Hash Computation — DynamoDB hashes the partition key to determine the target partition
3. Partition Routing — the request is routed to that specific partition (one of potentially thousands)
4. Storage — the item is stored within the partition, sorted by sort key if the key is composite
The Hot Partition Problem
If all requests target the same partition key, a single partition handles all traffic. Even with 10,000 WCU provisioned at the table level, one partition can only handle 1,000 WCU. Adaptive capacity helps but does not eliminate this. Design your partition key for even distribution.
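One common mitigation is write sharding: append a suffix to the hot key so writes spread across N partition keys, at the cost of fan-out reads. The `#shard` suffix convention and the shard count below are application design choices, not DynamoDB features.

```javascript
// Write-sharding sketch: spread a hot partition key across N suffixed shards.
// The key format ("DATE#2024-01-15#3") is an assumed convention.
const SHARD_COUNT = 10;

// Writes pick a shard at random (or by hashing a stable attribute
// when the same logical item must always land on the same shard).
function shardedKey(hotKey) {
  const shard = Math.floor(Math.random() * SHARD_COUNT);
  return `${hotKey}#${shard}`;
}

// Reads must fan out: query every shard and merge the results.
function allShardKeys(hotKey) {
  return Array.from({ length: SHARD_COUNT }, (_, i) => `${hotKey}#${i}`);
}

const writeKey = shardedKey("DATE#2024-01-15");
const readKeys = allShardKeys("DATE#2024-01-15");
```

The trade is explicit: write throughput scales by roughly N, while reads now cost N Queries plus a client-side merge.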
Interview Questions
Q: What is the maximum item size in DynamoDB and why does it matter?
A: 400 KB including attribute names and values. This forces you to keep items lean, store large objects in S3, and use short attribute names. It also means you can't embed unlimited nested data — you must design for bounded item sizes.
Q: Why can't you change a primary key after table creation?
A: The primary key determines how data is physically distributed across partitions. Changing it would require redistributing all data — effectively creating a new table. This is why access pattern analysis before table creation is critical.
Q: What's the difference between a partition key and a sort key?
A: The partition key determines WHICH partition stores the item (distribution). The sort key determines the ORDER within that partition. Together they uniquely identify an item. You can Query all items sharing a partition key and filter/sort by sort key.
Q: When would you choose a simple primary key vs a composite primary key?
A: Simple key: when each item is independently accessed by a unique ID (user profiles by userId). Composite key: when you need to query collections of related items (all orders for a user, sorted by date). Most real-world tables use composite keys.
Q: How does DynamoDB differ from the original Dynamo paper?
A: The 2007 Dynamo paper described a peer-to-peer system with vector clocks for conflict resolution. DynamoDB uses a leader-based architecture, last-writer-wins for Global Tables, and is fully managed. It kept the key ideas (consistent hashing, partition-based distribution) but simplified operations.
Common Mistakes
Treating DynamoDB like a relational database
Trying to normalize data, create foreign keys, or perform joins. DynamoDB has no joins — you must denormalize and design for your access patterns. If you need relational queries, use PostgreSQL.
✅ List access patterns first, then design keys and denormalized items to serve them directly.
Choosing low-cardinality partition keys
Using 'status' (active/inactive), 'date' (today is hot), or 'country' (US gets 80% of traffic) as partition keys. These create hot partitions.
✅ Choose high-cardinality keys like userId, orderId, or deviceId that distribute traffic evenly.
Designing the table before listing access patterns
In relational databases, you model entities first. In DynamoDB, you list every query your application needs FIRST, then design keys and indexes to serve them.
✅ Write down every access pattern with inputs and outputs before creating the table.
Ignoring the 400 KB item size limit
Embedding unbounded lists (all comments on a post, all items in an order) inside a single item. Eventually the item exceeds 400 KB and writes fail.
✅ Design for bounded collections or use separate items with the same partition key.
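The "separate items, same partition key" fix can be sketched as follows; the `POST#`/`COMMENT#` key prefixes are an assumed single-table convention, not a DynamoDB requirement.

```javascript
// Instead of embedding all comments inside one post item (which would
// eventually exceed 400 KB), model each comment as its own item that
// shares the post's partition key. Key prefixes are an assumed convention.
function postItem(postId, title) {
  return { PK: `POST#${postId}`, SK: "METADATA", title };
}

function commentItem(postId, createdAt, author, text) {
  return { PK: `POST#${postId}`, SK: `COMMENT#${createdAt}`, author, text };
}

const items = [
  postItem("42", "Hello DynamoDB"),
  commentItem("42", "2024-01-15T10:00:00Z", "alice", "Great post"),
  commentItem("42", "2024-01-16T08:30:00Z", "bob", "Agreed"),
];
// One Query on PK = "POST#42" now returns the post and all its comments,
// sorted by SK, and no single item ever grows unbounded.
```

Because comments use an ISO-8601 timestamp in the sort key, they come back in chronological order for free.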
Using Scan in production code paths
Scan reads every item in the table and charges you for all of it. A Scan on a 10 GB table costs the same whether you return 1 item or 1 million.
✅ Use Query with proper key design. If you need to scan, add a GSI that serves the access pattern.
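The cost difference is visible in the request shapes themselves. The parameters below follow the low-level DynamoDB API; the table and attribute names are illustrative.

```javascript
// Request shapes for Query vs Scan (low-level API parameters).
// Table and attribute names are illustrative.
const queryParams = {
  TableName: "Orders",
  // Query touches a single partition: you pay only for what it returns.
  KeyConditionExpression: "userId = :uid AND createdAt >= :since",
  ExpressionAttributeValues: {
    ":uid":   { S: "USER#123" },
    ":since": { S: "2024-01-01" },
  },
};

const scanParams = {
  TableName: "Orders",
  // Scan reads EVERY item and bills for all of them; a FilterExpression
  // only trims the response AFTER the read cost has been incurred.
  FilterExpression: "userId = :uid",
  ExpressionAttributeValues: { ":uid": { S: "USER#123" } },
};
```

Both requests can return the exact same items, but the Scan is billed for the whole table, which is why the mistake hides in small dev tables and only surfaces at scale.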