Union-Find (Disjoint Set Union) — The Complete Guide
Master Union-Find from first principles. Learn to group elements into dynamic sets, detect cycles, count connected components, and confidently solve every DSU-based interview question.
Table of Contents
Intuition from Scratch
Why does this pattern exist?
Imagine you're at a party and people keep introducing each other. "Alice knows Bob." "Bob knows Carol." Now Alice, Bob, and Carol are all in the same friend group — even though Alice and Carol never met directly. As more introductions happen, friend groups merge. At any point, you want to answer: "Are these two people in the same group?" and "How many groups are there?"
Union-Find (also called Disjoint Set Union or DSU) is a data structure built exactly for this. It supports two operations: union(a, b) — merge the groups containing a and b, and find(a) — determine which group a belongs to. Both operations run in nearly O(1) amortized time with two optimizations: path compression and union by rank.
Union-Find is the go-to data structure whenever you need to dynamically group elements and query connectivity. It's simpler than graph traversal (BFS/DFS) for connectivity problems, and it handles edges arriving one at a time — something BFS/DFS can't do efficiently without rebuilding the graph.
How It Works (The Basics)
Each element starts as its own group (parent = itself). find(x): Follow parent pointers until you reach the root. The root is the "representative" of the group. union(a, b): Find the roots of a and b. If different, make one root point to the other. Now both groups are merged. Example: union(1, 2), union(3, 4), union(2, 3) Step 1: union(1, 2) Step 2: union(3, 4) Step 3: union(2, 3) 1 ← 2 1 ← 2 3 ← 4 1 ← 2 ↑ 3 ← 4 find(1) = 1, find(4) = 1 → same group ✓ Two optimizations make this nearly O(1): Path Compression: During find(), point every node directly to the root. Union by Rank: Attach the shorter tree under the taller tree.
Analogies to Build Intuition
Friend Groups at a Party
Each person starts alone. When two people are introduced, their entire friend groups merge. To check if Alice and Dave know each other (transitively), you check if they're in the same group. Union-Find tracks these merging groups efficiently — each 'find' asks 'who's the leader of your group?' and each 'union' merges two groups under one leader.
Islands Connected by Bridges
Imagine islands in an ocean. Each bridge connects two islands. As bridges are built, islands merge into larger landmasses. Union-Find tracks which islands are connected. 'Find' tells you which landmass an island belongs to. 'Union' merges two landmasses when a bridge is built between them.
Electrical Wiring
You have components that need to be connected. Each wire connects two components. After all wiring, you want to know: are these two components on the same circuit? How many separate circuits exist? Union-Find answers both questions — union when you add a wire, find to check connectivity, count roots for the number of circuits.
What kind of problems does it solve?
Union-Find is your go-to when:
- You need to count connected components as edges are added
- You need to check if two nodes are connected (same component)
- You need to detect cycles in an undirected graph
- You need to build a Minimum Spanning Tree (Kruskal's algorithm)
- You need to group elements dynamically based on equivalence relations
- You need to process edges/connections one at a time (online connectivity)
The core insight
Union-Find answers "are these two elements in the same group?" in nearly O(1). It's the fastest way to track dynamic connectivity. Whenever you see a problem about merging groups, checking connectivity, or counting components — and edges arrive one at a time — Union-Find is almost certainly the answer.
Pattern Recognition
Recognizing when Union-Find is the right tool is the most important skill. Here are the signals to look for and the traps to avoid.
Keywords & Signals in the Problem Statement
Use Union-Find when you see...
- ✅"Number of connected components" or "number of islands"
- ✅"Are these two nodes connected?" (dynamic connectivity)
- ✅"Detect a cycle in an undirected graph"
- ✅"Minimum spanning tree" (Kruskal's algorithm)
- ✅"Group elements by equivalence" (accounts merge, similar strings)
- ✅"Redundant connection" — find the edge that creates a cycle
- ✅"Earliest time when all nodes are connected"
- ✅"Make connected" — minimum edges to connect all components
- ✅Edges/connections arriving one at a time (online processing)
- ✅"Union" or "merge" in the problem description
Don't use Union-Find when...
- ❌You need shortest path — use BFS or Dijkstra
- ❌You need to traverse all nodes in a component — use BFS/DFS
- ❌The graph is directed — Union-Find is for undirected graphs only
- ❌You need to split groups (un-union) — Union-Find doesn't support this
- ❌The graph is static and you just need components once — BFS/DFS is simpler
- ❌You need topological sort — use DFS or Kahn's algorithm
Union-Find vs. Similar Patterns
| Pattern | When to Use | Key Difference |
|---|---|---|
| Union-Find (DSU) | Dynamic connectivity, cycle detection, component counting, MST | Nearly O(1) union/find — handles edges arriving one at a time |
| BFS/DFS | Traversal, shortest path (unweighted), exploring all nodes in a component | O(V + E) per traversal — better when you need to visit nodes, not just check connectivity |
| Kruskal's (uses DSU) | Minimum Spanning Tree | Sort edges by weight, use Union-Find to add edges without creating cycles |
| Topological Sort | Ordering in directed acyclic graphs | For directed graphs — Union-Find only works on undirected graphs |
| Graph Coloring | Bipartiteness check, 2-coloring | BFS/DFS with colors — Union-Find can also solve bipartite checks with a weighted variant |
The Union-Find question to always ask
When you see a connectivity problem, ask: "Am I processing edges one at a time and need to check/update connectivity after each edge?" If yes → Union-Find. If you just need to find components once on a static graph → BFS/DFS is simpler. Union-Find shines when the graph is built incrementally.
Brute Force → Optimized Thinking
Let's walk through the thinking evolution using a concrete example: Number of Connected Components in an Undirected Graph — given n nodes and a list of edges, return the number of connected components.
Brute Force — BFS/DFS from Every Unvisited Node
Build an adjacency list, then run BFS/DFS from each unvisited node. Each traversal discovers one component. Count the number of traversals. This works but requires building the full graph upfront.
function countComponentsBFS(n: number, edges: number[][]): number { // Build adjacency list — O(V + E) const adj: number[][] = Array.from({ length: n }, () => []); for (const [u, v] of edges) { adj[u].push(v); adj[v].push(u); } const visited = new Set<number>(); let components = 0; for (let i = 0; i < n; i++) { if (!visited.has(i)) { components++; // BFS to mark all nodes in this component const queue = [i]; visited.add(i); while (queue.length > 0) { const node = queue.shift()!; for (const neighbor of adj[node]) { if (!visited.has(neighbor)) { visited.add(neighbor); queue.push(neighbor); } } } } } return components; } // Works fine for static graphs. But what if edges arrive one at a time // and you need the component count after EACH edge? You'd need to // rebuild and re-traverse — O((V + E) × E) total. Way too slow.
Naive Union-Find — No Optimizations
Each element has a parent pointer. Find follows the chain to the root. Union connects two roots. But without optimizations, the tree can become a long chain (like a linked list), making find O(n).
class NaiveUF { parent: number[]; constructor(n: number) { this.parent = Array.from({ length: n }, (_, i) => i); } find(x: number): number { while (this.parent[x] !== x) { x = this.parent[x]; // follow parent chain — O(n) worst case } return x; } union(a: number, b: number): void { const rootA = this.find(a); const rootB = this.find(b); if (rootA !== rootB) { this.parent[rootA] = rootB; // just attach one root to the other } } } // Problem: Without optimizations, the tree can degenerate into a chain. // find() becomes O(n), making total complexity O(n × E).
Optimal — Path Compression + Union by Rank
Two optimizations make Union-Find nearly O(1): (1) Path compression — during find(), point every node directly to the root, flattening the tree. (2) Union by rank — attach the shorter tree under the taller one, keeping trees balanced. Together, they give O(α(n)) amortized time, where α is the inverse Ackermann function — effectively constant.
class UnionFind { parent: number[]; rank: number[]; count: number; // number of components constructor(n: number) { this.parent = Array.from({ length: n }, (_, i) => i); this.rank = new Array(n).fill(0); this.count = n; } find(x: number): number { if (this.parent[x] !== x) { this.parent[x] = this.find(this.parent[x]); // path compression } return this.parent[x]; } union(a: number, b: number): boolean { const rootA = this.find(a); const rootB = this.find(b); if (rootA === rootB) return false; // already connected // Union by rank — attach shorter tree under taller if (this.rank[rootA] < this.rank[rootB]) { this.parent[rootA] = rootB; } else if (this.rank[rootA] > this.rank[rootB]) { this.parent[rootB] = rootA; } else { this.parent[rootB] = rootA; this.rank[rootA]++; } this.count--; return true; // merged two components } connected(a: number, b: number): boolean { return this.find(a) === this.find(b); } } // α(n) ≤ 4 for any practical n (up to 2^65536). // Effectively O(1) per operation. Total for E edges: O(E × α(n)) ≈ O(E).
Why Do the Optimizations Work?
Path compression flattens the tree
Every time you call find(x), you make every node on the path point directly to the root. Next time you call find() on any of those nodes, it's O(1). The tree becomes almost flat after a few operations.
Union by rank keeps trees balanced
By always attaching the shorter tree under the taller one, you prevent the tree from degenerating into a chain. The maximum height is O(log n) even without path compression. With both optimizations, it's O(α(n)) ≈ O(1).
Together they give inverse Ackermann time
The inverse Ackermann function α(n) grows so slowly that it's ≤ 4 for any n up to 2^65536. For all practical purposes, each operation is O(1). This makes Union-Find one of the most efficient data structures in existence.
The thinking process matters more than the code
In interviews, walk through this progression: "BFS/DFS works for static graphs but can't handle edges arriving one at a time efficiently. Naive Union-Find has O(n) find. With path compression and union by rank, each operation is amortized O(α(n)) ≈ O(1)." This shows the interviewer you understand the evolution from brute force to optimal.
Core Templates
Union-Find problems follow a few recurring templates. Memorize the base class and these usage patterns — most problems are variations.
Template 1: The Union-Find Class (Reusable)
This is the class you'll write in every Union-Find problem. Memorize it — it's always the same.
class UnionFind { parent: number[]; rank: number[]; count: number; constructor(n: number) { this.parent = Array.from({ length: n }, (_, i) => i); this.rank = new Array(n).fill(0); this.count = n; // each element is its own component } find(x: number): number { if (this.parent[x] !== x) { this.parent[x] = this.find(this.parent[x]); // path compression } return this.parent[x]; } union(a: number, b: number): boolean { const rootA = this.find(a); const rootB = this.find(b); if (rootA === rootB) return false; // already same group if (this.rank[rootA] < this.rank[rootB]) { this.parent[rootA] = rootB; } else if (this.rank[rootA] > this.rank[rootB]) { this.parent[rootB] = rootA; } else { this.parent[rootB] = rootA; this.rank[rootA]++; } this.count--; return true; } connected(a: number, b: number): boolean { return this.find(a) === this.find(b); } } // Variation: use "size" instead of "rank" (union by size) // — attach smaller tree under larger, track component sizes.
Template 2: Count Connected Components
The most common usage. Create a Union-Find, process all edges, and the component count is tracked automatically.
function countComponents(n: number, edges: number[][]): number { const uf = new UnionFind(n); for (const [u, v] of edges) { uf.union(u, v); } return uf.count; } // Each union that merges two different components decrements count. // Final count = number of connected components. // When to use: Number of Provinces, Number of Islands (with coordinate mapping), // Number of Connected Components
Template 3: Cycle Detection in Undirected Graph
If union(a, b) returns false (they're already in the same component), the edge (a, b) creates a cycle. This is the foundation of Kruskal's MST algorithm.
function findRedundantConnection(edges: number[][]): number[] { const n = edges.length; const uf = new UnionFind(n + 1); // 1-indexed nodes for (const [u, v] of edges) { if (!uf.union(u, v)) { return [u, v]; // this edge creates a cycle! } } return []; // no cycle found } // Key insight: if find(u) === find(v) BEFORE adding edge (u, v), // then u and v are already connected — adding this edge creates a cycle. // When to use: Redundant Connection, Graph Valid Tree, Kruskal's MST
Template 4: Union-Find on a 2D Grid
For grid problems (like Number of Islands), map 2D coordinates to 1D indices: index = row × cols + col. Then use standard Union-Find.
function numIslands(grid: string[][]): number { const rows = grid.length, cols = grid[0].length; const uf = new UnionFind(rows * cols); let waterCount = 0; const dirs = [[0, 1], [1, 0]]; // only right and down to avoid duplicates for (let r = 0; r < rows; r++) { for (let c = 0; c < cols; c++) { if (grid[r][c] === '0') { waterCount++; continue; } const idx = r * cols + c; for (const [dr, dc] of dirs) { const nr = r + dr, nc = c + dc; if (nr < rows && nc < cols && grid[nr][nc] === '1') { uf.union(idx, nr * cols + nc); } } } } return uf.count - waterCount; } // Map (row, col) → row * cols + col for 1D indexing. // Only check right and down neighbors to avoid double-counting. // Subtract water cells from the component count.
Which template to pick?
Ask yourself: (1) Am I counting components? → Template 2. (2) Am I detecting a cycle or finding a redundant edge? → Template 3. (3) Is the problem on a 2D grid? → Template 4. All of them use the same Union-Find class (Template 1).
Variations & Sub-Patterns
Union-Find isn't a single trick — it's a versatile tool with several important variations.
| Variation | Strategy | Classic Example |
|---|---|---|
| Basic Connectivity | Union edges, count components or check connected | Number of Provinces, Number of Connected Components |
| Cycle Detection | If union returns false, the edge creates a cycle | Redundant Connection, Graph Valid Tree |
| Kruskal's MST | Sort edges by weight, union if not already connected | Min Cost to Connect All Points, Connecting Cities With Minimum Cost |
| Union by Size | Track component sizes, attach smaller under larger | Accounts Merge, Largest Component Size by Common Factor |
| 2D Grid DSU | Map (row, col) to 1D index, union adjacent land cells | Number of Islands, Number of Islands II (online) |
| Weighted / Ranked DSU | Track relative weights between nodes (e.g., ratios) | Evaluate Division, Swim in Rising Water |
| Online Connectivity | Process edges one at a time, answer queries after each | Earliest Moment When Everyone Becomes Friends |
Problems mentioned above
Deep Dive: Kruskal's MST — Min Cost to Connect All Points
Given n points, connect all of them with minimum total Manhattan distance. This is Kruskal's algorithm: sort all possible edges by weight, then greedily add edges that connect different components (using Union-Find to check).
function minCostConnectPoints(points: number[][]): number { const n = points.length; const edges: [number, number, number][] = []; // [weight, u, v] // Generate all edges with Manhattan distance for (let i = 0; i < n; i++) { for (let j = i + 1; j < n; j++) { const dist = Math.abs(points[i][0] - points[j][0]) + Math.abs(points[i][1] - points[j][1]); edges.push([dist, i, j]); } } // Sort edges by weight (ascending) edges.sort((a, b) => a[0] - b[0]); const uf = new UnionFind(n); let totalCost = 0; let edgesUsed = 0; for (const [weight, u, v] of edges) { if (uf.union(u, v)) { totalCost += weight; edgesUsed++; if (edgesUsed === n - 1) break; // MST has n-1 edges } } return totalCost; } // Time: O(E log E) for sorting + O(E × α(n)) for unions // Space: O(E + n) // Key: Kruskal's = sort edges + Union-Find to avoid cycles. // MST is complete when we've added n-1 edges.
Deep Dive: Accounts Merge — Union-Find with Grouping
Given accounts where each has a name and emails, merge accounts that share any email. This is a Union-Find problem where emails are the elements and accounts are merged when they share an email.
function accountsMerge(accounts: string[][]): string[][] { const uf = new UnionFind(accounts.length); const emailToAccount = new Map<string, number>(); // email → account index // Union accounts that share emails for (let i = 0; i < accounts.length; i++) { for (let j = 1; j < accounts[i].length; j++) { const email = accounts[i][j]; if (emailToAccount.has(email)) { uf.union(i, emailToAccount.get(email)!); } else { emailToAccount.set(email, i); } } } // Group emails by root account const groups = new Map<number, Set<string>>(); for (let i = 0; i < accounts.length; i++) { const root = uf.find(i); if (!groups.has(root)) groups.set(root, new Set()); for (let j = 1; j < accounts[i].length; j++) { groups.get(root)!.add(accounts[i][j]); } } // Build result const result: string[][] = []; for (const [root, emails] of groups) { result.push([accounts[root][0], ...[...emails].sort()]); } return result; } // Time: O(n × k × α(n) + E log E) where k = emails per account // Key: Union accounts (not emails) — use email as the "bridge" to find // which accounts should be merged. Then group emails by root account.
Deep Dive: Number of Islands II — Online DSU
Given a grid and a list of positions where land is added one at a time, return the number of islands after each addition. This is the classic online Union-Find problem — BFS/DFS would need to re-traverse after each addition.
function numIslands2(m: number, n: number, positions: number[][]): number[] { const uf = new UnionFind(m * n); uf.count = 0; // start with 0 components (no land yet) const grid = Array.from({ length: m }, () => new Array(n).fill(false)); const result: number[] = []; const dirs = [[0, 1], [0, -1], [1, 0], [-1, 0]]; for (const [r, c] of positions) { if (grid[r][c]) { result.push(uf.count); // duplicate — skip continue; } grid[r][c] = true; uf.count++; // new island const idx = r * n + c; uf.parent[idx] = idx; // initialize this cell for (const [dr, dc] of dirs) { const nr = r + dr, nc = c + dc; if (nr >= 0 && nr < m && nc >= 0 && nc < n && grid[nr][nc]) { uf.union(idx, nr * n + nc); // merge with adjacent land } } result.push(uf.count); } return result; } // Time: O(k × α(m×n)) where k = number of positions // Key: each new land cell starts as its own island (count++), // then merges with adjacent land (count-- per successful union). // BFS/DFS would be O(k × m × n) — Union-Find is much faster.
Visual Walkthrough
Let's trace through three classic Union-Find scenarios step by step.
Union-Find Operations — Step by Step
n = 6, edges = [[0,1], [1,2], [3,4], [2,4]] Initial: parent = [0, 1, 2, 3, 4, 5], rank = [0,0,0,0,0,0], count = 6 Each node is its own component: {0} {1} {2} {3} {4} {5} Edge [0, 1]: find(0)=0, find(1)=1 → different roots union: parent[1] = 0, rank[0] = 1, count = 5 parent = [0, 0, 2, 3, 4, 5] Components: {0,1} {2} {3} {4} {5} Edge [1, 2]: find(1)→parent[1]=0→find(0)=0, find(2)=2 → different union: parent[2] = 0, count = 4 parent = [0, 0, 0, 3, 4, 5] Components: {0,1,2} {3} {4} {5} Path compression: find(1) set parent[1] = 0 (already was 0) find(2) set parent[2] = 0 ✓ Edge [3, 4]: find(3)=3, find(4)=4 → different union: parent[4] = 3, rank[3] = 1, count = 3 parent = [0, 0, 0, 3, 3, 5] Components: {0,1,2} {3,4} {5} Edge [2, 4]: find(2)→parent[2]=0=root, find(4)→parent[4]=3=root find(2)=0, find(4)=3 → different union: rank[0]=1, rank[3]=1 → equal, parent[3] = 0, rank[0] = 2 count = 2 parent = [0, 0, 0, 0, 3, 5] Components: {0,1,2,3,4} {5} Now find(4): parent[4]=3 → parent[3]=0 → root! Path compression: parent[4] = 0 (shortcut!) parent = [0, 0, 0, 0, 0, 5] Final: 2 components: {0,1,2,3,4} and {5}
Cycle Detection — Step by Step
edges = [[1,2], [1,3], [2,3]] (1-indexed) Initial: parent = [0, 1, 2, 3], count = 3 Edge [1, 2]: find(1)=1, find(2)=2 → different → union ✓ parent = [0, 1, 1, 3], count = 2 Edge [1, 3]: find(1)=1, find(3)=3 → different → union ✓ parent = [0, 1, 1, 1], count = 1 Edge [2, 3]: find(2)→parent[2]=1=root, find(3)→parent[3]=1=root find(2)=1, find(3)=1 → SAME ROOT! union returns false → THIS EDGE CREATES A CYCLE! Answer: [2, 3] Key: Before adding edge (u, v), if find(u) === find(v), they're already connected — this edge is redundant and creates a cycle.
Kruskal's MST — Step by Step
4 nodes, edges sorted by weight: [1, 0,1], [2, 1,2], [3, 0,2], [4, 2,3], [5, 1,3] Initial: parent = [0,1,2,3], count = 4, totalCost = 0 Edge weight=1, (0,1): find(0)≠find(1) → union ✓ totalCost = 1, edgesUsed = 1, count = 3 Edge weight=2, (1,2): find(1)≠find(2) → union ✓ totalCost = 3, edgesUsed = 2, count = 2 Edge weight=3, (0,2): find(0)=find(2) → SAME! Skip (would create cycle) Edge weight=4, (2,3): find(2)≠find(3) → union ✓ totalCost = 7, edgesUsed = 3 = n-1 → DONE! MST edges: (0,1), (1,2), (2,3), total cost = 7 Key: Sort edges by weight. Greedily add the cheapest edge that connects two different components. Union-Find detects cycles in O(α(n)). Stop when edgesUsed = n - 1 (MST has exactly n-1 edges).
Practice Problems
These 7 problems are carefully ordered from easy to hard. Each one teaches a different aspect of Union-Find. Solve them in order.
LC 547. Number of Provinces
Why this pattern:
Basic connectivity counting. Given an adjacency matrix of cities, find the number of connected groups (provinces). This is the 'hello world' of Union-Find — union connected cities, return the component count.
Key idea: Create UnionFind(n). For each pair (i, j) where isConnected[i][j] = 1, call union(i, j). Return uf.count. The adjacency matrix is symmetric, so only process the upper triangle to avoid redundant unions.
LC 684. Redundant Connection
Why this pattern:
Cycle detection — find the edge that creates a cycle. Process edges in order. The first edge where union returns false (both endpoints already connected) is the redundant edge.
Key idea: For each edge [u, v]: if union(u, v) returns false, this edge connects two already-connected nodes — it's the cycle-creating edge. Return it. The problem guarantees exactly one redundant edge.
LC 261. Graph Valid Tree
Why this pattern:
A valid tree has exactly n-1 edges and no cycles. Use Union-Find to check both: if any union returns false, there's a cycle. If edges.length !== n-1, it's not a tree.
Key idea: First check: edges.length === n - 1 (necessary condition). Then process all edges with Union-Find. If any union returns false (cycle detected), return false. If all unions succeed and edge count is n-1, it's a valid tree.
LC 323. Number of Connected Components in an Undirected Graph
Why this pattern:
Direct component counting. Given n nodes and edges, count connected components. Create UnionFind(n), process all edges, return uf.count. The simplest Union-Find application.
Key idea: Initialize with n components. Each successful union decrements the count. After processing all edges, uf.count is the answer. This is Template 2 in its purest form.
LC 721. Accounts Merge
Why this pattern:
Union-Find with grouping. Accounts sharing any email belong to the same person. Union account indices when they share an email. Then group all emails by their root account.
Key idea: Map each email to the first account that owns it. When a later account has the same email, union the two accounts. After all unions, group emails by root account index. Sort emails within each group. The tricky part is the email-to-account mapping, not the Union-Find itself.
LC 1584. Min Cost to Connect All Points
Why this pattern:
Kruskal's MST. Generate all edges with Manhattan distances, sort by weight, greedily add edges that connect different components. Stop when n-1 edges are added.
Key idea: Generate O(n²) edges. Sort by distance. Use Union-Find to add the cheapest edge that doesn't create a cycle. Total cost of the n-1 accepted edges is the MST weight. Early termination when edgesUsed = n-1.
LC 778. Swim in Rising Water
Why this pattern:
Sort cells by elevation, process from lowest to highest. At each elevation, 'activate' the cell and union it with adjacent active cells. The answer is the elevation when (0,0) and (n-1,n-1) become connected.
Key idea: Create (elevation, row, col) tuples for all cells. Sort by elevation. Process in order: activate each cell, union with active neighbors. After each union, check if top-left and bottom-right are connected. The current elevation is the answer. This is Union-Find + sorting — a powerful combination.
Practice strategy
For each problem: (1) Identify what the "elements" and "connections" are. (2) Write the Union-Find class first (it's always the same). (3) Figure out the union logic. (4) After solving, write: "I union [what] when [condition], and the answer is [what I read from the DSU]."
Common Mistakes
These are the traps that catch people on Union-Find problems. Learn them here so you don't learn them in an interview.
Forgetting path compression
You implement find() as a simple while loop without path compression. The tree degenerates into a chain, and find() becomes O(n) instead of O(α(n)). Your solution TLEs on large inputs.
✅Always use path compression: this.parent[x] = this.find(this.parent[x]). This single line flattens the tree during every find() call. It's the difference between O(n) and O(1) amortized. Never skip it.
Forgetting union by rank/size
You always attach rootA under rootB (or vice versa) without considering tree heights. This can create unbalanced trees, degrading performance.
✅Track rank (or size) for each root. Always attach the shorter/smaller tree under the taller/larger one. If ranks are equal, pick either and increment the winner's rank. This keeps trees balanced.
Off-by-one with 1-indexed nodes
The problem uses 1-indexed nodes (like Redundant Connection) but you create UnionFind(n) instead of UnionFind(n+1). Node n has no slot in the parent array.
✅Check the problem's indexing. If nodes are 1 to n, create UnionFind(n + 1) so index n is valid. Alternatively, subtract 1 from all node values to make them 0-indexed.
Wrong 2D-to-1D mapping on grids
You map (row, col) to row + col instead of row * cols + col. Multiple cells map to the same index, causing incorrect unions.
✅The correct mapping is: index = row * numCols + col. This gives each cell a unique index in [0, rows × cols). Double-check by verifying that (0, cols-1) and (1, 0) map to different indices.
Unioning roots instead of calling find() first
You do parent[a] = b instead of parent[find(a)] = find(b). This connects node a to node b, but doesn't merge their entire groups. Nodes in a's group that don't go through a are now disconnected.
✅ALWAYS call find() on both arguments before modifying parent. The union operation connects ROOTS, not individual nodes: rootA = find(a), rootB = find(b), then parent[rootA] = rootB.
Not tracking component count correctly
You decrement count on every union call, even when the two elements are already in the same component. This makes the count go negative or gives wrong results.
✅Only decrement count when union actually merges two DIFFERENT components. Check if find(a) !== find(b) before decrementing. The union method should return a boolean indicating whether a merge happened.
Interview Strategy
Knowing when and how to use Union-Find is half the battle. Here's how to present it in an interview setting to maximize your score.
The 5-Step Interview Flow
Identify the Connectivity Pattern
'I see that we're processing connections one at a time and need to track which elements are in the same group. This is a classic Union-Find problem.' Name the pattern early — it shows confidence.
Mention BFS/DFS as the Alternative
'I could use BFS/DFS to find components, but since edges arrive one at a time and I need connectivity after each edge, Union-Find is more efficient — O(α(n)) per operation vs rebuilding the graph each time.' Show you considered alternatives.
Write the Union-Find Class First
'Let me write the Union-Find class with path compression and union by rank — this is standard.' Write it quickly and confidently. It's always the same code. This is your foundation.
Explain the Problem-Specific Logic
'Now I'll process each edge: union the two endpoints. If union returns false, this edge creates a cycle.' The Union-Find class is generic — the interesting part is how you USE it for the specific problem.
Analyze Complexity
'Each union/find is O(α(n)) amortized, which is effectively O(1). With E edges, total time is O(E × α(n)) ≈ O(E). Space is O(n) for the parent and rank arrays.' Mention α(n) — it shows you understand the theory.
Common Follow-Up Questions
| Follow-Up | How to Handle |
|---|---|
| "What is α(n)?" | The inverse Ackermann function. It grows so slowly that α(n) ≤ 4 for any n up to 2^65536. For all practical purposes, it's constant. It comes from the combined analysis of path compression + union by rank. |
| "Can you use BFS/DFS instead?" | Yes, for static graphs. BFS/DFS is O(V + E) per traversal. But if edges arrive one at a time and you need connectivity after each, you'd need to re-traverse — O(E × (V + E)) total. Union-Find handles this in O(E × α(n)) ≈ O(E). |
| "What about union by size instead of rank?" | Both work. Union by size attaches the smaller component under the larger one and tracks actual sizes. Union by rank uses an upper bound on height. Both give O(α(n)). Size is useful when you need component sizes (e.g., largest component). |
| "Can Union-Find handle directed graphs?" | Standard Union-Find is for undirected graphs only. For directed graphs, you need different techniques: DFS for cycle detection, topological sort for ordering. There's a weighted Union-Find variant for some directed problems (like Evaluate Division). |
| "Can you un-union (split groups)?" | Standard Union-Find doesn't support splitting. If you need to remove edges, process edges in reverse order (offline) or use a more complex structure like Link-Cut Trees. Mention this limitation. |
| "Why not just use a hash set for each component?" | Merging two hash sets is O(min(|A|, |B|)) per merge. With n merges, worst case is O(n²). Union-Find is O(α(n)) per operation — much faster. The tree structure with path compression is the key advantage. |
The meta-skill
Interviewers love Union-Find because it tests whether you can recognize a connectivity pattern and implement a non-trivial data structure from scratch. The key sentence: "I'll use Union-Find with path compression and union by rank. Each operation is O(α(n)) amortized, effectively O(1). I union [elements] when [condition], and the answer is [component count / cycle detection / MST cost]."
Summary & Cheat Sheet
Pin this section. Review it before mock interviews and contests.
Quick Revision Cheat Sheet
Core idea: Track dynamic groups with two operations: find(x) returns the group leader, union(a,b) merges two groups. Nearly O(1) per operation.
Path compression: During find(), point every node directly to the root: parent[x] = find(parent[x]). Flattens the tree for future lookups.
Union by rank: Attach shorter tree under taller. If equal rank, pick either and increment. Prevents degenerate chains.
Complexity: O(α(n)) per operation where α is inverse Ackermann — effectively O(1). Total for E edges: O(E × α(n)) ≈ O(E).
Component count: Start with count = n. Each successful union decrements count. Final count = number of connected components.
Cycle detection: If find(u) === find(v) BEFORE adding edge (u,v), the edge creates a cycle. This is how Kruskal's avoids cycles.
Kruskal's MST: Sort edges by weight. Greedily add cheapest edge that connects different components (union returns true). Stop at n-1 edges.
2D grid mapping: index = row × numCols + col. Creates unique 1D index for each cell. Only union right and down neighbors to avoid duplicates.
Online connectivity: Union-Find handles edges arriving one at a time. BFS/DFS needs the full graph upfront. This is Union-Find's killer advantage.
Union by size: Alternative to rank: track actual component sizes. Attach smaller under larger. Useful when you need component sizes.
Can't un-union: Union-Find doesn't support splitting groups. If needed, process in reverse order (offline) or use Link-Cut Trees.
Undirected only: Standard Union-Find is for undirected graphs. For directed graphs, use DFS, topological sort, or weighted DSU.
Mental trigger: "Connected components" or "merge groups" or "cycle in undirected graph" or "MST" → Union-Find.
Decision Flowchart
Does the problem involve grouping/connectivity in an undirected graph? ├── NO → Not a Union-Find problem │ → Directed graph? Use DFS/Topological Sort │ → Shortest path? Use BFS/Dijkstra │ → Traversal needed? Use BFS/DFS └── YES → Are edges processed one at a time (online)? ├── YES → Union-Find is the best choice │ ├── Counting components? → Track uf.count │ ├── Detecting cycles? → Check if union returns false │ ├── Building MST? → Sort edges + Union-Find (Kruskal's) │ └── Grouping by equivalence? → Union elements sharing a property └── NO → Is the graph static (all edges known upfront)? ├── YES → BFS/DFS is simpler for one-time component finding │ (Union-Find also works but is overkill) └── NO → Union-Find for dynamic edge additions
Union-Find Operations Quick Reference
| Operation | Time | Description | Common Pitfall |
|---|---|---|---|
| find(x) | O(α(n)) | Return root (group leader) of x | Forgetting path compression |
| union(a, b) | O(α(n)) | Merge groups of a and b | Not calling find() on both first |
| connected(a, b) | O(α(n)) | Check if a and b are in same group | Comparing a and b directly instead of find(a) and find(b) |
| count | O(1) | Number of components | Decrementing on redundant unions |
| Constructor | O(n) | Initialize n singleton components | Wrong size for 1-indexed problems |
One last thing
Union-Find is one of those data structures that seems magical the first time you see it. Two simple optimizations — path compression and union by rank — turn a naive O(n) operation into effectively O(1). The class is always the same 25 lines of code. Memorize it, and the only challenge in each problem is figuring out what to union and when. That's the real skill: not implementing Union-Find, but recognizing that a problem is a Union-Find problem in disguise.