How does A/B testing work?
The Short Answer
A/B testing (also called split testing) is a method of comparing two or more versions of a page, feature, or UI element to determine which performs better. You randomly split users into groups, show each group a different variant, measure a specific metric (conversion rate, click-through rate, engagement), and use statistical analysis to determine which variant wins. It removes guesswork from product decisions by letting real user behavior decide.
How It Works
The core process follows a scientific method: form a hypothesis, design an experiment, collect data, and draw conclusions. Here's the typical flow from start to finish.
- Define the hypothesis
  - "Changing the CTA button from blue to green will increase sign-ups by 10%"
- Create variants
  - Control (A): existing blue button
  - Treatment (B): new green button
- Split traffic randomly
  - 50% of users see variant A, 50% see variant B
  - Assignment must be consistent: the same user always sees the same variant
- Collect data
  - Track the target metric (sign-up rate) for both groups
- Analyze results
  - Use statistical significance testing (p-value < 0.05)
  - Ensure the sample size is large enough to be meaningful
- Make a decision
  - If B wins with statistical significance → ship it
  - If there is no significant difference → keep A (simpler)
Frontend Implementation
On the frontend, A/B testing typically involves assigning users to a variant (usually via a cookie or user ID hash), conditionally rendering the appropriate UI, and tracking events. Here's a simplified implementation showing the core pattern.
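```tsx
import { useEffect, useMemo } from 'react';

type Variant = 'control' | 'treatment';

// App-specific helpers (auth/anonymous ID and analytics client),
// assumed to be defined elsewhere in the codebase.
declare function useUserId(): string;
declare function trackEvent(name: string, props: Record<string, unknown>): void;

// 32-bit FNV-1a: fast, deterministic, and well enough distributed for bucketing.
const MAX_HASH_VALUE = 0xffffffff;
function simpleHash(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // unsigned, 0..MAX_HASH_VALUE
}

function getVariant(experimentId: string, userId: string): Variant {
  // Deterministic assignment: the same user always gets the same variant.
  // Hash the combination of experiment + user to get a stable 0-1 value.
  const hash = simpleHash(`${experimentId}:${userId}`);
  const normalized = hash / MAX_HASH_VALUE; // 0 to 1
  // 50/50 split
  return normalized < 0.5 ? 'control' : 'treatment';
}

function useExperiment(experimentId: string): Variant {
  const userId = useUserId(); // From auth context or anonymous ID
  const variant = useMemo(
    () => getVariant(experimentId, userId),
    [experimentId, userId]
  );
  // Track exposure (user saw this variant); runs on mount and when inputs change
  useEffect(() => {
    trackEvent('experiment_exposure', { experimentId, variant, userId });
  }, [experimentId, variant, userId]);
  return variant;
}
```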
The key requirement is deterministic assignment — the same user must always see the same variant across page loads and sessions. This is typically achieved by hashing the user ID with the experiment ID, giving a stable result without needing to store the assignment.
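The same idea extends to more than two variants (A/B/n) or to uneven splits, such as sending only 10% of traffic to a risky change. A minimal sketch, assuming FNV-1a as the hash (any stable, well-distributed string hash would do); `fnv1a32`, `VariantWeight`, and `assignVariant` are hypothetical names, not part of any library:

```typescript
// Hypothetical helper: weighted, deterministic variant assignment.
// FNV-1a is an assumed choice; any stable, well-distributed hash works.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // unsigned 32-bit value
}

interface VariantWeight {
  name: string;
  weight: number; // relative weight, e.g. 50/50 or 90/10
}

function assignVariant(
  experimentId: string,
  userId: string,
  variants: VariantWeight[]
): string {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || total <= 0) {
    throw new Error('at least one variant with positive weight is required');
  }
  // Map the hash into [0, total) and walk the cumulative weight ranges
  const point = (fnv1a32(`${experimentId}:${userId}`) / 0x100000000) * total;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (point < cumulative) return v.name;
  }
  return variants[variants.length - 1].name; // guard against float rounding
}
```

Keying the hash on `experimentId` as well as `userId` keeps experiments independent: hashing the user ID alone would put the same users in the treatment group of every experiment.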
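```tsx
// Button, trackEvent, and handleSignUp are assumed app-level imports.
// Treatment renders the new button style; control keeps the existing one.
function SignUpButton() {
  const experimentId = 'signup-button-color-2024';
  const variant = useExperiment(experimentId);
  return (
    <Button
      variant={variant === 'treatment' ? 'default' : 'secondary'}
      onClick={() => {
        // Include experimentId so clicks can be attributed when several experiments run at once
        trackEvent('signup_clicked', { experimentId, variant });
        handleSignUp();
      }}
    >
      Sign Up Free
    </Button>
  );
}
```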
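Before any statistics, the tracked `experiment_exposure` and `signup_clicked` events have to be rolled up into a conversion rate per variant. The sketch below counts unique users rather than raw events, because the user is the unit of randomization; repeated clicks from one person would otherwise inflate the rate. The `TrackedEvent` shape and the `summarize` function are hypothetical; in practice this aggregation often happens in the analytics warehouse rather than in application code.

```typescript
// Hypothetical event shape; real analytics schemas differ.
interface TrackedEvent {
  name: 'experiment_exposure' | 'signup_clicked';
  userId: string;
  variant: string;
}

interface VariantStats {
  exposed: number;   // unique users who saw the variant
  converted: number; // unique exposed users who clicked sign-up
  rate: number;      // converted / exposed
}

function summarize(events: TrackedEvent[]): Map<string, VariantStats> {
  const exposed = new Map<string, Set<string>>();
  const converted = new Map<string, Set<string>>();
  for (const e of events) {
    const bucket = e.name === 'experiment_exposure' ? exposed : converted;
    if (!bucket.has(e.variant)) bucket.set(e.variant, new Set());
    bucket.get(e.variant)!.add(e.userId); // Set deduplicates repeat events
  }
  const stats = new Map<string, VariantStats>();
  for (const [variant, users] of exposed) {
    const clickers = converted.get(variant) ?? new Set<string>();
    // Only count conversions from users who were actually exposed
    let count = 0;
    for (const u of clickers) if (users.has(u)) count++;
    stats.set(variant, { exposed: users.size, converted: count, rate: count / users.size });
  }
  return stats;
}
```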
Statistical Significance
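You can't just look at raw numbers and declare a winner. If variant B has a 5.2% conversion rate vs A's 5.0%, that might be random noise. Statistical significance testing tells you whether the observed difference is larger than chance alone would plausibly produce. The standard threshold is a p-value below 0.05: if the variants truly performed the same, a difference at least this large would appear less than 5% of the time (often described as 95% confidence).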
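For a conversion-rate metric, one standard choice is a two-sided two-proportion z-test with pooled variance. The sketch below assumes samples large enough for the normal approximation to hold and uses the Abramowitz and Stegun 7.1.26 approximation of the error function for the normal CDF; `normalCdf` and `twoProportionZTest` are illustrative names, and in practice a stats library or the experimentation platform would do this.

```typescript
// Standard normal CDF using the Abramowitz & Stegun 7.1.26 approximation of
// erf (absolute error below about 1.5e-7), accurate enough for p-values.
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

interface ZTestResult {
  z: number;      // positive when B converts better than A
  pValue: number; // two-sided
}

function twoProportionZTest(convA: number, nA: number, convB: number, nB: number): ZTestResult {
  const pooled = (convA + convB) / (nA + nB); // shared rate under H0: no difference
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  if (se === 0) return { z: 0, pValue: 1 }; // all or no users converted: no evidence either way
  const z = (convB / nB - convA / nA) / se;
  return { z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}
```

With 10,000 users per variant, the 5.0% vs 5.2% example (500 vs 520 conversions) gives a p-value of roughly 0.5, indistinguishable from noise, while 5.0% vs 6.0% (500 vs 600) gives a p-value of about 0.002.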
Common statistical mistakes
- ❌ Ending the test too early ("peeking" at results and stopping before the planned sample size is reached)
- ❌ Running too many variants without adjusting significance thresholds (e.g. a Bonferroni correction divides 0.05 by the number of comparisons)
- ❌ Ignoring segment differences (a variant might win overall but lose for mobile users)
- ❌ Not accounting for the novelty effect (users click new things just because they're new)
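The peeking problem is easiest to see in a simulation of A/A tests, where both groups share the same true conversion rate, so every "significant" result is a false positive. This sketch uses a seeded mulberry32 PRNG and illustrative parameters (all names are hypothetical) to compare checking after every batch and stopping at the first significant look against testing once at the planned sample size:

```typescript
// mulberry32: small seeded PRNG so the simulation is reproducible.
function mulberry32(seed: number): () => number {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two-sided two-proportion z-test at alpha = 0.05: significant when |z| > 1.96
function isSignificant(convA: number, nA: number, convB: number, nB: number): boolean {
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  if (se === 0) return false;
  const z = (convB / nB - convA / nA) / se;
  return Math.abs(z) > 1.959964;
}

// Run A/A tests and count how often each policy wrongly declares a winner.
function simulatePeeking(
  trials: number, looks: number, batchSize: number, rate: number, seed: number
): { peekingRate: number; fixedRate: number } {
  const rand = mulberry32(seed);
  let peekingFalsePositives = 0;
  let fixedFalsePositives = 0;
  for (let trial = 0; trial < trials; trial++) {
    let convA = 0;
    let convB = 0;
    let n = 0;
    let flaggedEarly = false;
    for (let look = 1; look <= looks; look++) {
      for (let i = 0; i < batchSize; i++) {
        if (rand() < rate) convA++;
        if (rand() < rate) convB++;
      }
      n += batchSize;
      // Peeking policy: would have stopped and shipped at the first significant look
      if (!flaggedEarly && isSignificant(convA, n, convB, n)) flaggedEarly = true;
    }
    if (flaggedEarly) peekingFalsePositives++;
    // Fixed-horizon policy: a single test at the planned sample size
    if (isSignificant(convA, n, convB, n)) fixedFalsePositives++;
  }
  return { peekingRate: peekingFalsePositives / trials, fixedRate: fixedFalsePositives / trials };
}
```

For example, `simulatePeeking(500, 10, 200, 0.1, 42)` runs 500 A/A tests with ten looks each. The fixed-horizon rate should land near the nominal 5%, while the peeking rate comes out much higher (classic results put it around 19% for ten equally spaced looks). Because the final look is part of the peeking policy, its false positive rate can never be lower than the fixed-horizon one.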
Best Practices
Do
- ✅ Test one variable at a time to isolate what caused the change
- ✅ Calculate the required sample size before starting the test
- ✅ Run the test for at least one full business cycle (usually 1-2 weeks)
- ✅ Use consistent user assignment (hash-based, not random per request)
- ✅ Track both primary metrics and guardrail metrics (make sure you're not hurting something else)
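Calculating the sample size up front is what makes the "don't stop early" rule workable. A minimal sketch using one common normal-approximation formula for comparing two proportions, `n = (zAlpha + zPower)^2 * (p1*(1-p1) + p2*(1-p2)) / (p1 - p2)^2`, with α = 0.05 (two-sided) and 80% power hard-coded as assumptions; `sampleSizePerVariant` is a hypothetical helper, and dedicated calculators may use slightly different variants of the formula:

```typescript
// z-values hard-coded for alpha = 0.05 (two-sided) and 80% power (assumptions)
const Z_ALPHA_HALF = 1.959964; // standard normal quantile at 1 - 0.05/2
const Z_POWER = 0.841621;      // standard normal quantile at 0.80

// Users needed in EACH variant to detect a change from baselineRate to expectedRate
function sampleSizePerVariant(baselineRate: number, expectedRate: number): number {
  const diff = expectedRate - baselineRate;
  if (diff === 0) throw new Error('expected rate must differ from baseline');
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  return Math.ceil(((Z_ALPHA_HALF + Z_POWER) ** 2 * variance) / (diff * diff));
}
```

Assuming a 5% baseline sign-up rate, detecting the hypothesized 10% relative lift (5% → 5.5%) needs roughly 31,000 users per variant. Dividing the total across both variants by daily eligible traffic gives a rough duration, which you would then round up to whole business cycles.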
Avoid
- ❌ Testing too many things at once (can't attribute results)
- ❌ Stopping tests early when you see a positive trend
- ❌ Ignoring the losing variant's data (it teaches you what doesn't work)
- ❌ Running tests with too little traffic (results won't be significant)
Why Interviewers Ask This
This question tests whether you understand data-driven product development. Interviewers want to see that you know how to design an experiment, understand the importance of statistical rigor, can implement feature flags and variant assignment on the frontend, and appreciate the pitfalls (peeking, insufficient sample size, novelty effects). It shows you think beyond just building features — you think about measuring their impact.
Quick Revision Cheat Sheet
What it is: Comparing variants with real users to measure which performs better
Assignment: Deterministic hash of userId + experimentId for consistent experience
Significance: p-value < 0.05 (at most a 5% chance of a difference this large if the variants were truly equal; not a 95% chance the result is real)
Duration: At least 1-2 weeks / one full business cycle
Key rule: Test one variable at a time to isolate causation
Peeking problem: Repeatedly checking results and stopping at the first significant reading inflates the false positive rate