Versioning & Traffic Management
Managing API change without breaking clients — versioning strategies, canary deployments, blue-green, and graceful deprecation.
Versioning Strategies
API versioning lets you evolve your API without breaking existing clients. The gateway is the natural place to handle version routing — mapping the client's requested version to the correct backend service or service version.
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URL path | /v1/users, /v2/users | Explicit, easy to route, cacheable | URL pollution, hard to sunset |
| Custom header | X-API-Version: 2 | Clean URLs, flexible | Hidden, easy to forget, not cacheable by default |
| Accept header | Accept: application/vnd.api.v2+json | Standards-based (content negotiation) | Complex, hard to test in browser |
| Query parameter | /users?version=2 | Simple, visible | Pollutes query string, caching issues |
```yaml
# URL path versioning: gateway routes to different services
services:
  - name: users-v1
    url: http://user-service-v1:8080
    routes:
      - name: users-v1-route
        paths:
          - /api/v1/users
        strip_path: true
  - name: users-v2
    url: http://user-service-v2:8080
    routes:
      - name: users-v2-route
        paths:
          - /api/v2/users
        strip_path: true

  # Header-based versioning: single path, route by header
  - name: users-v2-header
    url: http://user-service-v2:8080
    routes:
      - name: users-header-route
        paths:
          - /api/users
        headers:
          X-API-Version:
            - "2"
        strip_path: true
```
URL Path Versioning Wins in Practice
Despite debates, URL path versioning (/v1/, /v2/) is the most common choice for public APIs. It's explicit, easy to understand, works in browsers, is cacheable, and requires no special client configuration. Header-based versioning is cleaner architecturally but harder for developers to discover and test. Choose URL path unless you have a strong reason not to.
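The four strategies in the table can be compared side by side in a small resolver. This is a minimal sketch, not any particular gateway's logic: the function name `resolve_version`, the precedence order (path, then header, then Accept, then query), and the default of `"1"` are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical defaults for illustration; not taken from any specific gateway.
DEFAULT_VERSION = "1"
PATH_RE = re.compile(r"^/(?:api/)?v(\d+)(?:/|$)")
ACCEPT_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")


def resolve_version(url: str, headers: dict[str, str]) -> str:
    """Return the API version a request asks for, checking each strategy in turn."""
    parts = urlsplit(url)
    # Header names are case-insensitive, so normalise them once.
    lowered = {name.lower(): value for name, value in headers.items()}

    # 1. URL path: /v2/users or /api/v2/users
    match = PATH_RE.match(parts.path)
    if match:
        return match.group(1)

    # 2. Custom header: X-API-Version: 2
    if "x-api-version" in lowered:
        return lowered["x-api-version"].strip()

    # 3. Accept header: application/vnd.api.v2+json
    match = ACCEPT_RE.search(lowered.get("accept", ""))
    if match:
        return match.group(1)

    # 4. Query parameter: /users?version=2
    query = parse_qs(parts.query)
    if "version" in query:
        return query["version"][0]

    return DEFAULT_VERSION
```

In practice a gateway would usually support only one of these strategies; checking all four here is just a way to show how each one is read from the request.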
Traffic Splitting
Traffic splitting runs multiple versions simultaneously and distributes requests between them by percentage. This enables gradual migration from v1 to v2 without a hard cutover.
| Pattern | Description | Use Case |
|---|---|---|
| Weighted routing | 90% → v1, 10% → v2 | Gradual migration to new version |
| Shadow traffic | 100% → v1, copy to v2 (response discarded) | Test v2 with real traffic without risk |
| Header override | Default → v1, X-Use-V2: true → v2 | Internal testing, opt-in beta |
| User segment | Free users → v1, paid users → v2 | Feature gating by plan |
```yaml
# Envoy weighted cluster routing
route_config:
  virtual_hosts:
    - name: api
      domains: ["api.example.com"]
      routes:
        - match:
            prefix: "/api/orders"
          route:
            weighted_clusters:
              clusters:
                - name: order-service-v1
                  weight: 90
                - name: order-service-v2
                  weight: 10
              total_weight: 100
        # Shadow traffic: mirror to v2 without affecting clients
        - match:
            prefix: "/api/products"
          route:
            cluster: product-service-v1
            request_mirror_policies:
              - cluster: product-service-v2
                runtime_fraction:
                  default_value:
                    numerator: 100  # Mirror 100% of traffic
                    denominator: HUNDRED
```
The Highway On-Ramp Meter
Traffic splitting is like a highway on-ramp meter that controls how many cars enter each lane. You gradually shift traffic from the old road (v1) to the new road (v2). If the new road has problems (errors, latency), you immediately redirect all traffic back to the old road. Shadow traffic is like sending a drone to fly the new route without any passengers — testing the path with zero risk.
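To make the weighted-routing idea concrete, here is a rough sketch of a per-request weighted pick. The function name `pick_cluster` is hypothetical and the seeded generator exists only to make the demo repeatable; a real proxy implements this internally.

```python
import random

# Weights mirror the 90/10 split from the table; cluster names are illustrative.
WEIGHTS = {"order-service-v1": 90, "order-service-v2": 10}


def pick_cluster(weights: dict[str, int], rng: random.Random) -> str:
    """Choose one cluster with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(42)  # seeded only so the demo is repeatable
counts = {name: 0 for name in WEIGHTS}
for _ in range(10_000):
    counts[pick_cluster(WEIGHTS, rng)] += 1
print(counts)  # roughly a 90% / 10% split

# Rolling back is just a weight change: with v2 at 0 it is never selected.
ROLLED_BACK = {"order-service-v1": 100, "order-service-v2": 0}
print({pick_cluster(ROLLED_BACK, rng) for _ in range(1_000)})
```

Because the pick is independent for each request, the same client can land on either version. That is acceptable for gradual migration but not for experiments that need sticky assignment.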
Canary Deployments
A canary deployment routes a small percentage of traffic to the new version while monitoring for errors. If the canary is healthy, gradually increase traffic. If it fails, instantly route all traffic back to the stable version.
| Phase | Traffic to Canary | Duration | Action |
|---|---|---|---|
| 1. Deploy | 0% | — | Deploy new version alongside stable |
| 2. Smoke test | 1% | 5 minutes | Verify basic functionality |
| 3. Expand | 5% | 15 minutes | Monitor error rate and latency |
| 4. Grow | 25% | 30 minutes | Compare metrics against baseline |
| 5. Majority | 50% | 1 hour | Confidence check before full rollout |
| 6. Complete | 100% | — | Remove old version |
| Rollback | 0% | Instant | Any phase — route all traffic to stable |
```yaml
# Gateway canary configuration
routes:
  - name: order-api
    path: /api/v1/orders
    canary:
      enabled: true
      target: order-service-v2
      stable: order-service-v1
      weight: 5  # 5% to canary
      # Header override: developers can force canary
      header_override:
        name: X-Canary
        value: "true"
      # Automatic rollback conditions
      rollback:
        error_rate_threshold: 5.0    # Rollback if > 5% errors
        latency_p99_threshold: 2000  # Rollback if p99 > 2 seconds
        evaluation_window: 300       # Evaluate over 5 minutes
        min_requests: 100            # Need 100 requests before evaluating
```
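The rollback conditions in this configuration could be evaluated along the following lines. This is a sketch under assumptions: `RollbackPolicy`, `WindowStats`, and `evaluate` are hypothetical names, and the metrics for each evaluation window are assumed to arrive already aggregated.

```python
from dataclasses import dataclass


@dataclass
class RollbackPolicy:
    # Field names and defaults copy the rollback section of the config.
    error_rate_threshold: float = 5.0    # percent
    latency_p99_threshold: float = 2000  # milliseconds
    min_requests: int = 100


@dataclass
class WindowStats:
    # Hypothetical metrics snapshot for one evaluation window.
    requests: int
    errors: int
    latency_p99_ms: float


def evaluate(stats: WindowStats, policy: RollbackPolicy) -> str:
    """Return "wait", "rollback", or "healthy" for one evaluation window."""
    if stats.requests < policy.min_requests:
        return "wait"  # too little traffic to judge
    error_rate = 100.0 * stats.errors / stats.requests
    if error_rate > policy.error_rate_threshold:
        return "rollback"
    if stats.latency_p99_ms > policy.latency_p99_threshold:
        return "rollback"
    return "healthy"
```

The `min_requests` guard matters at low canary weights: at 1% of traffic, a single failed request in a quiet window would otherwise look like a very high error rate.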
Header Override for Testing
Allow developers and QA to force-route to the canary via a header (X-Canary: true). This lets you test the new version with real infrastructure before opening it to any percentage of real users. The gateway checks for the header first — if present, route to canary regardless of weight percentage.
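The "header first, then weight" order described above might look like this. The function `choose_upstream` is a hypothetical name, and the upstream names are borrowed from the canary configuration for illustration.

```python
import random


def choose_upstream(headers: dict[str, str], canary_weight: float,
                    rng: random.Random) -> str:
    """Route to the canary if X-Canary: true is set, otherwise by weight."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get("x-canary", "").strip().lower() == "true":
        return "order-service-v2"  # forced canary, weight is ignored
    # rng.random() is in [0.0, 1.0), so a weight of 0 never selects the canary.
    if rng.random() * 100 < canary_weight:
        return "order-service-v2"
    return "order-service-v1"
```

With the weight at 0, only requests carrying the header reach the canary, which matches the "deploy at 0%" phase of the rollout table.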
Blue-Green Deployments
Blue-green maintains two identical production environments. At any time, one (blue) serves all traffic while the other (green) is idle or running the new version. Deployment is an atomic switch at the gateway — flip all traffic from blue to green instantly.
| Aspect | Blue-Green | Canary |
|---|---|---|
| Traffic switch | Atomic — 0% or 100% | Gradual — 1%, 5%, 25%, 50%, 100% |
| Rollback speed | Instant — flip back to blue | Instant — set canary weight to 0% |
| Risk | Higher — all traffic hits new version at once | Lower — only small % affected initially |
| Infrastructure cost | 2x — both environments always running | 1x + small canary pool |
| Complexity | Simple — binary switch | More complex — monitoring, auto-rollback |
| Best for | Confident releases, database migrations | Risky changes, gradual validation |
```yaml
# Blue-green deployment at the gateway
upstreams:
  blue:
    targets:
      - blue-cluster-1:8080
      - blue-cluster-2:8080
      - blue-cluster-3:8080
  green:
    targets:
      - green-cluster-1:8080
      - green-cluster-2:8080
      - green-cluster-3:8080
routes:
  - name: all-traffic
    path: /api/
    upstream: blue  # Currently serving from blue

# Deployment process:
# 1. Deploy new version to green cluster
# 2. Run smoke tests against green directly
# 3. Switch: upstream: blue → upstream: green
# 4. Monitor for 5 minutes
# 5. If issues: switch back to blue (instant rollback)
# 6. If stable: blue becomes the next deployment target
```
Database Migrations with Blue-Green
The hardest part of blue-green is database migrations. Both blue and green must work with the same database. Strategy: (1) Make schema changes backward-compatible (add columns, don't rename/remove). (2) Deploy schema change first (both versions work). (3) Switch traffic to green. (4) Remove old columns in a later release. This "expand and contract" pattern avoids breaking either version during the switch.
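A small illustration of the "expand" step, using an in-memory SQLite database so it runs anywhere. The `users` table, its columns, and the two writer functions are hypothetical; the point is only that old and new code can share one schema once the new columns are added as nullable.

```python
import sqlite3

# In-memory database shared by both "versions"; table and columns are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Expand: add nullable columns. Old code that ignores them keeps working.
db.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
db.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def blue_create_user(name: str) -> None:
    """Blue (old version) only knows about the original column."""
    db.execute("INSERT INTO users (name) VALUES (?)", (name,))


def green_create_user(first: str, last: str) -> None:
    """Green (new version) writes the new columns and keeps the old one populated."""
    db.execute(
        "INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)",
        (f"{first} {last}", first, last),
    )


def blue_list_names() -> list[str]:
    """A read path the old version depends on."""
    return [row[0] for row in db.execute("SELECT name FROM users ORDER BY id")]


blue_create_user("Ada Lovelace")
green_create_user("Grace", "Hopper")
print(blue_list_names())  # rows written by either version are readable by blue

# Contract (a later release, once blue is retired): drop the old "name" column.
```

Green keeps writing `name` during the overlap so that a rollback to blue does not find rows it cannot read.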
Deprecation Management
Deprecation is the process of phasing out an old API version. The gateway can enforce deprecation policies — warning clients, blocking after sunset dates, and tracking usage of deprecated endpoints.
```http
# RFC 8594: Sunset header
# Gateway injects these headers on deprecated endpoints
HTTP/1.1 200 OK
Deprecation: true
Sunset: Sun, 01 Jun 2025 00:00:00 GMT
Link: <https://docs.example.com/migration/v1-to-v2>; rel="successor-version"
Link: <https://docs.example.com/deprecation-policy>; rel="deprecation"

# After sunset date: gateway blocks with 410 Gone
HTTP/1.1 410 Gone
Content-Type: application/json

{
  "error": "API_VERSION_SUNSET",
  "message": "API v1 was sunset on 2025-06-01. Please migrate to v2.",
  "migration_guide": "https://docs.example.com/migration/v1-to-v2",
  "successor": "https://api.example.com/v2/"
}
```
| Phase | Gateway Behavior | Duration |
|---|---|---|
| Active | Normal operation, no warnings | Indefinite |
| Deprecated | Inject Deprecation + Sunset headers | 3-6 months minimum |
| Warning | Return warning in response body + headers | Last 30 days |
| Sunset | Return 410 Gone, block all requests | Permanent |
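The phases in this table can be reduced to a date comparison at the gateway. The sketch below is one possible shape, not a standard: `deprecation_response` is a hypothetical function, the sunset date and migration URL are taken from the example response, and `X-API-Warning` is an invented header standing in for the "warning" phase.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Date and URL follow the example response; the 30-day window comes from the table.
SUNSET = datetime(2025, 6, 1, tzinfo=timezone.utc)
WARNING_WINDOW = timedelta(days=30)
MIGRATION_GUIDE = "https://docs.example.com/migration/v1-to-v2"


def deprecation_response(now: datetime) -> tuple[int, dict[str, str]]:
    """Return (status, extra headers) the gateway would apply for a deprecated v1."""
    if now >= SUNSET:
        return 410, {"Content-Type": "application/json"}
    headers = {
        "Deprecation": "true",
        # format_datetime with usegmt=True yields an HTTP-date ending in "GMT".
        "Sunset": format_datetime(SUNSET, usegmt=True),
        "Link": f'<{MIGRATION_GUIDE}>; rel="successor-version"',
    }
    if SUNSET - now <= WARNING_WINDOW:
        # Hypothetical header name standing in for the "warning" phase.
        headers["X-API-Warning"] = "v1 sunsets within 30 days"
    return 200, headers
```

Generating the `Sunset` value from a `datetime` rather than typing it by hand avoids mismatched weekday names in the HTTP date.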
Deprecation Best Practices
- ✅ Announce deprecation at least 6 months before sunset for public APIs
- ✅ Track usage of deprecated endpoints and contact active consumers directly
- ✅ Provide a migration guide with the Deprecation header (Link rel)
- ✅ Log which consumers still use deprecated endpoints for targeted outreach
- ✅ Offer a sunset extension for enterprise customers who need more time
Request Validation for Versions
Different API versions may accept different request schemas. The gateway can validate requests against version-specific schemas, ensuring clients send the correct format for their requested version.
```json
{
  "v1": {
    "POST /users": {
      "type": "object",
      "required": ["name", "email"],
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"}
      }
    }
  },
  "v2": {
    "POST /users": {
      "type": "object",
      "required": ["first_name", "last_name", "email"],
      "properties": {
        "first_name": {"type": "string", "minLength": 1},
        "last_name": {"type": "string", "minLength": 1},
        "email": {"type": "string", "format": "email"},
        "phone": {"type": "string", "pattern": "^\\+[1-9]\\d{1,14}$"}
      }
    }
  }
}
```
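To show how version-specific validation might be applied, here is a deliberately small validator. It is an illustrative stand-in for a real JSON Schema library: it only checks `required`, string typing, `minLength`, and `pattern`, ignores `format`, and the `SCHEMAS` layout and `validate` function are assumptions.

```python
import re

# Trimmed copies of the v1 and v2 schemas, keyed by (version, operation).
SCHEMAS = {
    ("v1", "POST /users"): {
        "required": ["name", "email"],
        "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
    },
    ("v2", "POST /users"): {
        "required": ["first_name", "last_name", "email"],
        "properties": {
            "first_name": {"type": "string", "minLength": 1},
            "last_name": {"type": "string", "minLength": 1},
            "email": {"type": "string"},
            "phone": {"type": "string", "pattern": r"^\+[1-9]\d{1,14}$"},
        },
    },
}


def validate(version: str, operation: str, body: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the body is accepted."""
    schema = SCHEMAS[(version, operation)]
    errors = [f"missing required field: {name}"
              for name in schema["required"] if name not in body]
    for name, rules in schema["properties"].items():
        if name not in body:
            continue
        value = body[name]
        if not isinstance(value, str):
            errors.append(f"{name}: expected string")
        elif len(value) < rules.get("minLength", 0):
            errors.append(f"{name}: shorter than minLength")
        elif "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append(f"{name}: does not match pattern")
    return errors
```

A v1-shaped body sent to v2 fails with two "missing required field" errors, which is the kind of message a gateway could return with a 400 before the request reaches the backend.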
Backward Compatibility Rules
A backward-compatible change only adds to the API; it never removes or tightens anything clients depend on. Safe changes: adding optional fields, adding new endpoints, adding new enum values (provided clients tolerate unknown values). Breaking changes (requiring a new version): removing fields, renaming fields, changing field types, removing endpoints, making optional fields required.
Backward-Compatible Changes (No New Version Needed)
- ✅ Adding a new optional field to request or response
- ✅ Adding a new endpoint
- ✅ Adding a new optional query parameter
- ✅ Adding a new enum value (if clients handle unknown values)
- ✅ Relaxing a validation constraint (shorter min length)
Breaking Changes (Require New Version)
- ❌ Removing or renaming a response field
- ❌ Changing a field's type (string → number)
- ❌ Making an optional field required
- ❌ Removing an endpoint or HTTP method
- ❌ Changing error response format
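Several of these rules can be checked mechanically by diffing two schemas. The sketch below covers only three of them (removed or renamed fields, type changes, newly required fields) and treats request and response schemas alike, which is a simplification; `breaking_changes` is a hypothetical helper.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Compare two object schemas and list changes that would break existing clients."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name, spec in old_props.items():
        if name not in new_props:
            # A rename looks identical to a removal from the client's side.
            problems.append(f"removed or renamed field: {name}")
        elif spec.get("type") != new_props[name].get("type"):
            problems.append(f"type changed: {name}")
    old_required = set(old.get("required", []))
    for name in new.get("required", []):
        if name not in old_required:
            problems.append(f"newly required field: {name}")
    return problems


v1 = {"required": ["name", "email"],
      "properties": {"name": {"type": "string"}, "email": {"type": "string"}}}
v2 = {"required": ["first_name", "last_name", "email"],
      "properties": {"first_name": {"type": "string"}, "last_name": {"type": "string"},
                     "email": {"type": "string"}, "phone": {"type": "string"}}}
print(breaking_changes(v1, v2))
```

Run in CI against the previous release's schema, a check like this can flag a change that needs a new version before it ships; an empty list means only additive changes were found.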
A/B Testing at the Gateway
The gateway can route users to different API variants for A/B testing — measuring which version produces better outcomes. Unlike canary (testing for bugs), A/B testing measures business metrics (conversion, engagement).
| Aspect | Canary | A/B Test |
|---|---|---|
| Goal | Detect bugs/regressions | Measure business impact |
| Metrics | Error rate, latency | Conversion, revenue, engagement |
| Duration | Minutes to hours | Days to weeks |
| Assignment | Random per request | Consistent per user (sticky) |
| Rollback trigger | Technical failure | Significant harm to the target metric |
```yaml
# A/B test configuration at the gateway
ab_tests:
  - name: checkout-flow-v2
    enabled: true
    routes:
      - /api/v1/checkout
      - /api/v1/cart
    # Consistent assignment: same user always gets same variant
    assignment:
      method: hash              # Hash user ID for consistent assignment
      key: X-User-ID            # Header containing user identifier
      salt: "checkout-v2-2024"  # Change salt to re-randomize
    variants:
      - name: control
        weight: 50
        upstream: checkout-service-v1
      - name: treatment
        weight: 50
        upstream: checkout-service-v2
    # Gateway adds variant header for downstream tracking
    response_headers:
      X-AB-Test: checkout-flow-v2
      X-AB-Variant: "{{variant_name}}"
    # Segment targeting: only test on specific users
    targeting:
      include:
        - header: X-User-Plan
          values: ["pro", "enterprise"]
      exclude:
        - header: X-User-Country
          values: ["CN", "RU"]  # Exclude due to latency differences
```
Consistent Assignment is Critical
A/B tests must assign the same user to the same variant on every request. If a user sees variant A on one request and variant B on the next, the test results are meaningless. Hash the user ID (not IP — multiple users share IPs) to deterministically assign variants. Include a salt so you can re-randomize for new tests.
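Deterministic assignment can be sketched with a salted hash. The choice of SHA-256, the `salt:user_id` key format, and the function name `assign_variant` are assumptions for illustration; any stable hash with a reasonably uniform distribution would serve.

```python
import hashlib

VARIANTS = [("control", 50), ("treatment", 50)]  # 50/50 weights, names illustrative


def assign_variant(user_id: str, salt: str, variants=VARIANTS) -> str:
    """Map a user to a variant deterministically from a hash of salt and user ID."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    total = sum(weight for _, weight in variants)
    # Use the first 8 bytes as an integer and fold it into [0, total).
    bucket = int.from_bytes(digest[:8], "big") % total
    for name, weight in variants:
        if bucket < weight:
            return name
        bucket -= weight
    raise AssertionError("unreachable: weights cover every bucket")
```

The same `user_id` and salt always produce the same variant, with no state stored at the gateway. Changing the salt reshuffles users, so a new test does not inherit the previous test's groups.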
Interview Questions
Q: Compare URL path versioning vs header-based versioning. Which would you choose for a public API?
A: URL path (/v1/users): explicit, discoverable, cacheable, works in browsers, easy to document. Header-based (X-API-Version: 2): cleaner URLs, more flexible, but hidden from developers, not cacheable by default, harder to test. For public APIs, URL path wins — developers can see the version in documentation, test in browsers, and share URLs. Header-based is better for internal APIs where you control all clients and want cleaner URLs.
Q: How would you implement a canary deployment with automatic rollback?
A: (1) Deploy new version alongside stable. (2) Gateway routes 1-5% of traffic to canary. (3) Monitor error rate and p99 latency for canary vs stable. (4) If canary error rate > threshold (e.g., 5%) or latency > 2x stable, automatically set canary weight to 0% (rollback). (5) If healthy for evaluation window (5-15 min), increase weight. (6) Repeat until 100%. Key: the gateway needs real-time metrics comparison between canary and stable to make automated decisions.
Q: What's the difference between blue-green and canary deployments? When would you use each?
A: Blue-green: atomic switch (0% → 100%), two full environments, instant rollback by switching back. Canary: gradual shift (1% → 5% → 25% → 100%), one small canary pool, rollback by setting weight to 0%. Use blue-green when: you're confident in the release, need atomic database migrations, or want simplicity. Use canary when: the change is risky, you want to validate with real traffic gradually, or you can't afford 2x infrastructure cost.
Q: How do you handle API deprecation without breaking existing clients?
A: Phased approach: (1) Announce deprecation with timeline (6+ months for public APIs). (2) Gateway injects Deprecation: true and Sunset: <date> headers on every response. (3) Track which consumers still use deprecated endpoints and contact them directly. (4) In the last 30 days, add warning in response body. (5) After sunset date, return 410 Gone with migration guide link. (6) Keep the deprecated version deployed (read-only) behind the gateway for 30 more days so it can be re-enabled as an emergency fallback. Never surprise clients with a sudden shutdown.
Q: How does A/B testing at the gateway differ from feature flags in application code?
A: Gateway A/B testing: routes entire requests to different service versions — tests different implementations, architectures, or algorithms. Feature flags: same service, different code paths — tests UI changes, copy, or minor logic differences. Gateway-level is better for: backend algorithm changes, new service architectures, performance comparisons. Feature flags are better for: UI experiments, gradual feature rollouts, kill switches. They're complementary — use both.
Common Mistakes
No deprecation period before sunsetting
Removing API v1 without warning — breaking all clients that haven't migrated yet.
✅ Minimum 6 months deprecation period for public APIs. Inject Deprecation and Sunset headers immediately. Track usage and contact active consumers. Never sunset without confirming zero (or negligible) traffic on the old version.
Canary without proper metrics comparison
Running a canary deployment but only checking if it 'works' — not comparing error rates and latency against the stable version.
✅ Compare canary metrics against the stable baseline: error rate, p50/p95/p99 latency, and business metrics. A 2% error rate on canary means nothing if stable also has 2%. Automated rollback should trigger on relative degradation, not absolute thresholds.
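A relative check might look like the sketch below. The function name and the tolerances are assumptions: one percentage point of extra errors is an arbitrary illustrative value, and the 2x latency ratio echoes the canary interview answer.

```python
def should_rollback(canary_error_rate: float, stable_error_rate: float,
                    canary_p99_ms: float, stable_p99_ms: float,
                    max_error_delta: float = 1.0,
                    max_latency_ratio: float = 2.0) -> bool:
    """Roll back when the canary is meaningfully worse than stable, not merely imperfect."""
    # Error rates are percentages; the delta is in percentage points.
    error_regression = canary_error_rate - stable_error_rate > max_error_delta
    latency_regression = canary_p99_ms > stable_p99_ms * max_latency_ratio
    return error_regression or latency_regression
```

With these defaults, a canary at 2% errors next to a stable version at 2% is left alone, while a canary at 4% against the same baseline is rolled back.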
Inconsistent user assignment in A/B tests
Assigning users to A/B variants randomly per request — the same user sees different variants on different requests, invalidating test results.
✅ Hash the user ID (with a test-specific salt) to deterministically assign variants. The same user always gets the same variant for the duration of the test. Use a stable identifier (user ID, not session ID) that persists across sessions.
Running too many API versions simultaneously
Supporting v1, v2, v3, v4, and v5 simultaneously — each with different schemas, behaviors, and bugs to maintain.
✅ Support at most 2 versions concurrently (current + previous). Aggressively deprecate and sunset old versions. Each active version multiplies your maintenance burden, testing matrix, and bug surface. If clients need stability, offer long deprecation periods, not eternal version support.