Versioning & Traffic Management

Managing API change without breaking clients — versioning strategies, canary deployments, blue-green, and graceful deprecation.

01

Versioning Strategies

API versioning lets you evolve your API without breaking existing clients. The gateway is the natural place to handle version routing — mapping the client's requested version to the correct backend service or service version.

| Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| URL path | /v1/users, /v2/users | Explicit, easy to route, cacheable | URL pollution, hard to sunset |
| Custom header | X-API-Version: 2 | Clean URLs, flexible | Hidden, easy to forget, not cacheable by default |
| Accept header | Accept: application/vnd.api.v2+json | Standards-based (content negotiation) | Complex, hard to test in browser |
| Query parameter | /users?version=2 | Simple, visible | Pollutes query string, caching issues |

kong-version-routing.yaml (yaml)
# URL path versioning: gateway routes to different services
services:
  - name: users-v1
    url: http://user-service-v1:8080
    routes:
      - name: users-v1-route
        paths:
          - /api/v1/users
        strip_path: true

  - name: users-v2
    url: http://user-service-v2:8080
    routes:
      - name: users-v2-route
        paths:
          - /api/v2/users
        strip_path: true

# Header-based versioning: single path, route by header
  - name: users-v2-header
    url: http://user-service-v2:8080
    routes:
      - name: users-header-route
        paths:
          - /api/users
        headers:
          X-API-Version:
            - "2"
        strip_path: true

URL Path Versioning Wins in Practice

Despite debates, URL path versioning (/v1/, /v2/) is the most common choice for public APIs. It's explicit, easy to understand, works in browsers, is cacheable, and requires no special client configuration. Header-based versioning is cleaner architecturally but harder for developers to discover and test. Choose URL path unless you have a strong reason not to.
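
To make the routing step concrete, here is a minimal Python sketch of how a gateway might resolve the requested version from the URL path, then a custom header, then the Accept header. The upstream table, the default version, and the function names are assumptions for illustration, and header names are assumed to be already normalized. Unlike the Kong config above, it strips only the /api/vN prefix before forwarding.

```python
import re

# Hypothetical upstream table; a real gateway would load this from its route config.
UPSTREAMS = {
    "1": "http://user-service-v1:8080",
    "2": "http://user-service-v2:8080",
}
DEFAULT_VERSION = "1"  # Assumption: unversioned requests fall back to v1.

PATH_RE = re.compile(r"^/api/v(\d+)(/.*)?$")
ACCEPT_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")


def resolve_version(path, headers):
    """Return (version, upstream_path), checking path, then header, then Accept."""
    match = PATH_RE.match(path)
    if match:
        # Drop the /api/vN prefix so the backend sees a version-free path.
        return match.group(1), match.group(2) or "/"
    if "X-API-Version" in headers:
        return headers["X-API-Version"], path
    accept = ACCEPT_RE.search(headers.get("Accept", ""))
    if accept:
        return accept.group(1), path
    return DEFAULT_VERSION, path


def route(path, headers):
    """Return the full upstream URL for a request, or raise for unknown versions."""
    version, upstream_path = resolve_version(path, headers)
    upstream = UPSTREAMS.get(version)
    if upstream is None:
        raise LookupError(f"unsupported API version: {version}")
    return upstream + upstream_path
```

Calling route("/api/v2/users", {}) and route("/api/users", {"X-API-Version": "2"}) both select the v2 upstream; only the first rewrites the path.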

02

Traffic Splitting

Traffic splitting runs multiple versions simultaneously and distributes requests between them by percentage. This enables gradual migration from v1 to v2 without a hard cutover.

| Pattern | Description | Use Case |
| --- | --- | --- |
| Weighted routing | 90% → v1, 10% → v2 | Gradual migration to new version |
| Shadow traffic | 100% → v1, copy to v2 (response discarded) | Test v2 with real traffic without risk |
| Header override | Default → v1, X-Use-V2: true → v2 | Internal testing, opt-in beta |
| User segment | Free users → v1, paid users → v2 | Feature gating by plan |

envoy-traffic-split.yaml (yaml)
# Envoy weighted cluster routing
route_config:
  virtual_hosts:
    - name: api
      domains: ["api.example.com"]
      routes:
        - match:
            prefix: "/api/orders"
          route:
            weighted_clusters:
              clusters:
                - name: order-service-v1
                  weight: 90
                - name: order-service-v2
                  weight: 10
              total_weight: 100

# Shadow traffic: mirror to v2 without affecting clients
        - match:
            prefix: "/api/products"
          route:
            cluster: product-service-v1
            request_mirror_policies:
              - cluster: product-service-v2
                runtime_fraction:
                  default_value:
                    numerator: 100  # Mirror 100% of traffic
                    denominator: HUNDRED
🚦

The Highway On-Ramp Meter

Traffic splitting is like a highway on-ramp meter that controls how many cars enter each lane. You gradually shift traffic from the old road (v1) to the new road (v2). If the new road has problems (errors, latency), you immediately redirect all traffic back to the old road. Shadow traffic is like sending a drone to fly the new route without any passengers — testing the path with zero risk.
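
The weighted split itself is simple to reason about in code. The following Python sketch picks an upstream per request in proportion to its weight and simulates the resulting distribution. The function names and the seeded simulation are illustrative assumptions, not how any particular gateway implements it.

```python
import random


def pick_upstream(weights, rng=random):
    """Pick one upstream name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, requests, seed=0):
    """Count how many of `requests` simulated calls land on each upstream."""
    rng = random.Random(seed)  # Seeded so repeated runs give the same counts.
    counts = {name: 0 for name in weights}
    for _ in range(requests):
        counts[pick_upstream(weights, rng)] += 1
    return counts
```

With weights of 90 and 10, a simulation of 10,000 requests should send roughly a tenth of them to v2. Setting a weight to 0 removes that upstream from rotation, which is the rollback path.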

03

Canary Deployments

A canary deployment routes a small percentage of traffic to the new version while monitoring for errors. If the canary is healthy, gradually increase traffic. If it fails, instantly route all traffic back to the stable version.

| Phase | Traffic to Canary | Duration | Action |
| --- | --- | --- | --- |
| 1. Deploy | 0% | - | Deploy new version alongside stable |
| 2. Smoke test | 1% | 5 minutes | Verify basic functionality |
| 3. Expand | 5% | 15 minutes | Monitor error rate and latency |
| 4. Grow | 25% | 30 minutes | Compare metrics against baseline |
| 5. Majority | 50% | 1 hour | Confidence check before full rollout |
| 6. Complete | 100% | - | Remove old version |
| Rollback | 0% | Instant | Any phase: route all traffic to stable |

canary-deployment-config.yaml (yaml)
# Gateway canary configuration
routes:
  - name: order-api
    path: /api/v1/orders
    canary:
      enabled: true
      target: order-service-v2
      stable: order-service-v1
      weight: 5  # 5% to canary

      # Header override: developers can force canary
      header_override:
        name: X-Canary
        value: "true"

      # Automatic rollback conditions
      rollback:
        error_rate_threshold: 5.0    # Rollback if > 5% errors
        latency_p99_threshold: 2000  # Rollback if p99 > 2 seconds
        evaluation_window: 300       # Evaluate over 5 minutes
        min_requests: 100            # Need 100 requests before evaluating

Header Override for Testing

Allow developers and QA to force-route to the canary via a header (X-Canary: true). This lets you test the new version with real infrastructure before opening it to any percentage of real users. The gateway checks for the header first — if present, route to canary regardless of weight percentage.
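
The ordering described above (header first, weight second) can be sketched in a few lines of Python. The function name and return values are hypothetical; the header name and value mirror the config above.

```python
import random


def choose_backend(headers, canary_weight, rng=random):
    """Return 'canary' or 'stable' for one request.

    canary_weight is a percentage from 0 to 100.
    """
    # The override is checked first, so it works even when the weight is 0%.
    if headers.get("X-Canary") == "true":
        return "canary"
    # rng.random() is in [0.0, 1.0), so weight 0 never selects the canary
    # and weight 100 always does.
    if rng.random() * 100 < canary_weight:
        return "canary"
    return "stable"
```

Because the header check comes before the random draw, developers can exercise the canary during phase 1 (0% weight) without exposing any regular user to it.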

04

Blue-Green Deployments

Blue-green maintains two identical production environments. At any time, one (blue) serves all traffic while the other (green) is idle or running the new version. Deployment is an atomic switch at the gateway — flip all traffic from blue to green instantly.

| Aspect | Blue-Green | Canary |
| --- | --- | --- |
| Traffic switch | Atomic: 0% or 100% | Gradual: 1%, 5%, 25%, 50%, 100% |
| Rollback speed | Instant: flip back to blue | Instant: set canary weight to 0% |
| Risk | Higher: all traffic hits new version at once | Lower: only small % affected initially |
| Infrastructure cost | 2x: both environments always running | 1x + small canary pool |
| Complexity | Simple: binary switch | More complex: monitoring, auto-rollback |
| Best for | Confident releases, database migrations | Risky changes, gradual validation |

blue-green-gateway-config.yaml (yaml)
# Blue-green deployment at the gateway
upstreams:
  blue:
    targets:
      - blue-cluster-1:8080
      - blue-cluster-2:8080
      - blue-cluster-3:8080

  green:
    targets:
      - green-cluster-1:8080
      - green-cluster-2:8080
      - green-cluster-3:8080

routes:
  - name: all-traffic
    path: /api/
    upstream: blue  # Currently serving from blue

# Deployment process:
# 1. Deploy new version to green cluster
# 2. Run smoke tests against green directly
# 3. Switch: upstream: blue → upstream: green
# 4. Monitor for 5 minutes
# 5. If issues: switch back to blue (instant rollback)
# 6. If stable: blue becomes the next deployment target

Database Migrations with Blue-Green

The hardest part of blue-green is database migrations. Both blue and green must work with the same database. Strategy: (1) Make schema changes backward-compatible (add columns, don't rename/remove). (2) Deploy schema change first (both versions work). (3) Switch traffic to green. (4) Remove old columns in a later release. This "expand and contract" pattern avoids breaking either version during the switch.
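
As a rough illustration of the expand step, the Python sketch below uses an in-memory SQLite database. The users table and the name to first_name/last_name split are assumptions borrowed from the schema example in section 06; the v1 and v2 functions stand in for the blue and green code paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# Expand: add nullable columns. Existing v1 reads and writes keep working.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def v1_insert(name):
    """Old (blue) code path: only knows about the original column."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def v2_insert(first_name, last_name):
    """New (green) code path: writes the new columns and keeps the old one filled."""
    conn.execute(
        "INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)",
        (f"{first_name} {last_name}", first_name, last_name),
    )


v1_insert("Grace Hopper")
v2_insert("Alan", "Turing")

# Both versions can still read the columns they depend on.
v1_names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
v2_names = conn.execute(
    "SELECT first_name, last_name FROM users WHERE first_name IS NOT NULL"
).fetchall()

# Contract (a later release, once blue is retired): backfill first_name and
# last_name for old rows, then drop the name column.
```

The key property is that green keeps populating the old column, so switching back to blue after green has written data does not leave blue reading incomplete rows.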

05

Deprecation Management

Deprecation is the process of phasing out an old API version. The gateway can enforce deprecation policies — warning clients, blocking after sunset dates, and tracking usage of deprecated endpoints.

deprecation-headers.txt (bash)
# RFC 8594: Sunset header
# Gateway injects these headers on deprecated endpoints

HTTP/1.1 200 OK
Deprecation: true
Sunset: Sun, 01 Jun 2025 00:00:00 GMT
Link: <https://docs.example.com/migration/v1-to-v2>; rel="successor-version"
Link: <https://docs.example.com/deprecation-policy>; rel="deprecation"

# After sunset date: gateway blocks with 410 Gone
HTTP/1.1 410 Gone
Content-Type: application/json

{
  "error": "API_VERSION_SUNSET",
  "message": "API v1 was sunset on 2025-06-01. Please migrate to v2.",
  "migration_guide": "https://docs.example.com/migration/v1-to-v2",
  "successor": "https://api.example.com/v2/"
}
| Phase | Gateway Behavior | Duration |
| --- | --- | --- |
| Active | Normal operation, no warnings | Indefinite |
| Deprecated | Inject Deprecation + Sunset headers | 3-6 months minimum |
| Warning | Return warning in response body + headers | Last 30 days |
| Sunset | Return 410 Gone, block all requests | Permanent |

Deprecation Best Practices

  • Announce deprecation at least 6 months before sunset for public APIs
  • Track usage of deprecated endpoints — contact active consumers directly
  • Link to a migration guide alongside the Deprecation header (a Link header with rel="deprecation" or rel="successor-version")
  • Log which consumers still use deprecated endpoints for targeted outreach
  • Offer a sunset extension for enterprise customers who need more time
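
The Deprecated and Sunset phases can be sketched as a small response filter. This is a minimal Python sketch under stated assumptions: the function name, the tuple return shape, and the fixed sunset date are hypothetical, and the header values mirror the example above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

SUNSET = datetime(2025, 6, 1, tzinfo=timezone.utc)
MIGRATION_GUIDE = "https://docs.example.com/migration/v1-to-v2"


def apply_deprecation(now, status, headers):
    """Return (status, headers, body) for a request to the deprecated v1 API.

    body is None when the upstream response should pass through unchanged.
    """
    if now >= SUNSET:
        body = {
            "error": "API_VERSION_SUNSET",
            "message": f"API v1 was sunset on {SUNSET.date().isoformat()}. Please migrate to v2.",
            "migration_guide": MIGRATION_GUIDE,
        }
        return 410, {"Content-Type": "application/json"}, body
    out = dict(headers)
    # Mirrors the "Deprecation: true" form used above; RFC 9745 defines a
    # date-valued form instead (for example "@1688169599").
    out["Deprecation"] = "true"
    # format_datetime with usegmt=True emits an HTTP-date and requires a UTC datetime.
    out["Sunset"] = format_datetime(SUNSET, usegmt=True)
    out["Link"] = f'<{MIGRATION_GUIDE}>; rel="successor-version"'
    return status, out, None
```

Generating the Sunset value from a datetime rather than typing it by hand avoids mismatched weekday names in the header.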
06

Request Validation for Versions

Different API versions may accept different request schemas. The gateway can validate requests against version-specific schemas, ensuring clients send the correct format for their requested version.

version-specific-schemas.json (json)
{
  "v1": {
    "POST /users": {
      "type": "object",
      "required": ["name", "email"],
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"}
      }
    }
  },
  "v2": {
    "POST /users": {
      "type": "object",
      "required": ["first_name", "last_name", "email"],
      "properties": {
        "first_name": {"type": "string", "minLength": 1},
        "last_name": {"type": "string", "minLength": 1},
        "email": {"type": "string", "format": "email"},
        "phone": {"type": "string", "pattern": "^\\+[1-9]\\d{1,14}$"}
      }
    }
  }
}
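
To show how a gateway might apply these schemas, here is a deliberately small Python sketch that checks only required fields and basic types, keyed by version and operation. It is a stand-in for a full JSON Schema validator (formats, minLength, and pattern are not checked), and the SCHEMAS table and validate function are hypothetical names.

```python
# Simplified mirror of the JSON above: Python types instead of JSON Schema keywords.
SCHEMAS = {
    "v1": {
        "POST /users": {
            "required": ["name", "email"],
            "properties": {"name": str, "email": str},
        }
    },
    "v2": {
        "POST /users": {
            "required": ["first_name", "last_name", "email"],
            "properties": {"first_name": str, "last_name": str, "email": str, "phone": str},
        }
    },
}


def validate(version, operation, body):
    """Return a list of error strings; an empty list means the body is accepted."""
    schema = SCHEMAS.get(version, {}).get(operation)
    if schema is None:
        return [f"no schema for {version} {operation}"]
    errors = [f"missing required field: {f}" for f in schema["required"] if f not in body]
    for field, value in body.items():
        expected = schema["properties"].get(field)
        if expected is not None and not isinstance(value, expected):
            errors.append(f"wrong type for field: {field}")
    return errors
```

A body that is valid for v1 fails against v2 with two missing-field errors, which is the kind of early, version-aware rejection the gateway can return before the request reaches a backend.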

Backward Compatibility Rules

A backward-compatible change only adds optional capabilities; it never adds requirements or removes anything clients depend on. Safe changes: adding optional fields, adding new endpoints, adding new enum values (provided clients tolerate unknown values). Breaking changes (requiring a new version): removing fields, renaming fields, changing field types, removing endpoints, making optional fields required.

Backward-Compatible Changes (No New Version Needed)

  • Adding a new optional field to request or response
  • Adding a new endpoint
  • Adding a new optional query parameter
  • Adding a new enum value (if clients handle unknown values)
  • Relaxing a request validation constraint (for example, lowering a minimum length)

Breaking Changes (Require New Version)

  • Removing or renaming a response field
  • Changing a field's type (string → number)
  • Making an optional field required
  • Removing an endpoint or HTTP method
  • Changing error response format
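
Some of these rules can be checked mechanically. The Python sketch below compares two schemas in the shape used by the JSON example above and reports removed fields, type changes, and newly required fields. It is a rough sketch with a hypothetical function name; it does not detect renames as such (a rename shows up as a removal) and ignores endpoints and error formats.

```python
def breaking_changes(old, new):
    """List breaking differences between two schemas.

    Each schema is a dict with 'required' (list of names) and
    'properties' (name -> {'type': ...}).
    """
    problems = []
    old_props, new_props = old["properties"], new["properties"]
    for name in sorted(old_props):
        if name not in new_props:
            problems.append(f"removed field: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"changed type: {name}")
    # A field that was optional or absent before and is required now breaks old clients.
    for name in sorted(set(new["required"]) - set(old["required"])):
        problems.append(f"newly required: {name}")
    return problems
```

An empty result means none of the checked rules were violated, so the change may be shippable within the current version; anything else is a signal that a new version is needed.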
07

A/B Testing at the Gateway

The gateway can route users to different API variants for A/B testing — measuring which version produces better outcomes. Unlike canary (testing for bugs), A/B testing measures business metrics (conversion, engagement).

| Aspect | Canary | A/B Test |
| --- | --- | --- |
| Goal | Detect bugs/regressions | Measure business impact |
| Metrics | Error rate, latency | Conversion, revenue, engagement |
| Duration | Minutes to hours | Days to weeks |
| Assignment | Random per request | Consistent per user (sticky) |
| Rollback trigger | Technical failure | Statistical insignificance |

ab-test-gateway-config.yaml (yaml)
# A/B test configuration at the gateway
ab_tests:
  - name: checkout-flow-v2
    enabled: true
    routes:
      - /api/v1/checkout
      - /api/v1/cart

    # Consistent assignment: same user always gets same variant
    assignment:
      method: hash        # Hash user ID for consistent assignment
      key: X-User-ID     # Header containing user identifier
      salt: "checkout-v2-2024"  # Change salt to re-randomize

    variants:
      - name: control
        weight: 50
        upstream: checkout-service-v1
      - name: treatment
        weight: 50
        upstream: checkout-service-v2

    # Gateway adds variant header for downstream tracking
    response_headers:
      X-AB-Test: checkout-flow-v2
      X-AB-Variant: "{{variant_name}}"

    # Segment targeting: only test on specific users
    targeting:
      include:
        - header: X-User-Plan
          values: ["pro", "enterprise"]
      exclude:
        - header: X-User-Country
          values: ["CN", "RU"]  # Exclude due to latency differences

Consistent Assignment is Critical

A/B tests must assign the same user to the same variant on every request. If a user sees variant A on one request and variant B on the next, the test results are meaningless. Hash the user ID (not IP — multiple users share IPs) to deterministically assign variants. Include a salt so you can re-randomize for new tests.
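
A minimal Python sketch of salted, hash-based assignment follows. The function name and the choice of SHA-256 are assumptions; any stable hash with a roughly uniform output would serve the same purpose.

```python
import hashlib


def assign_variant(user_id, salt, variants):
    """Deterministically map a user to a variant name.

    variants is a list of (name, weight) pairs; weights are relative.
    """
    total = sum(weight for _, weight in variants)
    if total <= 0:
        raise ValueError("variants must have a positive total weight")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket in [0, total).
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for name, weight in variants:
        cumulative += weight
        if bucket < cumulative:
            return name
```

The same user ID and salt always land in the same bucket, so no assignment table needs to be stored at the gateway. Changing the salt reshuffles users, which keeps one experiment's groups from lining up with the next.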

08

Interview Questions

Q: Compare URL path versioning vs header-based versioning. Which would you choose for a public API?

A: URL path (/v1/users): explicit, discoverable, cacheable, works in browsers, easy to document. Header-based (X-API-Version: 2): cleaner URLs, more flexible, but hidden from developers, not cacheable by default, harder to test. For public APIs, URL path wins — developers can see the version in documentation, test in browsers, and share URLs. Header-based is better for internal APIs where you control all clients and want cleaner URLs.

Q: How would you implement a canary deployment with automatic rollback?

A: (1) Deploy new version alongside stable. (2) Gateway routes 1-5% of traffic to canary. (3) Monitor error rate and p99 latency for canary vs stable. (4) If canary error rate > threshold (e.g., 5%) or latency > 2x stable, automatically set canary weight to 0% (rollback). (5) If healthy for evaluation window (5-15 min), increase weight. (6) Repeat until 100%. Key: the gateway needs real-time metrics comparison between canary and stable to make automated decisions.

Q: What's the difference between blue-green and canary deployments? When would you use each?

A: Blue-green: atomic switch (0% → 100%), two full environments, instant rollback by switching back. Canary: gradual shift (1% → 5% → 25% → 100%), one small canary pool, rollback by setting weight to 0%. Use blue-green when: you're confident in the release, need atomic database migrations, or want simplicity. Use canary when: the change is risky, you want to validate with real traffic gradually, or you can't afford 2x infrastructure cost.

Q: How do you handle API deprecation without breaking existing clients?

A: Phased approach: (1) Announce deprecation with timeline (6+ months for public APIs). (2) Gateway injects Deprecation: true and Sunset: <date> headers on every response. (3) Track which consumers still use deprecated endpoints — contact them directly. (4) In the last 30 days, add warning in response body. (5) After sunset date, return 410 Gone with migration guide link. (6) Keep deprecated version running (read-only) for 30 more days as emergency fallback. Never surprise clients with a sudden shutdown.

Q: How does A/B testing at the gateway differ from feature flags in application code?

A: Gateway A/B testing: routes entire requests to different service versions — tests different implementations, architectures, or algorithms. Feature flags: same service, different code paths — tests UI changes, copy, or minor logic differences. Gateway-level is better for: backend algorithm changes, new service architectures, performance comparisons. Feature flags are better for: UI experiments, gradual feature rollouts, kill switches. They're complementary — use both.

09

Common Mistakes

⚠️

No deprecation period before sunsetting

Removing API v1 without warning — breaking all clients that haven't migrated yet.

Minimum 6 months deprecation period for public APIs. Inject Deprecation and Sunset headers immediately. Track usage and contact active consumers. Never sunset without confirming zero (or negligible) traffic on the old version.

⚠️

Canary without proper metrics comparison

Running a canary deployment but only checking if it 'works' — not comparing error rates and latency against the stable version.

Compare canary metrics against the stable baseline: error rate, p50/p95/p99 latency, and business metrics. A 2% error rate on canary means nothing if stable also has 2%. Automated rollback should trigger on relative degradation, not absolute thresholds.
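
A relative check of this kind might look like the Python sketch below. The function name, the metric dictionary shape, and the ratio and floor values are illustrative assumptions; the 100-request minimum and the 2x latency ratio echo numbers used earlier in this guide.

```python
def should_rollback(canary, stable, min_requests=100,
                    max_error_ratio=2.0, max_latency_ratio=2.0,
                    error_floor=0.001):
    """Decide whether to roll back by comparing the canary to the stable baseline.

    canary and stable are dicts with 'requests', 'errors', and 'p99_ms'.
    """
    if canary["requests"] < min_requests or stable["requests"] == 0:
        return False  # Not enough data to judge yet.
    canary_rate = canary["errors"] / canary["requests"]
    stable_rate = stable["errors"] / stable["requests"]
    # The floor keeps a near-zero stable error rate from making any
    # single canary error look like a large relative regression.
    if canary_rate > max(stable_rate, error_floor) * max_error_ratio:
        return True
    return canary["p99_ms"] > stable["p99_ms"] * max_latency_ratio
```

With a stable error rate of 2%, a canary at 2% passes while a canary at 5% triggers rollback, which an absolute 5% threshold alone would not catch.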

⚠️

Inconsistent user assignment in A/B tests

Assigning users to A/B variants randomly per request — the same user sees different variants on different requests, invalidating test results.

Hash the user ID (with a test-specific salt) to deterministically assign variants. The same user always gets the same variant for the duration of the test. Use a stable identifier (user ID, not session ID) that persists across sessions.

⚠️

Running too many API versions simultaneously

Supporting v1, v2, v3, v4, and v5 simultaneously — each with different schemas, behaviors, and bugs to maintain.

Support at most 2 versions concurrently (current + previous). Aggressively deprecate and sunset old versions. Each active version multiplies your maintenance burden, testing matrix, and bug surface. If clients need stability, offer long deprecation periods — not eternal version support.