
Caching & Service Discovery

The gateway as a caching layer and how it discovers upstream services — static config, Consul, Kubernetes, and DNS-based discovery.

01

What to Cache at the Gateway

Gateway caching intercepts responses and serves them directly on subsequent identical requests — without hitting the upstream service. This reduces backend load, improves latency, and provides resilience (serve stale cache when backends are down).

Good Candidates for Gateway Caching

  • GET requests returning reference data (product catalog, configuration, feature flags)
  • Public API responses that are identical for all users (pricing pages, documentation)
  • Expensive aggregations that change infrequently (dashboards, reports)
  • Static-ish data with known TTL (exchange rates updated hourly, weather data)
  • Responses with explicit Cache-Control headers from upstream

What NOT to Cache

  • User-specific data without proper Vary handling (my orders, my profile)
  • Responses to POST/PUT/DELETE (mutations must reach the backend)
  • Real-time data (stock prices, live scores) unless stale-while-revalidate is acceptable
  • Responses with Set-Cookie headers (session-specific)
  • Large binary responses (better served by CDN)
📋

The Restaurant Specials Board

Gateway caching is like a restaurant's specials board. Instead of the waiter (gateway) asking the kitchen (backend) about today's specials for every customer, they write it on a board once and point customers to it. The board is updated when specials change (invalidation). But you'd never put 'your personal dietary requirements' on a shared board — that's per-user data that shouldn't be cached generically.

02

Cache Key Design

The cache key determines when two requests are considered "the same" and can share a cached response. A poorly designed key either serves wrong data (too broad) or never hits cache (too specific).

Component | Include? | Why
HTTP Method | Yes | GET /users and DELETE /users are very different
URL Path | Yes | /users/1 and /users/2 are different resources
Query Parameters | Yes (sorted) | ?page=1&size=10 and ?size=10&page=1 should match
Vary Headers | Yes | Accept-Language: en vs fr return different content
Authorization | Depends | Include for user-scoped caching, exclude for public data
Request Body | No | Cache is for GET requests — no body
cache-key-configuration.yaml
# Kong proxy-cache plugin configuration
# (note: the redis strategy shown here requires the proxy-cache-advanced
#  plugin; the open-source proxy-cache plugin supports the memory strategy)
plugins:
  - name: proxy-cache
    config:
      strategy: redis
      redis:
        host: redis-cache
        port: 6379
      content_type:
        - application/json
      request_method:
        - GET
        - HEAD
      response_code:
        - 200
        - 301
      cache_ttl: 300  # 5 minutes default
      vary_headers:
        - Accept
        - Accept-Language
      vary_query_params:
        - page
        - size
        - sort

# Cache key formula:
# SHA256(method + path + sorted_query_params + vary_header_values)
# Example: SHA256("GET:/api/products:page=1&size=20:accept=application/json")

Query Parameter Normalization

Sort query parameters alphabetically before hashing. Without normalization, ?a=1&b=2 and ?b=2&a=1 produce different cache keys for identical requests. Also normalize case and remove empty parameters. This simple step can dramatically improve cache hit rates.
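
A minimal shell sketch of that normalization step (illustrative only; real gateways do this internally before hashing the key):

normalize-query.sh
# Split the query string on '&', drop empty parameters, sort
# alphabetically, and rejoin, so both orderings below produce
# the same input to the cache-key hash.
normalize_query() {
  printf '%s' "$1" | tr '&' '\n' | grep -v '^$' | sort | paste -sd '&' -
}

normalize_query "b=2&a=1"   # prints: a=1&b=2
normalize_query "a=1&b=2"   # prints: a=1&b=2 (identical key input)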

03

Cache Control Headers

HTTP cache control headers tell the gateway (and any intermediate caches) how to handle response caching. The upstream service sets these headers to communicate caching intent.

Header/Directive | Meaning | Example
max-age=N | Cache for N seconds from response time | Cache-Control: max-age=3600
s-maxage=N | Max age for shared caches (gateway) — overrides max-age | Cache-Control: s-maxage=600
no-store | Never cache this response anywhere | Cache-Control: no-store
no-cache | Cache but revalidate before serving (misleading name) | Cache-Control: no-cache
private | Only the browser can cache, not gateway/CDN | Cache-Control: private
public | Any cache (gateway, CDN) can store this | Cache-Control: public, max-age=3600
stale-while-revalidate=N | Serve stale for N seconds while fetching fresh | Cache-Control: max-age=60, stale-while-revalidate=300

Conditional Requests (ETag / Last-Modified)

conditional-request-flow.sh
# First request: gateway caches the response with its ETag
GET /api/products/123
Response:
  200 OK
  ETag: "abc123hash"
  Cache-Control: max-age=60
  Last-Modified: Fri, 01 Mar 2024 12:00:00 GMT
  Body: {"id": 123, "name": "Widget", "price": 9.99}

# After max-age expires, the gateway revalidates
GET /api/products/123
If-None-Match: "abc123hash"
If-Modified-Since: Fri, 01 Mar 2024 12:00:00 GMT

# If unchanged, upstream returns 304 (no body transfer)
Response:
  304 Not Modified
  ETag: "abc123hash"

# Gateway serves the cached body, saving bandwidth and backend processing

stale-while-revalidate for Resilience

stale-while-revalidate is powerful for gateway caching. When a cached response expires, the gateway serves the stale version immediately (fast response) while fetching a fresh copy in the background. If the backend is down, the stale response is still served — providing resilience. Combine with stale-if-error for explicit fallback behavior.
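
In header form, the combination looks like this (an illustrative timeline in the style of the flow above; exact directive values are up to the upstream):

stale-while-revalidate-flow.sh
# Upstream response enabling resilient gateway caching
Response:
  200 OK
  Cache-Control: max-age=60, stale-while-revalidate=300, stale-if-error=600

# t = 0s..60s     served fresh from cache
# t = 60s..360s   stale copy served instantly; the gateway refreshes
#                 from upstream in the background
# if upstream errors (5xx) within 600s of expiry, the stale copy
# is served instead of propagating the failure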

04

Cache Invalidation

Cache invalidation is famously one of the two hard problems in computer science. The gateway needs strategies to remove stale data when the underlying resource changes.

Strategy | How It Works | Trade-off
TTL Expiry | Cache expires after fixed time | Simple but serves stale data until TTL
Explicit Purge | API call to delete specific cache entry | Precise but requires purge infrastructure
Tag-based Purge | Tag entries, purge all entries with a tag | Efficient for related resources
Surrogate Keys | Backend sends Surrogate-Key header, purge by key | Powerful but complex setup
Event-driven | Listen to change events, invalidate affected entries | Real-time but requires event infrastructure
cache-invalidation-api.sh
# Explicit purge: delete a specific cached response
curl -X PURGE https://gateway.internal/cache/api/products/123

# Tag-based purge: invalidate all product-related cache entries
curl -X POST https://gateway.internal/cache/purge \
  -H "Content-Type: application/json" \
  -d '{"tags": ["products", "catalog"]}'

# Surrogate key approach: backend includes keys in the response
# Response from upstream:
#   Surrogate-Key: product-123 category-electronics all-products
#
# When product 123 changes:
curl -X POST https://gateway.internal/cache/purge \
  -d '{"surrogate_key": "product-123"}'
# Purges all cached responses tagged with "product-123"

The Pragmatic Approach

For most APIs, short TTLs (30–300 seconds) with stale-while-revalidate are sufficient. You accept slightly stale data in exchange for simplicity. Only invest in explicit invalidation when freshness is critical (inventory counts, pricing) or when cache hit rates are low due to short TTLs.

05

Static vs Dynamic Service Discovery

The gateway needs to know where upstream services live — their IP addresses and ports. Static configuration works for simple deployments but breaks down when services scale dynamically or move across hosts.

Approach | How It Works | Best For
Static config | Hardcode upstream IPs in gateway config | Small, stable deployments
DNS-based | Resolve service hostname, DNS returns current IPs | Simple dynamic discovery
Registry-based | Query service registry (Consul, Eureka) for endpoints | Microservices with frequent scaling
Kubernetes-native | Watch Kubernetes Endpoints/Services API | Kubernetes deployments
File-based watch | Read upstream list from file, reload on change | GitOps with config management
static-vs-dynamic-config.yaml
# Static configuration: hardcoded upstreams
upstreams:
  - name: order-service
    targets:
      - target: 10.0.1.10:8080
        weight: 100
      - target: 10.0.1.11:8080
        weight: 100
      - target: 10.0.1.12:8080
        weight: 100
    # Problem: when order-service scales to 5 instances,
    # you must manually update this config and reload

---
# Dynamic configuration: DNS-based
upstreams:
  - name: order-service
    host: order-service.internal  # DNS resolves to current IPs
    dns_resolver: 10.0.0.2
    dns_ttl: 30  # Re-resolve every 30 seconds
    # Automatically picks up new instances when DNS updates
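
To sanity-check DNS-based discovery, you can query the resolver directly. A hedged example against Consul's DNS interface (the service name is illustrative, and Consul's DNS defaults to port 8600; adjust for your resolver):

dig-srv-check.sh
# Ask the resolver what the gateway will see. Consul's DNS interface
# returns one SRV record per healthy instance:
# priority, weight, port, target.
dig @10.0.0.2 -p 8600 order-service.service.consul SRV +short
# Example output (targets encode instance addresses):
# 1 1 8080 0a00010a.addr.dc1.consul.
# 1 1 8080 0a00010b.addr.dc1.consul.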

Static Config is a Scaling Bottleneck

Static upstream configuration means every scale event (new instance, removed instance, IP change) requires a gateway config update and reload. In Kubernetes where pods are ephemeral and IPs change constantly, static config is unworkable. Dynamic discovery is not optional in container orchestration environments.

06

Dynamic Discovery

Dynamic service discovery allows the gateway to automatically detect new service instances, removed instances, and health changes without configuration updates.

Registry | Model | Gateway Integration
Consul | Agent-based, health-checked, KV store | DNS interface or HTTP API polling
Kubernetes | Endpoints API, label selectors | Watch API for real-time updates
Eureka | Client-side registration, heartbeat | HTTP API polling
etcd | Distributed KV store, watch support | Watch keys for changes
DNS SRV | Standard DNS with port + priority | SRV record resolution
kong-consul-discovery.yaml
# Kong with Consul service discovery
upstreams:
  - name: order-service
    algorithm: round-robin
    healthchecks:
      active:
        http_path: /health
        healthy:
          interval: 5
          successes: 2
        unhealthy:
          interval: 5
          http_failures: 3

# The Consul integration (DNS interface or an API poller) fetches
# healthy instances:
# GET /v1/health/service/order-service?passing=true
#
# Returns:
# [
#   {"Service": {"Address": "10.0.1.10", "Port": 8080}},
#   {"Service": {"Address": "10.0.1.11", "Port": 8080}}
# ]
#
# Gateway updates upstream targets automatically
envoy-kubernetes-eds.yaml
# Envoy with Kubernetes Endpoint Discovery Service (EDS)
clusters:
  - name: order-service
    type: EDS
    eds_cluster_config:
      eds_config:
        api_config_source:
          api_type: GRPC
          grpc_services:
            - envoy_grpc:
                cluster_name: xds-cluster
    # A control plane watches Kubernetes Endpoints and streams
    # updates to Envoy over the xDS API
    # New pods automatically added, terminated pods removed
    # Zero config changes needed for scaling events
07

Upstream Health Management

Health management determines which upstream instances can receive traffic. The gateway must detect unhealthy instances quickly and stop sending them requests — then detect recovery and resume traffic.

Type | How It Works | Pros | Cons
Active health check | Gateway periodically pings /health endpoint | Detects failures before clients hit them | Extra network traffic, false positives possible
Passive health check | Monitor response codes from real traffic | No extra traffic, based on real behavior | Requires traffic to detect failure
Outlier detection | Statistical analysis of error rates per instance | Catches degraded (not dead) instances | Needs enough traffic for statistical significance
health-check-config.yaml
# Combined active + passive health checking
upstreams:
  - name: payment-service
    healthchecks:
      active:
        type: http
        http_path: /health
        healthy:
          interval: 5          # Check every 5 seconds
          successes: 2         # Mark healthy after 2 consecutive successes
          http_statuses: [200]
        unhealthy:
          interval: 2          # Check more frequently when unhealthy
          http_failures: 3     # Mark unhealthy after 3 consecutive failures
          tcp_failures: 2
          timeouts: 3
          http_statuses: [500, 502, 503]
      passive:
        healthy:
          successes: 5         # Mark healthy after 5 successful real requests
          http_statuses: [200, 201, 204]
        unhealthy:
          http_failures: 5     # Mark unhealthy after 5 failed real requests
          tcp_failures: 2
          timeouts: 3
          http_statuses: [500, 502, 503]

Graceful Shutdown Integration

When a service instance is shutting down (deployment, scaling down), it should: (1) deregister from the service registry; (2) start failing health checks (return 503); (3) finish in-flight requests; (4) shut down. The gateway sees the failing health check and stops sending new requests while existing requests complete normally. This is the core of zero-downtime deployment.
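
A hedged Kubernetes sketch of that sequence (pod and image names are hypothetical): the readiness probe handles step (2), the preStop delay gives the gateway time to see the change, and the grace period bounds step (3).

graceful-shutdown-pod.yaml
# Illustrative pod spec fragment (names are hypothetical).
# On termination: preStop delays SIGTERM so endpoint removal
# propagates to the gateway; the grace period bounds the drain.
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  terminationGracePeriodSeconds: 45   # upper bound for in-flight requests
  containers:
    - name: app
      image: payment-service:1.2.3
      readinessProbe:
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 5              # endpoint watchers see failure quickly
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]  # let routing updates propagate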

08

Interview Questions

Q: How would you design cache invalidation for a product catalog API?

A: Layered approach: (1) Set Cache-Control: s-maxage=300, stale-while-revalidate=3600 — gateway caches for 5 min, serves stale for up to 1 hour while revalidating. (2) When a product is updated, publish an event to a message queue. (3) A cache invalidation consumer listens for product-change events and calls the gateway's purge API with the product's surrogate key. (4) For bulk updates (price changes), use tag-based purge. This gives you near-real-time freshness for changes while serving most reads from cache.
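
A hedged sketch of step (3), reusing the purge API from the invalidation section ("event-consumer" is a hypothetical stand-in for your queue's CLI or client):

invalidation-consumer.sh
# Hypothetical consumer loop: for each product-change event,
# purge the gateway cache by that product's surrogate key.
event-consumer --topic product-changes --emit-ids | while read -r product_id; do
  curl -X POST https://gateway.internal/cache/purge \
    -H "Content-Type: application/json" \
    -d "{\"surrogate_key\": \"product-${product_id}\"}"
done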

Q: What's the difference between active and passive health checks?

A: Active: the gateway periodically sends synthetic requests to /health endpoints — detects failures proactively before real traffic is affected. Passive: the gateway monitors responses from real traffic — detects failures reactively after some requests have already failed. Use both: active for fast detection (catches a dead service in seconds), passive for catching intermittent issues that health endpoints don't reveal (memory leaks causing slow responses).

Q: How does service discovery work in Kubernetes with an API Gateway?

A: The gateway watches the Kubernetes Endpoints API (or uses xDS protocol). When a new pod becomes Ready, Kubernetes updates the Endpoints object. The gateway detects this change and adds the pod to its upstream pool. When a pod is terminated, it's removed from Endpoints and the gateway stops routing to it. This is automatic — no manual config needed. The gateway can also use Kubernetes Services (ClusterIP) and let kube-proxy handle load balancing, but direct endpoint watching gives the gateway more control over load balancing algorithms.
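
To see exactly what a watching gateway sees, inspect the Endpoints object directly (service name reused from the earlier examples):

inspect-endpoints.sh
# List the pod IPs currently backing the service; these are the
# addresses a watching gateway adds to its upstream pool.
kubectl get endpoints order-service \
  -o jsonpath='{.subsets[*].addresses[*].ip}'
# Example output: 10.244.1.5 10.244.2.7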

Q: When should you cache at the gateway vs at the CDN vs at the service?

A: CDN: static assets, public content served globally (images, JS, CSS, public API docs). Gateway: API responses that are shared across users or change infrequently (product catalog, config). Service: user-specific data that requires business logic to determine freshness (user profile, order history — cached in Redis/Memcached). The layers complement each other: CDN handles edge caching, gateway handles API-level caching, service handles domain-specific caching.

Q: How do you handle cache key design for authenticated APIs?

A: For public data (same response regardless of user): key = method + path + sorted_query_params. For user-scoped data: include user_id or tenant_id in the key (from the Vary header or explicit config). For role-based responses (admin sees more fields): include role in the key. Critical rule: never serve one user's cached response to another user. Use the Vary header to signal which request attributes affect the response.
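
A hypothetical config sketch of those rules (field names are illustrative and do not match any specific gateway's schema):

authenticated-cache-keys.yaml
# Illustrative only: not a real product's configuration format.
routes:
  - path: /api/products          # public: identical for every user
    cache:
      key: [method, path, sorted_query]
  - path: /api/orders            # user-scoped: isolate per user
    cache:
      key: [method, path, sorted_query, claim:user_id]
      vary: [Authorization]      # never share across Authorization values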

09

Common Mistakes

⚠️

Caching responses with Set-Cookie headers

The gateway caches a response that includes Set-Cookie, then serves that cookie to other users — leaking sessions.

Never cache responses with Set-Cookie headers. Configure the cache to skip responses containing session-related headers. If the upstream accidentally sends Set-Cookie on cacheable responses, strip it or bypass cache for that response.

⚠️

Not normalizing cache keys

Treating ?page=1&size=10 and ?size=10&page=1 as different cache entries — halving your hit rate.

Sort query parameters alphabetically, normalize case, and remove empty/default parameters before computing the cache key. This ensures semantically identical requests share a cache entry.

⚠️

Static upstream config in dynamic environments

Hardcoding upstream IPs in gateway config when running in Kubernetes where pod IPs change on every deployment.

Use dynamic service discovery: Kubernetes Endpoints API, Consul, or DNS-based resolution. The gateway should automatically detect new instances and remove terminated ones without config changes or restarts.

⚠️

No health checks on upstreams

The gateway load-balances across all configured upstreams without checking if they're actually healthy — sending traffic to dead instances.

Enable both active health checks (periodic /health pings) and passive health checks (monitor real response codes). Mark instances as unhealthy after consecutive failures and stop routing to them. Check more frequently when unhealthy to detect recovery quickly.