Caching & Service Discovery
The gateway as a caching layer and how it discovers upstream services — static config, Consul, Kubernetes, and DNS-based discovery.
What to Cache at the Gateway
Gateway caching intercepts responses and serves them directly on subsequent identical requests — without hitting the upstream service. This reduces backend load, improves latency, and provides resilience (serve stale cache when backends are down).
Good Candidates for Gateway Caching
- ✅ GET requests returning reference data (product catalog, configuration, feature flags)
- ✅ Public API responses that are identical for all users (pricing pages, documentation)
- ✅ Expensive aggregations that change infrequently (dashboards, reports)
- ✅ Static-ish data with known TTL (exchange rates updated hourly, weather data)
- ✅ Responses with explicit Cache-Control headers from upstream
What NOT to Cache
- ❌ User-specific data without proper Vary handling (my orders, my profile)
- ❌ Responses to POST/PUT/DELETE (mutations must reach the backend)
- ❌ Real-time data (stock prices, live scores) unless stale-while-revalidate is acceptable
- ❌ Responses with Set-Cookie headers (session-specific)
- ❌ Large binary responses (better served by CDN)
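These criteria translate directly into an admission check in the caching layer. The sketch below is illustrative Python rather than any particular gateway's API; the function name, status list, and header handling are assumptions.

# Illustrative sketch: decide whether a response may be stored in the gateway cache.
# Not tied to any specific gateway; names and thresholds here are hypothetical.

CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301}

def is_cacheable(method: str, status: int, response_headers: dict) -> bool:
    """Return True if the gateway may store this response in its shared cache."""
    if method.upper() not in CACHEABLE_METHODS:
        return False                      # mutations must always reach the backend
    if status not in CACHEABLE_STATUSES:
        return False
    if "set-cookie" in {k.lower() for k in response_headers}:
        return False                      # session-specific: never share across users
    cache_control = response_headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return False                      # upstream explicitly opted out of shared caching
    return True

# Example: a public catalog response is cacheable, a login response is not
print(is_cacheable("GET", 200, {"Cache-Control": "public, max-age=300"}))   # True
print(is_cacheable("GET", 200, {"Set-Cookie": "session=abc"}))              # False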
The Restaurant Specials Board
Gateway caching is like a restaurant's specials board. Instead of the waiter (gateway) asking the kitchen (backend) about today's specials for every customer, they write it on a board once and point customers to it. The board is updated when specials change (invalidation). But you'd never put 'your personal dietary requirements' on a shared board — that's per-user data that shouldn't be cached generically.
Cache Key Design
The cache key determines when two requests are considered "the same" and can share a cached response. A poorly designed key either serves wrong data (too broad) or never hits cache (too specific).
| Component | Include? | Why |
|---|---|---|
| HTTP Method | Yes | GET /users and DELETE /users are very different |
| URL Path | Yes | /users/1 and /users/2 are different resources |
| Query Parameters | Yes (sorted) | ?page=1&size=10 and ?size=10&page=1 should match |
| Vary Headers | Yes | Accept-Language: en vs fr return different content |
| Authorization | Depends | Include for user-scoped caching, exclude for public data |
| Request Body | No | Cache is for GET requests — no body |
# Kong proxy-cache plugin configuration
plugins:
  - name: proxy-cache
    config:
      strategy: redis
      redis:
        host: redis-cache
        port: 6379
      content_type:
        - application/json
      request_method:
        - GET
        - HEAD
      response_code:
        - 200
        - 301
      cache_ttl: 300            # 5 minutes default
      vary_headers:
        - Accept
        - Accept-Language
      vary_query_params:
        - page
        - size
        - sort

# Cache key formula:
# SHA256(method + path + sorted_query_params + vary_header_values)
# Example: SHA256("GET:/api/products:page=1&size=20:accept=application/json")
Query Parameter Normalization
Sort query parameters alphabetically before hashing. Without normalization, ?a=1&b=2 and ?b=2&a=1 produce different cache keys for identical requests. Also normalize case and remove empty parameters. This simple step can dramatically improve cache hit rates.
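As a concrete illustration, the sketch below normalizes and hashes a key following the SHA-256 formula in the Kong example above; the exact string layout is an assumption, not Kong's internal format.

# Sketch: normalized cache key, following the SHA256(method + path + sorted params + vary values)
# formula above. The exact key layout is an assumption, not any gateway's internal format.
import hashlib
from urllib.parse import urlencode

def cache_key(method, path, query_params, vary_headers, request_headers):
    # Drop empty values, lowercase parameter names, sort alphabetically
    normalized = sorted(
        (k.lower(), v) for k, v in query_params.items() if v not in (None, "")
    )
    query = urlencode(normalized)
    vary = "&".join(
        f"{h.lower()}={request_headers.get(h, '').lower()}" for h in sorted(vary_headers)
    )
    raw = f"{method.upper()}:{path}:{query}:{vary}"
    return hashlib.sha256(raw.encode()).hexdigest()

# ?page=1&size=10 and ?size=10&page=1 now produce the same key
k1 = cache_key("GET", "/api/products", {"page": "1", "size": "10"}, ["Accept"], {"Accept": "application/json"})
k2 = cache_key("GET", "/api/products", {"size": "10", "page": "1"}, ["Accept"], {"Accept": "application/json"})
assert k1 == k2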
Cache Control Headers
HTTP cache control headers tell the gateway (and any intermediate caches) how to handle response caching. The upstream service sets these headers to communicate caching intent.
| Header/Directive | Meaning | Example |
|---|---|---|
| max-age=N | Cache for N seconds from response time | Cache-Control: max-age=3600 |
| s-maxage=N | Max age for shared caches (gateway) — overrides max-age | Cache-Control: s-maxage=600 |
| no-store | Never cache this response anywhere | Cache-Control: no-store |
| no-cache | Cache but revalidate before serving (misleading name) | Cache-Control: no-cache |
| private | Only browser can cache, not gateway/CDN | Cache-Control: private |
| public | Any cache (gateway, CDN) can store this | Cache-Control: public, max-age=3600 |
| stale-while-revalidate=N | Serve stale for N seconds while fetching fresh | Cache-Control: max-age=60, stale-while-revalidate=300 |
Conditional Requests (ETag / Last-Modified)
# First request — gateway caches response with ETag
GET /api/products/123

Response: 200 OK
ETag: "abc123hash"
Cache-Control: max-age=60
Last-Modified: Fri, 01 Mar 2024 12:00:00 GMT
Body: {"id": 123, "name": "Widget", "price": 9.99}

# After max-age expires — gateway revalidates
GET /api/products/123
If-None-Match: "abc123hash"
If-Modified-Since: Fri, 01 Mar 2024 12:00:00 GMT

# If unchanged — upstream returns 304 (no body transfer)
Response: 304 Not Modified
ETag: "abc123hash"

# Gateway serves cached body — saves bandwidth and backend processing
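From the gateway's side, the same revalidation flow can be sketched as follows. It uses the requests library and a plain in-memory dict as the cache store, which is an illustration rather than production gateway code.

# Sketch: gateway-side revalidation with If-None-Match (requests library; in-memory dict as cache)
import requests

cache = {}  # url -> {"etag": ..., "body": ...}

def fetch_with_revalidation(url):
    entry = cache.get(url)
    headers = {"If-None-Match": entry["etag"]} if entry else {}
    resp = requests.get(url, headers=headers, timeout=5)

    if resp.status_code == 304 and entry:
        return entry["body"]              # unchanged: serve cached body, no body transferred
    cache[url] = {"etag": resp.headers.get("ETag"), "body": resp.content}
    return resp.content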
stale-while-revalidate for Resilience
stale-while-revalidate is powerful for gateway caching. When a cached response expires, the gateway serves the stale version immediately (fast response) while fetching a fresh copy in the background. If the backend is down, the stale response is still served — providing resilience. Combine with stale-if-error for explicit fallback behavior.
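The decision the gateway makes under stale-while-revalidate reduces to a few branches. The sketch below is a simplified model; the parameter names and the refresh and fallback hooks are assumptions.

# Sketch of the stale-while-revalidate decision, given a cached entry's age in seconds.
# Parameter names and the refresh/fallback hooks are illustrative assumptions.

def serve_from_cache(entry, age, max_age, swr_window, backend_up, refresh_in_background):
    if age <= max_age:
        return entry              # fresh: serve directly
    if age <= max_age + swr_window:
        refresh_in_background()   # stale but within the SWR window: serve now, refresh async
        return entry
    if not backend_up:
        return entry              # stale-if-error style fallback when the backend is down
    return None                   # too stale: caller must fetch synchronously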
Cache Invalidation
Cache invalidation is famously one of the two hard problems in computer science. The gateway needs strategies to remove stale data when the underlying resource changes.
| Strategy | How It Works | Trade-off |
|---|---|---|
| TTL Expiry | Cache expires after fixed time | Simple but serves stale data until TTL |
| Explicit Purge | API call to delete specific cache entry | Precise but requires purge infrastructure |
| Tag-based Purge | Tag entries, purge all entries with a tag | Efficient for related resources |
| Surrogate Keys | Backend sends Surrogate-Key header, purge by key | Powerful but complex setup |
| Event-driven | Listen to change events, invalidate affected entries | Real-time but requires event infrastructure |
# Explicit purge — delete specific cached response
curl -X PURGE https://gateway.internal/cache/api/products/123

# Tag-based purge — invalidate all product-related cache entries
curl -X POST https://gateway.internal/cache/purge \
  -H "Content-Type: application/json" \
  -d '{"tags": ["products", "catalog"]}'

# Surrogate key approach — backend includes keys in response
# Response from upstream:
#   Surrogate-Key: product-123 category-electronics all-products
#
# When product 123 changes:
curl -X POST https://gateway.internal/cache/purge \
  -d '{"surrogate_key": "product-123"}'
# Purges all cached responses tagged with "product-123"
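An event-driven variant of this purge flow might look like the sketch below. The event source (a plain list here) is an assumption, and the purge endpoint is the hypothetical one from the examples above.

# Sketch: cache invalidation consumer that turns product-change events into surrogate-key purges.
# The event source (a simple iterable here) and the purge endpoint are assumptions.
import requests

PURGE_URL = "https://gateway.internal/cache/purge"   # hypothetical endpoint from the examples above

def handle_event(event):
    """event example: {"type": "product.updated", "product_id": 123}"""
    surrogate_key = f"product-{event['product_id']}"
    resp = requests.post(PURGE_URL, json={"surrogate_key": surrogate_key}, timeout=5)
    resp.raise_for_status()

# In production this loop would read from Kafka/SQS/etc.; a list stands in here.
for event in [{"type": "product.updated", "product_id": 123}]:
    handle_event(event)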
The Pragmatic Approach
For most APIs, short TTLs (30–300 seconds) with stale-while-revalidate are sufficient. You accept slightly stale data in exchange for simplicity. Only invest in explicit invalidation when freshness is critical (inventory counts, pricing) or when cache hit rates are low due to short TTLs.
Static vs Dynamic Service Discovery
The gateway needs to know where upstream services live — their IP addresses and ports. Static configuration works for simple deployments but breaks down when services scale dynamically or move across hosts.
| Approach | How It Works | Best For |
|---|---|---|
| Static config | Hardcode upstream IPs in gateway config | Small, stable deployments |
| DNS-based | Resolve service hostname, DNS returns current IPs | Simple dynamic discovery |
| Registry-based | Query service registry (Consul, Eureka) for endpoints | Microservices with frequent scaling |
| Kubernetes-native | Watch Kubernetes Endpoints/Services API | Kubernetes deployments |
| File-based watch | Read upstream list from file, reload on change | GitOps with config management |
# Static configuration — hardcoded upstreams
upstreams:
  - name: order-service
    targets:
      - target: 10.0.1.10:8080
        weight: 100
      - target: 10.0.1.11:8080
        weight: 100
      - target: 10.0.1.12:8080
        weight: 100

# Problem: when order-service scales to 5 instances,
# you must manually update this config and reload

---
# Dynamic configuration — DNS-based
upstreams:
  - name: order-service
    host: order-service.internal   # DNS resolves to current IPs
    dns_resolver: 10.0.0.2
    dns_ttl: 30                    # Re-resolve every 30 seconds

# Automatically picks up new instances when DNS updates
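DNS-based discovery amounts to periodic re-resolution of the upstream hostname. A minimal sketch using only the standard library follows; the hostname, port, and interval mirror the config above and are assumptions.

# Sketch: periodic DNS re-resolution of upstream targets (standard library only).
# Hostname and port come from the dynamic config above; the loop interval mirrors dns_ttl.
import socket
import time

def resolve_targets(hostname, port):
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})   # unique IPs for the hostname

previous = set()
while True:
    current = set(resolve_targets("order-service.internal", 8080))
    if current != previous:
        print("upstream targets changed:", current)  # a gateway would swap its target list here
        previous = current
    time.sleep(30)   # matches dns_ttl: 30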
Static Config is a Scaling Bottleneck
Static upstream configuration means every scale event (new instance, removed instance, IP change) requires a gateway config update and reload. In Kubernetes where pods are ephemeral and IPs change constantly, static config is unworkable. Dynamic discovery is not optional in container orchestration environments.
Dynamic Discovery
Dynamic service discovery allows the gateway to automatically detect new service instances, removed instances, and health changes without configuration updates.
| Registry | Model | Gateway Integration |
|---|---|---|
| Consul | Agent-based, health-checked, KV store | DNS interface or HTTP API polling |
| Kubernetes | Endpoints API, label selectors | Watch API for real-time updates |
| Eureka | Client-side registration, heartbeat | HTTP API polling |
| etcd | Distributed KV store, watch support | Watch keys for changes |
| DNS SRV | Standard DNS with port + priority | SRV record resolution |
# Kong with Consul service discovery
upstreams:
  - name: order-service
    algorithm: round-robin
    healthchecks:
      active:
        http_path: /health
        healthy:
          interval: 5
          successes: 2
        unhealthy:
          interval: 5
          http_failures: 3

# Kong queries Consul for healthy instances:
#   GET /v1/health/service/order-service?passing=true
#
# Returns:
#   [
#     {"Service": {"Address": "10.0.1.10", "Port": 8080}},
#     {"Service": {"Address": "10.0.1.11", "Port": 8080}}
#   ]
#
# Gateway updates upstream targets automatically
# Envoy with Kubernetes Endpoint Discovery Service (EDS)
clusters:
  - name: order-service
    type: EDS
    eds_cluster_config:
      eds_config:
        api_config_source:
          api_type: GRPC
          grpc_services:
            - envoy_grpc:
                cluster_name: xds-cluster

# Envoy watches Kubernetes Endpoints via xDS API
# New pods automatically added, terminated pods removed
# Zero config changes needed for scaling events
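Outside of a gateway's built-in integration, the Consul health endpoint shown above can also be polled directly. The sketch below assumes a Consul address and a hypothetical update_upstream_targets hook.

# Sketch: poll Consul's health API for passing instances of order-service.
# The Consul address and update_upstream_targets hook are assumptions.
import requests

CONSUL_URL = "http://consul.internal:8500/v1/health/service/order-service"

def healthy_targets():
    resp = requests.get(CONSUL_URL, params={"passing": "true"}, timeout=5)
    resp.raise_for_status()
    return [
        (entry["Service"]["Address"], entry["Service"]["Port"])
        for entry in resp.json()
    ]

def update_upstream_targets(targets):
    print("routing to:", targets)   # a real gateway would swap its load-balancer pool here

update_upstream_targets(healthy_targets())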
Upstream Health Management
Health management determines which upstream instances can receive traffic. The gateway must detect unhealthy instances quickly and stop sending them requests — then detect recovery and resume traffic.
| Type | How It Works | Pros | Cons |
|---|---|---|---|
| Active health check | Gateway periodically pings /health endpoint | Detects failures before clients hit them | Extra network traffic, false positives possible |
| Passive health check | Monitor response codes from real traffic | No extra traffic, based on real behavior | Requires traffic to detect failure |
| Outlier detection | Statistical analysis of error rates per instance | Catches degraded (not dead) instances | Needs enough traffic for statistical significance |
# Combined active + passive health checking
upstreams:
  - name: payment-service
    healthchecks:
      active:
        type: http
        http_path: /health
        healthy:
          interval: 5          # Check every 5 seconds
          successes: 2         # Mark healthy after 2 consecutive successes
          http_statuses: [200]
        unhealthy:
          interval: 2          # Check more frequently when unhealthy
          http_failures: 3     # Mark unhealthy after 3 consecutive failures
          tcp_failures: 2
          timeouts: 3
          http_statuses: [500, 502, 503]
      passive:
        healthy:
          successes: 5         # Mark healthy after 5 successful real requests
          http_statuses: [200, 201, 204]
        unhealthy:
          http_failures: 5     # Mark unhealthy after 5 failed real requests
          tcp_failures: 2
          timeouts: 3
          http_statuses: [500, 502, 503]
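Passive health checking is essentially per-target failure counting on live traffic. The sketch below models only that counting logic, with thresholds similar to the config above; the state layout is an assumption.

# Sketch: passive health tracking per upstream target, with thresholds like the config above.
# A real gateway would also re-probe targets via active checks; this shows only the counting.
from collections import defaultdict

UNHEALTHY_AFTER = 5     # consecutive failed real requests
HEALTHY_AFTER = 5       # consecutive successful real requests
FAILURE_STATUSES = {500, 502, 503}

state = defaultdict(lambda: {"failures": 0, "successes": 0, "healthy": True})

def observe(target, status_code):
    s = state[target]
    if status_code in FAILURE_STATUSES:
        s["failures"] += 1
        s["successes"] = 0
        if s["failures"] >= UNHEALTHY_AFTER:
            s["healthy"] = False   # stop routing new requests to this target
    else:
        s["successes"] += 1
        s["failures"] = 0
        if s["successes"] >= HEALTHY_AFTER:
            s["healthy"] = True    # resume routing

def healthy_targets(targets):
    return [t for t in targets if state[t]["healthy"]]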
Graceful Shutdown Integration
When a service instance is shutting down (deployment, scaling down), it should: (1) deregister from the service registry, (2) start failing health checks (return 503), (3) finish in-flight requests, and then (4) shut down. The gateway detects the health check failure and stops sending new requests, while existing requests complete normally. This is how zero-downtime deployments work.
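From the service's point of view, the sequence might be wired up as in the sketch below; deregister_from_registry and drain_in_flight_requests are hypothetical placeholders for whatever registry and framework the service actually uses.

# Sketch of the graceful-shutdown sequence from the service's point of view.
# deregister_from_registry and drain_in_flight_requests are hypothetical hooks.
import signal
import sys
import time

shutting_down = False

def health_handler():
    # Health endpoint logic: report 503 once shutdown has started
    return 503 if shutting_down else 200

def deregister_from_registry():
    print("deregistered")          # placeholder for a Consul/Kubernetes deregistration call

def drain_in_flight_requests(grace_seconds=30):
    time.sleep(grace_seconds)      # placeholder: wait for active requests to complete

def handle_sigterm(signum, frame):
    global shutting_down
    deregister_from_registry()     # (1) remove ourselves from the registry / Endpoints
    shutting_down = True           # (2) health endpoint now returns 503
    drain_in_flight_requests()     # (3) let active requests finish
    sys.exit(0)                    # (4) shut down

signal.signal(signal.SIGTERM, handle_sigterm)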
Interview Questions
Q: How would you design cache invalidation for a product catalog API?
A: Layered approach: (1) Set Cache-Control: s-maxage=300, stale-while-revalidate=3600 — gateway caches for 5 min, serves stale for up to 1 hour while revalidating. (2) When a product is updated, publish an event to a message queue. (3) A cache invalidation consumer listens for product-change events and calls the gateway's purge API with the product's surrogate key. (4) For bulk updates (price changes), use tag-based purge. This gives you near-real-time freshness for changes while serving most reads from cache.
Q: What's the difference between active and passive health checks?
A: Active: the gateway periodically sends synthetic requests to /health endpoints — detects failures proactively before real traffic is affected. Passive: the gateway monitors responses from real traffic — detects failures reactively after some requests have already failed. Use both: active for fast detection (catches a dead service in seconds), passive for catching intermittent issues that health endpoints don't reveal (memory leaks causing slow responses).
Q: How does service discovery work in Kubernetes with an API Gateway?
A: The gateway watches the Kubernetes Endpoints API (or uses xDS protocol). When a new pod becomes Ready, Kubernetes updates the Endpoints object. The gateway detects this change and adds the pod to its upstream pool. When a pod is terminated, it's removed from Endpoints and the gateway stops routing to it. This is automatic — no manual config needed. The gateway can also use Kubernetes Services (ClusterIP) and let kube-proxy handle load balancing, but direct endpoint watching gives the gateway more control over load balancing algorithms.
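With the official Kubernetes Python client, that watch loop looks roughly like the sketch below; the namespace and service name are assumptions, and a real gateway would do this in its control plane rather than in application Python.

# Sketch: watch the Endpoints object for a service and rebuild the upstream pool on change.
# Uses the official kubernetes Python client; namespace and service name are assumptions.
from kubernetes import client, config, watch

config.load_incluster_config()          # or config.load_kube_config() outside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_endpoints, namespace="default",
                      field_selector="metadata.name=order-service"):
    endpoints = event["object"]
    targets = [
        (addr.ip, port.port)
        for subset in (endpoints.subsets or [])
        for addr in (subset.addresses or [])
        for port in (subset.ports or [])
    ]
    print(event["type"], "->", targets)  # the gateway would swap its upstream pool here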
Q: When should you cache at the gateway vs at the CDN vs at the service?
A: CDN: static assets, public content served globally (images, JS, CSS, public API docs). Gateway: API responses that are shared across users or change infrequently (product catalog, config). Service: user-specific data that requires business logic to determine freshness (user profile, order history — cached in Redis/Memcached). The layers complement each other: CDN handles edge caching, gateway handles API-level caching, service handles domain-specific caching.
Q: How do you handle cache key design for authenticated APIs?
A: For public data (same response regardless of user): key = method + path + sorted_query_params. For user-scoped data: include user_id or tenant_id in the key (from the Vary header or explicit config). For role-based responses (admin sees more fields): include role in the key. Critical rule: never serve one user's cached response to another user. Use the Vary header to signal which request attributes affect the response.
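Extending the earlier key sketch with a scope segment is enough to keep user- and role-specific entries separate; how the scope is derived from the request (JWT claim, header, tenant id) is the assumed part.

# Sketch: extend the cache key with a scope segment for user- or role-specific responses.
# Deriving the scope from the request (JWT claim, header, tenant id) is an assumption.
import hashlib

def scoped_cache_key(method, path, normalized_query, scope):
    # scope examples: "public", "user:42", "role:admin", "tenant:acme"
    raw = f"{method.upper()}:{path}:{normalized_query}:{scope}"
    return hashlib.sha256(raw.encode()).hexdigest()

# Same URL, different users -> different entries; public data shares one entry
assert scoped_cache_key("GET", "/api/orders", "page=1", "user:42") != \
       scoped_cache_key("GET", "/api/orders", "page=1", "user:43")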
Common Mistakes
Caching responses with Set-Cookie headers
The gateway caches a response that includes Set-Cookie, then serves that cookie to other users — leaking sessions.
✅ Never cache responses with Set-Cookie headers. Configure the cache to skip responses containing session-related headers. If the upstream accidentally sends Set-Cookie on cacheable responses, strip it or bypass cache for that response.
Not normalizing cache keys
Treating ?page=1&size=10 and ?size=10&page=1 as different cache entries — halving your hit rate.
✅ Sort query parameters alphabetically, normalize case, and remove empty/default parameters before computing the cache key. This ensures semantically identical requests share a cache entry.
Static upstream config in dynamic environments
Hardcoding upstream IPs in gateway config when running in Kubernetes where pod IPs change on every deployment.
✅ Use dynamic service discovery: Kubernetes Endpoints API, Consul, or DNS-based resolution. The gateway should automatically detect new instances and remove terminated ones without config changes or restarts.
No health checks on upstreams
The gateway load-balances across all configured upstreams without checking if they're actually healthy — sending traffic to dead instances.
✅ Enable both active health checks (periodic /health pings) and passive health checks (monitor real response codes). Mark instances as unhealthy after consecutive failures and stop routing to them. Check more frequently when unhealthy to detect recovery quickly.