Caching & Service Discovery
The gateway as a caching layer and how it discovers upstream services — static config, Consul, Kubernetes, and DNS-based discovery.
What to Cache at the Gateway
Gateway caching intercepts responses and serves them directly on subsequent identical requests — without hitting the upstream service. This reduces backend load, improves latency, and provides resilience (serve stale cache when backends are down).
Good Candidates for Gateway Caching
- ✅ GET requests returning reference data (product catalog, configuration, feature flags)
- ✅ Public API responses that are identical for all users (pricing pages, documentation)
- ✅ Expensive aggregations that change infrequently (dashboards, reports)
- ✅ Static-ish data with known TTL (exchange rates updated hourly, weather data)
- ✅ Responses with explicit Cache-Control headers from upstream
What NOT to Cache
- ❌ User-specific data without proper Vary handling (my orders, my profile)
- ❌ Responses to POST/PUT/DELETE (mutations must reach the backend)
- ❌ Real-time data (stock prices, live scores) unless stale-while-revalidate is acceptable
- ❌ Responses with Set-Cookie headers (session-specific)
- ❌ Large binary responses (better served by CDN)
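These criteria translate directly into an admission check in the caching layer. The sketch below is illustrative Python rather than any particular gateway's API; the function name, status list, and header handling are assumptions.

# Illustrative sketch: decide whether a response may be stored in the gateway cache.
# Not tied to any specific gateway; names and thresholds here are hypothetical.

CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301}

def is_cacheable(method: str, status: int, response_headers: dict) -> bool:
    """Return True if the gateway may store this response in its shared cache."""
    if method.upper() not in CACHEABLE_METHODS:
        return False                      # mutations must always reach the backend
    if status not in CACHEABLE_STATUSES:
        return False
    if "set-cookie" in {k.lower() for k in response_headers}:
        return False                      # session-specific: never share across users
    cache_control = response_headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return False                      # upstream explicitly opted out of shared caching
    return True

# Example: a public catalog response is cacheable, a login response is not
print(is_cacheable("GET", 200, {"Cache-Control": "public, max-age=300"}))   # True
print(is_cacheable("GET", 200, {"Set-Cookie": "session=abc"}))              # False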
The Restaurant Specials Board
Gateway caching is like a restaurant's specials board. Instead of the waiter (gateway) asking the kitchen (backend) about today's specials for every customer, they write it on a board once and point customers to it. The board is updated when specials change (invalidation). But you'd never put 'your personal dietary requirements' on a shared board — that's per-user data that shouldn't be cached generically.
Cache Key Design
The cache key determines when two requests are considered "the same" and can share a cached response. A poorly designed key either serves wrong data (too broad) or never hits cache (too specific).
| Component | Include? | Why |
|---|---|---|
| HTTP Method | Yes | GET /users and DELETE /users are very different |
| URL Path | Yes | /users/1 and /users/2 are different resources |
| Query Parameters | Yes (sorted) | ?page=1&size=10 and ?size=10&page=1 should match |
| Vary Headers | Yes | Accept-Language: en vs fr return different content |
| Authorization | Depends | Include for user-scoped caching, exclude for public data |
| Request Body | No | Cache is for GET requests — no body |
# Kong proxy-cache plugin configuration
plugins:
  - name: proxy-cache
    config:
      strategy: redis
      redis:
        host: redis-cache
        port: 6379
      content_type:
        - application/json
      request_method:
        - GET
        - HEAD
      response_code:
        - 200
        - 301
      cache_ttl: 300            # 5 minutes default
      vary_headers:
        - Accept
        - Accept-Language
      vary_query_params:
        - page
        - size
        - sort

# Cache key formula:
# SHA256(method + path + sorted_query_params + vary_header_values)
# Example: SHA256("GET:/api/products:page=1&size=20:accept=application/json")
Query Parameter Normalization
Sort query parameters alphabetically before hashing. Without normalization, ?a=1&b=2 and ?b=2&a=1 produce different cache keys for identical requests. Also normalize case and remove empty parameters. This simple step can dramatically improve cache hit rates.
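As a concrete illustration, the sketch below normalizes and hashes a key following the SHA-256 formula in the Kong example above; the exact string layout is an assumption, not Kong's internal format.

# Sketch: normalized cache key, following the SHA256(method + path + sorted params + vary values)
# formula above. The exact key layout is an assumption, not any gateway's internal format.
import hashlib
from urllib.parse import urlencode

def cache_key(method, path, query_params, vary_headers, request_headers):
    # Drop empty values, lowercase parameter names, sort alphabetically
    normalized = sorted(
        (k.lower(), v) for k, v in query_params.items() if v not in (None, "")
    )
    query = urlencode(normalized)
    vary = "&".join(
        f"{h.lower()}={request_headers.get(h, '').lower()}" for h in sorted(vary_headers)
    )
    raw = f"{method.upper()}:{path}:{query}:{vary}"
    return hashlib.sha256(raw.encode()).hexdigest()

# ?page=1&size=10 and ?size=10&page=1 now produce the same key
k1 = cache_key("GET", "/api/products", {"page": "1", "size": "10"}, ["Accept"], {"Accept": "application/json"})
k2 = cache_key("GET", "/api/products", {"size": "10", "page": "1"}, ["Accept"], {"Accept": "application/json"})
assert k1 == k2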
Cache Control Headers
HTTP cache control headers tell the gateway (and any intermediate caches) how to handle response caching. The upstream service sets these headers to communicate caching intent.
| Header/Directive | Meaning | Example |
|---|---|---|
| max-age=N | Cache for N seconds from response time | Cache-Control: max-age=3600 |
| s-maxage=N | Max age for shared caches (gateway) — overrides max-age | Cache-Control: s-maxage=600 |
| no-store | Never cache this response anywhere | Cache-Control: no-store |
| no-cache | Cache but revalidate before serving (misleading name) | Cache-Control: no-cache |
| private | Only browser can cache, not gateway/CDN | Cache-Control: private |
| public | Any cache (gateway, CDN) can store this | Cache-Control: public, max-age=3600 |
| stale-while-revalidate=N | Serve stale for N seconds while fetching fresh | Cache-Control: max-age=60, stale-while-revalidate=300 |
Conditional Requests (ETag / Last-Modified)
# First request — gateway caches response with ETag
GET /api/products/123

Response: 200 OK
ETag: "abc123hash"
Cache-Control: max-age=60
Last-Modified: Fri, 01 Mar 2024 12:00:00 GMT
Body: {"id": 123, "name": "Widget", "price": 9.99}

# After max-age expires — gateway revalidates
GET /api/products/123
If-None-Match: "abc123hash"
If-Modified-Since: Fri, 01 Mar 2024 12:00:00 GMT

# If unchanged — upstream returns 304 (no body transfer)
Response: 304 Not Modified
ETag: "abc123hash"

# Gateway serves cached body — saves bandwidth and backend processing
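From the gateway's side, the same revalidation flow can be sketched as follows. It uses the requests library and a plain in-memory dict as the cache store, which is an illustration rather than production gateway code.

# Sketch: gateway-side revalidation with If-None-Match (requests library; in-memory dict as cache)
import requests

cache = {}  # url -> {"etag": ..., "body": ...}

def fetch_with_revalidation(url):
    entry = cache.get(url)
    headers = {"If-None-Match": entry["etag"]} if entry else {}
    resp = requests.get(url, headers=headers, timeout=5)

    if resp.status_code == 304 and entry:
        return entry["body"]              # unchanged: serve cached body, no body transferred
    cache[url] = {"etag": resp.headers.get("ETag"), "body": resp.content}
    return resp.content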
stale-while-revalidate for Resilience
stale-while-revalidate is powerful for gateway caching. When a cached response expires, the gateway serves the stale version immediately (fast response) while fetching a fresh copy in the background. If the backend is down, the stale response is still served — providing resilience. Combine with stale-if-error for explicit fallback behavior.
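The decision the gateway makes under stale-while-revalidate reduces to a few branches. The sketch below is a simplified model; the parameter names and the refresh and fallback hooks are assumptions.

# Sketch of the stale-while-revalidate decision, given a cached entry's age in seconds.
# Parameter names and the refresh/fallback hooks are illustrative assumptions.

def serve_from_cache(entry, age, max_age, swr_window, backend_up, refresh_in_background):
    if age <= max_age:
        return entry              # fresh: serve directly
    if age <= max_age + swr_window:
        refresh_in_background()   # stale but within the SWR window: serve now, refresh async
        return entry
    if not backend_up:
        return entry              # stale-if-error style fallback when the backend is down
    return None                   # too stale: caller must fetch synchronously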
Cache Invalidation
Cache invalidation is famously one of the two hard problems in computer science. The gateway needs strategies to remove stale data when the underlying resource changes.
| Strategy | How It Works | Trade-off |
|---|---|---|
| TTL Expiry | Cache expires after fixed time | Simple but serves stale data until TTL |
| Explicit Purge | API call to delete specific cache entry | Precise but requires purge infrastructure |
| Tag-based Purge | Tag entries, purge all entries with a tag | Efficient for related resources |
| Surrogate Keys | Backend sends Surrogate-Key header, purge by key | Powerful but complex setup |
| Event-driven | Listen to change events, invalidate affected entries | Real-time but requires event infrastructure |
# Explicit purge — delete specific cached response
curl -X PURGE https://gateway.internal/cache/api/products/123

# Tag-based purge — invalidate all product-related cache entries
curl -X POST https://gateway.internal/cache/purge \
  -H "Content-Type: application/json" \
  -d '{"tags": ["products", "catalog"]}'

# Surrogate key approach — backend includes keys in response
# Response from upstream:
#   Surrogate-Key: product-123 category-electronics all-products
#
# When product 123 changes:
curl -X POST https://gateway.internal/cache/purge \
  -d '{"surrogate_key": "product-123"}'
# Purges all cached responses tagged with "product-123"
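An event-driven variant of this purge flow might look like the sketch below. The event source (a plain list here) is an assumption, and the purge endpoint is the hypothetical one from the examples above.

# Sketch: cache invalidation consumer that turns product-change events into surrogate-key purges.
# The event source (a simple iterable here) and the purge endpoint are assumptions.
import requests

PURGE_URL = "https://gateway.internal/cache/purge"   # hypothetical endpoint from the examples above

def handle_event(event):
    """event example: {"type": "product.updated", "product_id": 123}"""
    surrogate_key = f"product-{event['product_id']}"
    resp = requests.post(PURGE_URL, json={"surrogate_key": surrogate_key}, timeout=5)
    resp.raise_for_status()

# In production this loop would read from Kafka/SQS/etc.; a list stands in here.
for event in [{"type": "product.updated", "product_id": 123}]:
    handle_event(event)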
The Pragmatic Approach
For most APIs, short TTLs (30–300 seconds) with stale-while-revalidate are sufficient. You accept slightly stale data in exchange for simplicity. Only invest in explicit invalidation when freshness is critical (inventory counts, pricing) or when cache hit rates are low due to short TTLs.
Static vs Dynamic Service Discovery
The gateway needs to know where upstream services live — their IP addresses and ports. Static configuration works for simple deployments but breaks down when services scale dynamically or move across hosts.
| Approach | How It Works | Best For |
|---|---|---|
| Static config | Hardcode upstream IPs in gateway config | Small, stable deployments |
| DNS-based | Resolve service hostname, DNS returns current IPs | Simple dynamic discovery |
| Registry-based | Query service registry (Consul, Eureka) for endpoints | Microservices with frequent scaling |
| Kubernetes-native | Watch Kubernetes Endpoints/Services API | Kubernetes deployments |
| File-based watch | Read upstream list from file, reload on change | GitOps with config management |
# Static configuration — hardcoded upstreams
upstreams:
  - name: order-service
    targets:
      - target: 10.0.1.10:8080
        weight: 100
      - target: 10.0.1.11:8080
        weight: 100
      - target: 10.0.1.12:8080
        weight: 100

# Problem: when order-service scales to 5 instances,
# you must manually update this config and reload

---
# Dynamic configuration — DNS-based
upstreams:
  - name: order-service
    host: order-service.internal   # DNS resolves to current IPs
    dns_resolver: 10.0.0.2
    dns_ttl: 30                    # Re-resolve every 30 seconds

# Automatically picks up new instances when DNS updates
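DNS-based discovery amounts to periodic re-resolution of the upstream hostname. A minimal sketch using only the standard library follows; the hostname, port, and interval mirror the config above and are assumptions.

# Sketch: periodic DNS re-resolution of upstream targets (standard library only).
# Hostname and port come from the dynamic config above; the loop interval mirrors dns_ttl.
import socket
import time

def resolve_targets(hostname, port):
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})   # unique IPs for the hostname

previous = set()
while True:
    current = set(resolve_targets("order-service.internal", 8080))
    if current != previous:
        print("upstream targets changed:", current)  # a gateway would swap its target list here
        previous = current
    time.sleep(30)   # matches dns_ttl: 30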
Static Config is a Scaling Bottleneck
Static upstream configuration means every scale event (new instance, removed instance, IP change) requires a gateway config update and reload. In Kubernetes where pods are ephemeral and IPs change constantly, static config is unworkable. Dynamic discovery is not optional in container orchestration environments.
Dynamic Discovery
Dynamic service discovery allows the gateway to automatically detect new service instances, removed instances, and health changes without configuration updates.
| Registry | Model | Gateway Integration |
|---|---|---|
| Consul | Agent-based, health-checked, KV store | DNS interface or HTTP API polling |
| Kubernetes | Endpoints API, label selectors | Watch API for real-time updates |
| Eureka | Client-side registration, heartbeat | HTTP API polling |
| etcd | Distributed KV store, watch support | Watch keys for changes |
| DNS SRV | Standard DNS with port + priority | SRV record resolution |
# Kong with Consul service discovery
upstreams:
  - name: order-service
    algorithm: round-robin
    healthchecks:
      active:
        http_path: /health
        healthy:
          interval: 5
          successes: 2
        unhealthy:
          interval: 5
          http_failures: 3

# Kong queries Consul for healthy instances:
#   GET /v1/health/service/order-service?passing=true
#
# Returns:
#   [
#     {"Service": {"Address": "10.0.1.10", "Port": 8080}},
#     {"Service": {"Address": "10.0.1.11", "Port": 8080}}
#   ]
#
# Gateway updates upstream targets automatically
# Envoy with Kubernetes Endpoint Discovery Service (EDS)
clusters:
  - name: order-service
    type: EDS
    eds_cluster_config:
      eds_config:
        api_config_source:
          api_type: GRPC
          grpc_services:
            - envoy_grpc:
                cluster_name: xds-cluster

# Envoy watches Kubernetes Endpoints via xDS API
# New pods automatically added, terminated pods removed
# Zero config changes needed for scaling events
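Outside of a gateway's built-in integration, the Consul health endpoint shown above can also be polled directly. The sketch below assumes a Consul address and a hypothetical update_upstream_targets hook.

# Sketch: poll Consul's health API for passing instances of order-service.
# The Consul address and update_upstream_targets hook are assumptions.
import requests

CONSUL_URL = "http://consul.internal:8500/v1/health/service/order-service"

def healthy_targets():
    resp = requests.get(CONSUL_URL, params={"passing": "true"}, timeout=5)
    resp.raise_for_status()
    return [
        (entry["Service"]["Address"], entry["Service"]["Port"])
        for entry in resp.json()
    ]

def update_upstream_targets(targets):
    print("routing to:", targets)   # a real gateway would swap its load-balancer pool here

update_upstream_targets(healthy_targets())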
Upstream Health Management
Health management determines which upstream instances can receive traffic. The gateway must detect unhealthy instances quickly and stop sending them requests — then detect recovery and resume traffic.
| Type | How It Works | Pros | Cons |
|---|---|---|---|
| Active health check | Gateway periodically pings /health endpoint | Detects failures before clients hit them | Extra network traffic, false positives possible |
| Passive health check | Monitor response codes from real traffic | No extra traffic, based on real behavior | Requires traffic to detect failure |
| Outlier detection | Statistical analysis of error rates per instance | Catches degraded (not dead) instances | Needs enough traffic for statistical significance |
# Combined active + passive health checking
upstreams:
  - name: payment-service
    healthchecks:
      active:
        type: http
        http_path: /health
        healthy:
          interval: 5          # Check every 5 seconds
          successes: 2         # Mark healthy after 2 consecutive successes
          http_statuses: [200]
        unhealthy:
          interval: 2          # Check more frequently when unhealthy
          http_failures: 3     # Mark unhealthy after 3 consecutive failures
          tcp_failures: 2
          timeouts: 3
          http_statuses: [500, 502, 503]
      passive:
        healthy:
          successes: 5         # Mark healthy after 5 successful real requests
          http_statuses: [200, 201, 204]
        unhealthy:
          http_failures: 5     # Mark unhealthy after 5 failed real requests
          tcp_failures: 2
          timeouts: 3
          http_statuses: [500, 502, 503]
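Passive health checking is essentially per-target failure counting on live traffic. The sketch below models only that counting logic, with thresholds similar to the config above; the state layout is an assumption.

# Sketch: passive health tracking per upstream target, with thresholds like the config above.
# A real gateway would also re-probe targets via active checks; this shows only the counting.
from collections import defaultdict

UNHEALTHY_AFTER = 5     # consecutive failed real requests
HEALTHY_AFTER = 5       # consecutive successful real requests
FAILURE_STATUSES = {500, 502, 503}

state = defaultdict(lambda: {"failures": 0, "successes": 0, "healthy": True})

def observe(target, status_code):
    s = state[target]
    if status_code in FAILURE_STATUSES:
        s["failures"] += 1
        s["successes"] = 0
        if s["failures"] >= UNHEALTHY_AFTER:
            s["healthy"] = False   # stop routing new requests to this target
    else:
        s["successes"] += 1
        s["failures"] = 0
        if s["successes"] >= HEALTHY_AFTER:
            s["healthy"] = True    # resume routing

def healthy_targets(targets):
    return [t for t in targets if state[t]["healthy"]]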
Graceful Shutdown Integration
When a service instance is shutting down (deployment, scaling down), it should: (1) deregister from the service registry, (2) start failing health checks (return 503), (3) finish in-flight requests, and then (4) shut down. The gateway detects the health check failure and stops sending new requests, while existing requests complete normally. This is how zero-downtime deployments work.
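From the service's point of view, the sequence might be wired up as in the sketch below; deregister_from_registry and drain_in_flight_requests are hypothetical placeholders for whatever registry and framework the service actually uses.

# Sketch of the graceful-shutdown sequence from the service's point of view.
# deregister_from_registry and drain_in_flight_requests are hypothetical hooks.
import signal
import sys
import time

shutting_down = False

def health_handler():
    # Health endpoint logic: report 503 once shutdown has started
    return 503 if shutting_down else 200

def deregister_from_registry():
    print("deregistered")          # placeholder for a Consul/Kubernetes deregistration call

def drain_in_flight_requests(grace_seconds=30):
    time.sleep(grace_seconds)      # placeholder: wait for active requests to complete

def handle_sigterm(signum, frame):
    global shutting_down
    deregister_from_registry()     # (1) remove ourselves from the registry / Endpoints
    shutting_down = True           # (2) health endpoint now returns 503
    drain_in_flight_requests()     # (3) let active requests finish
    sys.exit(0)                    # (4) shut down

signal.signal(signal.SIGTERM, handle_sigterm)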
Interview Questions
Q: How would you design cache invalidation for a product catalog API?
A: Layered approach: (1) Set Cache-Control: s-maxage=300, stale-while-revalidate=3600 — gateway caches for 5 min, serves stale for up to 1 hour while revalidating. (2) When a product is updated, publish an event to a message queue. (3) A cache invalidation consumer listens for product-change events and calls the gateway's purge API with the product's surrogate key. (4) For bulk updates (price changes), use tag-based purge. This gives you near-real-time freshness for changes while serving most reads from cache.
Q: What's the difference between active and passive health checks?
A: Active: the gateway periodically sends synthetic requests to /health endpoints — detects failures proactively before real traffic is affected. Passive: the gateway monitors responses from real traffic — detects failures reactively after some requests have already failed. Use both: active for fast detection (catches a dead service in seconds), passive for catching intermittent issues that health endpoints don't reveal (memory leaks causing slow responses).
Q: How does service discovery work in Kubernetes with an API Gateway?
A: The gateway watches the Kubernetes Endpoints API (or uses xDS protocol). When a new pod becomes Ready, Kubernetes updates the Endpoints object. The gateway detects this change and adds the pod to its upstream pool. When a pod is terminated, it's removed from Endpoints and the gateway stops routing to it. This is automatic — no manual config needed. The gateway can also use Kubernetes Services (ClusterIP) and let kube-proxy handle load balancing, but direct endpoint watching gives the gateway more control over load balancing algorithms.
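With the official Kubernetes Python client, that watch loop looks roughly like the sketch below; the namespace and service name are assumptions, and a real gateway would do this in its control plane rather than in application Python.

# Sketch: watch the Endpoints object for a service and rebuild the upstream pool on change.
# Uses the official kubernetes Python client; namespace and service name are assumptions.
from kubernetes import client, config, watch

config.load_incluster_config()          # or config.load_kube_config() outside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_endpoints, namespace="default",
                      field_selector="metadata.name=order-service"):
    endpoints = event["object"]
    targets = [
        (addr.ip, port.port)
        for subset in (endpoints.subsets or [])
        for addr in (subset.addresses or [])
        for port in (subset.ports or [])
    ]
    print(event["type"], "->", targets)  # the gateway would swap its upstream pool here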
Q: When should you cache at the gateway vs at the CDN vs at the service?
A: CDN: static assets, public content served globally (images, JS, CSS, public API docs). Gateway: API responses that are shared across users or change infrequently (product catalog, config). Service: user-specific data that requires business logic to determine freshness (user profile, order history — cached in Redis/Memcached). The layers complement each other: CDN handles edge caching, gateway handles API-level caching, service handles domain-specific caching.
Q: How do you handle cache key design for authenticated APIs?
A: For public data (same response regardless of user): key = method + path + sorted_query_params. For user-scoped data: include user_id or tenant_id in the key (from the Vary header or explicit config). For role-based responses (admin sees more fields): include role in the key. Critical rule: never serve one user's cached response to another user. Use the Vary header to signal which request attributes affect the response.
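Extending the earlier key sketch with a scope segment is enough to keep user- and role-specific entries separate; how the scope is derived from the request (JWT claim, header, tenant id) is the assumed part.

# Sketch: extend the cache key with a scope segment for user- or role-specific responses.
# Deriving the scope from the request (JWT claim, header, tenant id) is an assumption.
import hashlib

def scoped_cache_key(method, path, normalized_query, scope):
    # scope examples: "public", "user:42", "role:admin", "tenant:acme"
    raw = f"{method.upper()}:{path}:{normalized_query}:{scope}"
    return hashlib.sha256(raw.encode()).hexdigest()

# Same URL, different users -> different entries; public data shares one entry
assert scoped_cache_key("GET", "/api/orders", "page=1", "user:42") != \
       scoped_cache_key("GET", "/api/orders", "page=1", "user:43")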
Common Mistakes
Caching responses with Set-Cookie headers
The gateway caches a response that includes Set-Cookie, then serves that cookie to other users — leaking sessions.
✅ Never cache responses with Set-Cookie headers. Configure the cache to skip responses containing session-related headers. If the upstream accidentally sends Set-Cookie on cacheable responses, strip it or bypass cache for that response.
Not normalizing cache keys
Treating ?page=1&size=10 and ?size=10&page=1 as different cache entries — halving your hit rate.
✅ Sort query parameters alphabetically, normalize case, and remove empty/default parameters before computing the cache key. This ensures semantically identical requests share a cache entry.
Static upstream config in dynamic environments
Hardcoding upstream IPs in gateway config when running in Kubernetes where pod IPs change on every deployment.
✅ Use dynamic service discovery: Kubernetes Endpoints API, Consul, or DNS-based resolution. The gateway should automatically detect new instances and remove terminated ones without config changes or restarts.
No health checks on upstreams
The gateway load-balances across all configured upstreams without checking if they're actually healthy — sending traffic to dead instances.
✅ Enable both active health checks (periodic /health pings) and passive health checks (monitor real response codes). Mark instances as unhealthy after consecutive failures and stop routing to them. Check more frequently when unhealthy to detect recovery quickly.