Request Lifecycle & Observability
What actually happens to a request as it flows through a gateway — the inbound pipeline, upstream communication, outbound pipeline, and the observability signals you collect.
Inbound Processing Pipeline
Every request passes through a series of processing stages before reaching the upstream service. The order matters — each stage can short-circuit the pipeline (reject the request) without invoking later stages.
| Stage | Action | Can Reject? |
|---|---|---|
| 1. TLS Termination | Decrypt HTTPS, establish connection | Yes — invalid cert, TLS version |
| 2. Parse Request | Parse HTTP method, path, headers, body | Yes — malformed request (400) |
| 3. Request ID | Generate or extract X-Request-ID for correlation | No |
| 4. IP Filtering | Check allowlists/blocklists | Yes — blocked IP (403) |
| 5. Rate Limiting | Check rate limit counters | Yes — exceeded limit (429) |
| 6. Authentication | Validate token/key, extract identity | Yes — invalid credentials (401) |
| 7. Authorization | Check role/scope for endpoint | Yes — insufficient permissions (403) |
| 8. Request Validation | Validate body against schema | Yes — invalid payload (422) |
| 9. Transformation | Modify headers, rewrite URL, transform body | No |
| 10. Route Match | Determine upstream service and instance | Yes — no matching route (404) |
The Assembly Line
The inbound pipeline is like a factory assembly line with quality checkpoints. Each station inspects one aspect of the product (request). If it fails any checkpoint, it's rejected immediately — no point continuing down the line. The order is deliberate: cheap checks (IP filter, rate limit) come before expensive ones (auth, body validation) to reject bad requests as early as possible.
Order Matters for Performance
Rate limiting should come BEFORE authentication. Why? Auth requires cryptographic verification (expensive). If an attacker is flooding you with requests, you want to reject them at the rate limiter (cheap counter check) before spending CPU on JWT signature verification. The principle: cheapest rejections first.
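A minimal Go sketch of this short-circuit behavior (the stage names and the simulated rate-limit rejection are illustrative, not from any particular gateway):

```go
package main

import (
	"fmt"
	"net/http"
)

// errResponse is a short-circuit result; a nil return means "continue".
type errResponse struct {
	status int
	msg    string
}

// stage is one checkpoint on the assembly line.
type stage struct {
	name  string
	check func(*http.Request) *errResponse
}

// runInbound applies stages in order; the first rejection wins,
// so later (more expensive) stages never run.
func runInbound(r *http.Request, pipeline []stage) *errResponse {
	for _, s := range pipeline {
		if resp := s.check(r); resp != nil {
			fmt.Printf("rejected at stage %q\n", s.name)
			return resp
		}
	}
	return nil // all checks passed; forward to upstream
}

func main() {
	// Cheapest checks first: counter lookups before cryptography.
	pipeline := []stage{
		{"ip-filter", func(*http.Request) *errResponse { return nil }},
		{"rate-limit", func(*http.Request) *errResponse {
			return &errResponse{429, "rate limit exceeded"} // simulated flood
		}},
		{"authn", func(*http.Request) *errResponse {
			// Expensive JWT verification would live here; never reached.
			return nil
		}},
	}
	req, _ := http.NewRequest("GET", "http://gateway.local/api/orders", nil)
	if resp := runInbound(req, pipeline); resp != nil {
		fmt.Printf("%d %s\n", resp.status, resp.msg)
	}
}
```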
Upstream Communication
After inbound processing, the gateway forwards the request to the upstream service. This involves connection management, load balancing, timeout handling, and retry logic.
| Concept | Description | Typical Value |
|---|---|---|
| Connection Pool | Reuse TCP connections to upstreams (avoid handshake per request) | 100-500 per upstream |
| Connect Timeout | Max time to establish TCP connection | 3-5 seconds |
| Read Timeout | Max time waiting for response after sending request | 30-60 seconds |
| Idle Timeout | Close pooled connections after inactivity | 60-90 seconds |
| Max Retries | Retry on 502/503/connection error | 1-2 retries |
| Retry Backoff | Wait between retries (exponential) | 100ms, 200ms, 400ms |
```nginx
upstream order_service {
    least_conn;
    keepalive 64;   # Connection pool size

    server order-svc-1:8080 max_fails=3 fail_timeout=30s;
    server order-svc-2:8080 max_fails=3 fail_timeout=30s;
    server order-svc-3:8080 max_fails=3 fail_timeout=30s;
}

server {
    location /api/orders {
        proxy_pass http://order_service;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 10s;

        # Retry on failure (idempotent methods only)
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
        proxy_next_upstream_timeout 10s;

        # Forward headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Request-ID $request_id;

        # Connection pooling
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```
Only Retry Idempotent Requests
Retrying a failed POST /orders could create duplicate orders. Only retry requests that are safe to repeat: GET, HEAD, OPTIONS, and explicitly idempotent operations (with Idempotency-Key header). For non-idempotent requests, return the error to the client and let them decide.
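As a sketch of that rule (a hypothetical helper, not any gateway's built-in API), retry eligibility can be decided from the method, the failure status, and the presence of an Idempotency-Key:

```go
package main

import (
	"fmt"
	"net/http"
)

// retryable reports whether a failed request is safe to re-send:
// idempotent methods on transient failures, and writes only when the
// client supplied an Idempotency-Key the upstream can deduplicate on.
// Connection-level errors (no status code) would need a separate branch.
func retryable(r *http.Request, status int) bool {
	if status >= 400 && status < 500 {
		return false // a 4xx will not succeed on retry
	}
	switch r.Method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return true
	case http.MethodPost, http.MethodPut:
		return r.Header.Get("Idempotency-Key") != ""
	}
	return false
}

func main() {
	get, _ := http.NewRequest(http.MethodGet, "http://gateway.local/api/orders/42", nil)
	post, _ := http.NewRequest(http.MethodPost, "http://gateway.local/api/orders", nil)
	fmt.Println(retryable(get, 503))  // true: idempotent method
	fmt.Println(retryable(post, 503)) // false: no Idempotency-Key
}
```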
Outbound Processing Pipeline
After receiving the upstream response, the gateway processes it before returning to the client. This is the reverse pipeline — transformation, caching, and observability recording.
| Stage | Action |
|---|---|
| 1. Response received | Read status code, headers, body from upstream |
| 2. Response transformation | Remove internal headers, rename fields, filter body |
| 3. Cache storage | Store cacheable responses (GET 200 with Cache-Control) |
| 4. Compression | Gzip/Brotli compress response body if client accepts |
| 5. Metrics recording | Record latency, status code, upstream, route |
| 6. Access logging | Write structured log entry with full request/response metadata |
| 7. Return to client | Send final response with appropriate headers |
```yaml
# Response transformation — clean up before returning to client
plugins:
  - name: response-transformer
    config:
      remove:
        headers:
          - X-Powered-By            # Don't reveal tech stack
          - Server                  # Don't reveal server software
          - X-Internal-Request-ID
          - X-Upstream-Latency
      add:
        headers:
          - "X-Request-ID:$(ctx.request_id)"
          - "X-Response-Time:$(ctx.latency_ms)ms"
          - "Strict-Transport-Security:max-age=31536000; includeSubDomains"
          - "X-Content-Type-Options:nosniff"
          - "X-Frame-Options:DENY"
```
Security Headers on Every Response
The gateway is the perfect place to inject security headers consistently: HSTS, X-Content-Type-Options, X-Frame-Options, Content-Security-Policy. Adding them at the gateway means every service gets them automatically — no service needs to remember to set them.
Error Handling
The gateway must handle errors gracefully — both its own errors and upstream failures. Clients should receive consistent, informative error responses regardless of which service failed or how.
| Status | Meaning | Gateway Action |
|---|---|---|
| 400 | Bad Request — malformed input | Return validation errors in consistent format |
| 401 | Unauthorized — auth failed | Return after auth plugin rejects token/key |
| 403 | Forbidden — insufficient permissions | Return after authorization check fails |
| 404 | Not Found — no matching route | Return when no route matches the path |
| 429 | Too Many Requests — rate limited | Return with Retry-After header |
| 502 | Bad Gateway — upstream returned invalid response | Log upstream error, return generic message |
| 503 | Service Unavailable — upstream down or circuit open | Return with Retry-After, serve cached if possible |
| 504 | Gateway Timeout — upstream didn't respond in time | Log timeout, return with context |
{ "error": { "code": "RATE_LIMIT_EXCEEDED", "message": "You have exceeded the rate limit of 100 requests per minute.", "details": { "limit": 100, "window": "1m", "retry_after": 23 }, "request_id": "req_abc123xyz", "documentation_url": "https://docs.example.com/errors/rate-limiting" } } // All gateway errors follow this structure: // - code: machine-readable error code (for client logic) // - message: human-readable description // - details: additional context (optional) // - request_id: for support/debugging correlation // - documentation_url: link to error documentation
Never Expose Internal Details
Gateway error responses must never expose internal service names, stack traces, or infrastructure details. A 502 should say "Service temporarily unavailable" — not "Connection refused to order-service-v2.internal:8080". Internal details help attackers map your infrastructure.
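One way to enforce this, sketched in Go with the error envelope from the example above (the error code, message, and log format are illustrative): log full upstream details for operators, return only a generic payload to the client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// apiError mirrors the consistent error envelope shown earlier.
type apiError struct {
	Code      string `json:"code"`
	Message   string `json:"message"`
	RequestID string `json:"request_id"`
}

// sanitize keeps internal detail (upstream host, raw error) in server-side
// logs only, and returns a client-safe body keyed by request ID.
func sanitize(requestID, upstream string, err error) apiError {
	log.Printf("request_id=%s upstream=%s error=%v", requestID, upstream, err)
	return apiError{
		Code:      "UPSTREAM_UNAVAILABLE",
		Message:   "Service temporarily unavailable. Please retry later.",
		RequestID: requestID,
	}
}

func main() {
	e := sanitize("req_abc123", "order-service-v2.internal:8080",
		fmt.Errorf("connection refused"))
	body, _ := json.Marshal(map[string]apiError{"error": e})
	fmt.Println(string(body)) // no internal host names reach the client
}
```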
Logging
The gateway sees every request — making it the single best place for access logging. Structured JSON logs with consistent fields enable powerful querying and alerting.
{ "timestamp": "2024-03-01T12:00:00.123Z", "request_id": "req_abc123", "method": "POST", "path": "/api/v1/orders", "status": 201, "latency_ms": 145, "upstream_latency_ms": 132, "client_ip": "203.0.113.42", "user_agent": "MyApp/2.1.0", "consumer_id": "partner-acme", "user_id": "usr_xyz789", "route": "orders-create", "upstream": "order-service:8080", "request_size": 1024, "response_size": 256, "rate_limit_remaining": 87, "tls_version": "TLSv1.3" }
What NOT to Log
- ❌ Authorization headers — contain tokens/keys (log consumer_id instead)
- ❌ Request/response bodies — PII risk, storage cost (log size only)
- ❌ Passwords or secrets — even in error messages
- ❌ Full query parameters — may contain sensitive data (log the path only)
- ❌ Internal IP addresses — security risk if logs are exposed
Log Sampling at Scale
At 100K+ requests/second, logging every request generates terabytes daily. Use sampling: log 100% of errors and slow requests (above the p99 latency), 10% of normal requests, and 1% of health checks. This gives you full visibility into problems while controlling storage costs. Always log 100% of 4xx and 5xx responses.
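A sketch of that policy as a single decision function; the 1-second slow threshold, the /healthz path, and the sample rates are assumptions to tune for your traffic:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// shouldLog keeps 100% of errors and slow requests and samples the rest.
func shouldLog(status int, latency time.Duration, path string) bool {
	if status >= 400 {
		return true // always log 4xx and 5xx
	}
	if latency > time.Second {
		return true // always log slow requests
	}
	if path == "/healthz" {
		return rand.Float64() < 0.01 // 1% of health checks
	}
	return rand.Float64() < 0.10 // 10% of normal traffic
}

func main() {
	fmt.Println(shouldLog(500, 50*time.Millisecond, "/api/orders")) // true
	fmt.Println(shouldLog(200, 2*time.Second, "/api/orders"))       // true
	fmt.Println(shouldLog(200, 20*time.Millisecond, "/healthz"))    // usually false
}
```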
Metrics
Metrics are aggregated numerical measurements — counters, gauges, and histograms. Unlike logs (per-request), metrics give you system-wide health at a glance. The gateway should expose Prometheus-compatible metrics for dashboarding and alerting.
| Metric | Type | What It Tells You |
|---|---|---|
| request_total | Counter | Total requests (by route, method, status) |
| request_duration_seconds | Histogram | Latency distribution (p50, p95, p99) |
| active_connections | Gauge | Current open connections |
| upstream_request_total | Counter | Requests sent to each upstream |
| upstream_response_time | Histogram | Backend latency (separate from gateway overhead) |
| rate_limit_hits_total | Counter | How often rate limits are triggered |
| circuit_breaker_state | Gauge | Current state per upstream (0=closed, 1=open) |
```text
# Gateway exposes /metrics endpoint for Prometheus scraping
# RED metrics (Rate, Errors, Duration) — the golden signals
gateway_requests_total{route="orders-create",method="POST",status="201"} 15234
gateway_requests_total{route="orders-create",method="POST",status="500"} 12
gateway_request_duration_seconds_bucket{route="orders-create",le="0.1"} 14500
gateway_request_duration_seconds_bucket{route="orders-create",le="0.5"} 15100
gateway_request_duration_seconds_bucket{route="orders-create",le="1.0"} 15230

# Upstream health
gateway_upstream_healthy{upstream="order-service"} 3
gateway_upstream_unhealthy{upstream="order-service"} 0

# Rate limiting
gateway_rate_limit_exceeded_total{consumer="partner-acme"} 42

# Connection pool
gateway_upstream_connections_active{upstream="order-service"} 67
gateway_upstream_connections_idle{upstream="order-service"} 33
```
The Four Golden Signals
Monitor these four signals for every route: (1) Latency — how long requests take (p50, p95, p99). (2) Traffic — requests per second. (3) Errors — percentage of 5xx responses. (4) Saturation — how full your connection pools and rate limit buckets are. Alert when any signal deviates from baseline.
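For illustration, here is how a Go gateway might register and record the counter and histogram shown above with the Prometheus client library; the bucket boundaries and label values are placeholders:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Rate and Errors: one counter, labeled by route, method, and status.
	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "gateway_requests_total",
		Help: "Requests by route, method, and status.",
	}, []string{"route", "method", "status"})

	// Duration: a histogram so p50/p95/p99 can be derived at query time.
	requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "gateway_request_duration_seconds",
		Help:    "End-to-end gateway latency.",
		Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5},
	}, []string{"route"})
)

func main() {
	// One observation per completed request (values illustrative).
	requestsTotal.WithLabelValues("orders-create", "POST", "201").Inc()
	requestDuration.WithLabelValues("orders-create").Observe(0.145)

	// Expose /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9090", nil)
}
```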
Distributed Tracing
The gateway is the trace root — it generates the initial trace ID and span for every incoming request. As the request flows through downstream services, each service adds its own span. The result is a complete picture of the request's journey through your system.
| Concept | Description |
|---|---|
| Trace | The complete journey of a request across all services |
| Span | A single operation within a trace (gateway processing, DB query, etc.) |
| Trace ID | Unique identifier for the entire trace (propagated to all services) |
| Span ID | Unique identifier for a single span within the trace |
| Parent Span ID | Links child spans to their parent (creates the tree) |
| W3C Trace Context | Standard headers: traceparent, tracestate |
```text
# W3C Trace Context — standard propagation headers

# Gateway generates trace root:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
#            version-trace_id(32 hex)-parent_span_id(16 hex)-flags

# Gateway forwards to upstream with its span as parent:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-d7ad39c1a2e5b8f2-01

# Each service:
#   1. Extracts trace_id from incoming traceparent
#   2. Creates new span_id for its own work
#   3. Sets parent_span_id to the incoming span_id
#   4. Forwards updated traceparent to next service

# Optional vendor-specific context:
tracestate: kong=s:1234,dd=s:5678
```
The Package Tracking Number
Distributed tracing is like a package tracking number. When you ship a package (request), it gets a tracking ID at the first facility (gateway). As it moves through sorting centers (services), each facility scans it and adds a timestamp. You can see the complete journey: where it went, how long each stop took, and where it got stuck. The trace ID is your tracking number.
Sampling for Production
Tracing every request in production is expensive (storage + network overhead). Use head-based sampling (decide at the gateway: trace 1% of requests) or tail-based sampling (collect all spans, keep only interesting traces — errors, slow requests). Always trace 100% of error responses.
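A small Go sketch of the head-based variant: the gateway generates random trace and span IDs and records the sampling decision in the traceparent flags byte (tail-based sampling needs a collector and is not shown):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	mrand "math/rand"
)

// newTraceparent builds a W3C traceparent header at the trace root.
// The trailing flags byte carries the head-based sampling decision:
// 01 = sampled, 00 = not sampled; downstream services honor this bit.
func newTraceparent(sampleRate float64) string {
	traceID := make([]byte, 16) // 32 hex chars on the wire
	spanID := make([]byte, 8)   // 16 hex chars on the wire
	rand.Read(traceID)
	rand.Read(spanID)

	flags := "00"
	if mrand.Float64() < sampleRate {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s",
		hex.EncodeToString(traceID), hex.EncodeToString(spanID), flags)
}

func main() {
	// Head-based decision: trace 1% of requests, made once at the root.
	fmt.Println("traceparent:", newTraceparent(0.01))
}
```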
Interview Questions
Q: Walk through what happens when a request hits an API Gateway.
A: 1. TLS termination (decrypt). 2. Parse HTTP request. 3. Generate/extract request ID. 4. IP filtering check. 5. Rate limit check. 6. Authentication (validate token). 7. Authorization (check role/scope). 8. Request validation (schema check). 9. Request transformation (headers, URL rewrite). 10. Route matching. 11. Load balance to upstream instance. 12. Forward request with identity headers. 13. Receive upstream response. 14. Response transformation. 15. Cache if applicable. 16. Record metrics + log. 17. Return to client. Each stage can short-circuit with an error response.
Q: Why should rate limiting come before authentication in the pipeline?
A: Authentication is computationally expensive (JWT signature verification, token introspection). Rate limiting is a cheap counter check. If an attacker floods you with invalid tokens, you want to reject at the rate limiter (O(1) Redis lookup) before spending CPU on cryptographic verification. This prevents auth-based DDoS where the attack vector is expensive token validation.
Q: How do you handle a 504 Gateway Timeout in production?
A: Immediate: return a clear error to the client with Retry-After header. Investigation: check upstream health (is the service overloaded?), check timeout configuration (is 30s too short for this endpoint?), check connection pool exhaustion. Prevention: set appropriate timeouts per route (search: 5s, report generation: 60s), implement circuit breakers, add request queuing for slow endpoints, and ensure health checks detect slow services.
Q: What's the difference between gateway latency and upstream latency, and why does it matter?
A: Gateway latency = total time from request received to response sent. Upstream latency = time spent waiting for the backend service. The difference (gateway overhead) is the time spent on TLS, auth, rate limiting, transformation. If gateway_latency >> upstream_latency, your gateway is the bottleneck (too many plugins, heavy transformation). If upstream_latency is high, the backend service needs optimization. Track both separately.
Q: How does the gateway enable distributed tracing across microservices?
A: The gateway is the trace root: (1) Generate a unique trace ID for each incoming request. (2) Create the first span (gateway processing). (3) Propagate trace context to upstream via W3C traceparent header. (4) Each downstream service extracts the trace ID, creates its own spans, and propagates further. (5) All spans are collected by a tracing backend (Jaeger, Zipkin). The gateway ensures every request has a trace ID — without it, you can't correlate logs across services.
Common Mistakes
Logging request/response bodies
Logging full request and response bodies for debugging — exposing PII, tokens, and sensitive data in log storage.
✅ Log metadata only: method, path, status, latency, consumer_id, request_size. Never log bodies, Authorization headers, or cookies. For debugging specific issues, enable body logging temporarily for a single consumer with explicit retention policies.
Same timeout for all routes
Setting a global 60-second timeout for all upstream calls — fast endpoints wait too long on failure, slow endpoints get killed prematurely.
✅ Set per-route timeouts based on expected latency: health checks (2s), CRUD operations (10s), search (5s), report generation (120s). A fast endpoint with a 60s timeout means clients wait a full minute before learning the service is down.
Retrying non-idempotent requests
Configuring the gateway to retry all failed requests including POST, PUT, and DELETE — causing duplicate orders, double charges, or data corruption.
✅ Only retry idempotent methods (GET, HEAD, OPTIONS) by default. For POST/PUT, only retry if the client provides an Idempotency-Key header and the upstream supports idempotency. Never retry on 4xx errors — those won't succeed on retry.
No request ID correlation
Not generating or propagating a unique request ID — making it impossible to trace a client's issue across gateway logs, service logs, and database queries.
✅ Generate a UUID request ID at the gateway (or accept X-Request-ID from the client). Propagate it to all upstream services. Include it in every log entry, error response, and trace span. When a client reports an issue, the request ID is your search key across all systems.
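As a minimal sketch of this pattern in Go (assuming the github.com/google/uuid package; header names follow the convention above):

```go
package main

import (
	"log"
	"net/http"

	"github.com/google/uuid"
)

// withRequestID accepts a client-supplied X-Request-ID or generates one,
// then propagates it to the upstream request, the response, and the logs.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = uuid.NewString()
		}
		r.Header.Set("X-Request-ID", id)   // forwarded to upstream services
		w.Header().Set("X-Request-ID", id) // echoed to the client
		log.Printf("request_id=%s %s %s", id, r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe(":8080", withRequestID(mux))
}
```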