
Request Lifecycle & Observability

What actually happens to a request as it flows through a gateway — the inbound pipeline, upstream communication, outbound pipeline, and the observability signals you collect.

01

Inbound Processing Pipeline

Every request passes through a series of processing stages before reaching the upstream service. The order matters — each stage can short-circuit the pipeline (reject the request) without invoking later stages.

| Stage | Action | Can Reject? |
|---|---|---|
| 1. TLS Termination | Decrypt HTTPS, establish connection | Yes — invalid cert, TLS version |
| 2. Parse Request | Parse HTTP method, path, headers, body | Yes — malformed request (400) |
| 3. Request ID | Generate or extract X-Request-ID for correlation | No |
| 4. IP Filtering | Check allowlists/blocklists | Yes — blocked IP (403) |
| 5. Rate Limiting | Check rate limit counters | Yes — exceeded limit (429) |
| 6. Authentication | Validate token/key, extract identity | Yes — invalid credentials (401) |
| 7. Authorization | Check role/scope for endpoint | Yes — insufficient permissions (403) |
| 8. Request Validation | Validate body against schema | Yes — invalid payload (422) |
| 9. Transformation | Modify headers, rewrite URL, transform body | No |
| 10. Route Match | Determine upstream service and instance | Yes — no matching route (404) |
🏭

The Assembly Line

The inbound pipeline is like a factory assembly line with quality checkpoints. Each station inspects one aspect of the product (request). If it fails any checkpoint, it's rejected immediately — no point continuing down the line. The order is deliberate: cheap checks (IP filter, rate limit) come before expensive ones (auth, body validation) to reject bad requests as early as possible.

Order Matters for Performance

Rate limiting should come BEFORE authentication. Why? Auth requires cryptographic verification (expensive). If an attacker is flooding you with requests, you want to reject them at the rate limiter (cheap counter check) before spending CPU on JWT signature verification. The principle: cheapest rejections first.
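
A minimal sketch of this cheapest-first, short-circuiting pipeline, assuming hypothetical stage functions and a dict-based request (not any particular gateway's plugin API):

inbound-pipeline-sketch.py
from typing import Callable, Optional

Request = dict   # stand-in for a parsed HTTP request
Response = dict  # stand-in for an HTTP response

BLOCKLIST = {"198.51.100.7"}

def reject(status: int, message: str) -> Response:
    return {"status": status, "body": {"error": message}}

# Stages ordered cheapest-first: a set lookup and a counter check run
# before any cryptographic work.
def ip_filter(req: Request) -> Optional[Response]:
    return reject(403, "blocked IP") if req["client_ip"] in BLOCKLIST else None

def rate_limit(req: Request) -> Optional[Response]:
    return reject(429, "rate limit exceeded") if req.get("req_count", 0) > 100 else None

def authenticate(req: Request) -> Optional[Response]:
    # Expensive in a real gateway (JWT signature verification); runs last.
    return None if req.get("token") else reject(401, "invalid credentials")

PIPELINE: list[Callable[[Request], Optional[Response]]] = [ip_filter, rate_limit, authenticate]

def run_inbound(req: Request) -> Optional[Response]:
    """Return the first failing stage's error, or None to forward upstream."""
    for stage in PIPELINE:
        error = stage(req)
        if error is not None:
            return error  # short-circuit: later stages never run
    return None

print(run_inbound({"client_ip": "203.0.113.42", "req_count": 150}))
# rejected with 429 before authenticate() ever runs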

02

Upstream Communication

After inbound processing, the gateway forwards the request to the upstream service. This involves connection management, load balancing, timeout handling, and retry logic.

| Concept | Description | Typical Value |
|---|---|---|
| Connection Pool | Reuse TCP connections to upstreams (avoid handshake per request) | 100-500 per upstream |
| Connect Timeout | Max time to establish TCP connection | 3-5 seconds |
| Read Timeout | Max time waiting for response after sending request | 30-60 seconds |
| Idle Timeout | Close pooled connections after inactivity | 60-90 seconds |
| Max Retries | Retry on 502/503/connection error | 1-2 retries |
| Retry Backoff | Wait between retries (exponential) | 100ms, 200ms, 400ms |
nginx-upstream-config.conf
upstream order_service {
    least_conn;
    keepalive 64;  # Connection pool size

    server order-svc-1:8080 max_fails=3 fail_timeout=30s;
    server order-svc-2:8080 max_fails=3 fail_timeout=30s;
    server order-svc-3:8080 max_fails=3 fail_timeout=30s;
}

server {
    location /api/orders {
        proxy_pass http://order_service;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 10s;

        # Retry on failure (idempotent methods only)
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
        proxy_next_upstream_timeout 10s;

        # Forward headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Request-ID $request_id;

        # Connection pooling
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Only Retry Idempotent Requests

Retrying a failed POST /orders could create duplicate orders. Only retry requests that are safe to repeat: GET, HEAD, OPTIONS, and explicitly idempotent operations (with Idempotency-Key header). For non-idempotent requests, return the error to the client and let them decide.
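
A sketch of that policy, assuming a hypothetical send() callable that performs one proxy attempt and returns the HTTP status code:

idempotent-retry-sketch.py
import time

IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS"}
RETRYABLE_STATUSES = {502, 503}

def is_retryable(method: str, headers: dict) -> bool:
    # Safe-to-repeat methods, or requests the client has explicitly
    # marked idempotent via an Idempotency-Key header.
    return method in IDEMPOTENT_METHODS or "Idempotency-Key" in headers

def forward_with_retries(send, method: str, headers: dict,
                         max_retries: int = 2, base_delay: float = 0.1) -> int:
    """Retry with exponential backoff; non-idempotent failures pass through."""
    status = send()
    attempt = 0
    while (status in RETRYABLE_STATUSES and attempt < max_retries
           and is_retryable(method, headers)):
        time.sleep(base_delay * (2 ** attempt))  # 100ms, 200ms, ...
        status = send()
        attempt += 1
    return status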

03

Outbound Processing Pipeline

After receiving the upstream response, the gateway processes it before returning it to the client. This is the reverse pipeline — transformation, caching, and observability recording.

| Stage | Action |
|---|---|
| 1. Response received | Read status code, headers, body from upstream |
| 2. Response transformation | Remove internal headers, rename fields, filter body |
| 3. Cache storage | Store cacheable responses (GET 200 with Cache-Control) |
| 4. Compression | Gzip/Brotli compress response body if client accepts |
| 5. Metrics recording | Record latency, status code, upstream, route |
| 6. Access logging | Write structured log entry with full request/response metadata |
| 7. Return to client | Send final response with appropriate headers |
response-transformation.yaml
# Response transformation: clean up before returning to client
plugins:
  - name: response-transformer
    config:
      remove:
        headers:
          - X-Powered-By        # Don't reveal tech stack
          - Server               # Don't reveal server software
          - X-Internal-Request-ID
          - X-Upstream-Latency
      add:
        headers:
          - "X-Request-ID:$(ctx.request_id)"
          - "X-Response-Time:$(ctx.latency_ms)ms"
          - "Strict-Transport-Security:max-age=31536000; includeSubDomains"
          - "X-Content-Type-Options:nosniff"
          - "X-Frame-Options:DENY"

Security Headers on Every Response

The gateway is the perfect place to inject security headers consistently: HSTS, X-Content-Type-Options, X-Frame-Options, Content-Security-Policy. Adding them at the gateway means every service gets them automatically — no service needs to remember to set them.

04

Error Handling

The gateway must handle errors gracefully — both its own errors and upstream failures. Clients should receive consistent, informative error responses regardless of which service failed or how.

| Status | Meaning | Gateway Action |
|---|---|---|
| 400 | Bad Request — malformed input | Return validation errors in consistent format |
| 401 | Unauthorized — auth failed | Return after auth plugin rejects token/key |
| 403 | Forbidden — insufficient permissions | Return after authorization check fails |
| 404 | Not Found — no matching route | Return when no route matches the path |
| 429 | Too Many Requests — rate limited | Return with Retry-After header |
| 502 | Bad Gateway — upstream returned invalid response | Log upstream error, return generic message |
| 503 | Service Unavailable — upstream down or circuit open | Return with Retry-After, serve cached if possible |
| 504 | Gateway Timeout — upstream didn't respond in time | Log timeout, return with context |
consistent-error-response.json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "You have exceeded the rate limit of 100 requests per minute.",
    "details": {
      "limit": 100,
      "window": "1m",
      "retry_after": 23
    },
    "request_id": "req_abc123xyz",
    "documentation_url": "https://docs.example.com/errors/rate-limiting"
  }
}

// All gateway errors follow this structure:
// - code: machine-readable error code (for client logic)
// - message: human-readable description
// - details: additional context (optional)
// - request_id: for support/debugging correlation
// - documentation_url: link to error documentation

Never Expose Internal Details

Gateway error responses must never expose internal service names, stack traces, or infrastructure details. A 502 should say "Service temporarily unavailable" — not "Connection refused to order-service-v2.internal:8080". Internal details help attackers map your infrastructure.
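
A sketch of how a gateway might build this envelope while keeping internals out of it; the helper names are illustrative, not a specific gateway's API:

error-envelope-sketch.py
import logging

log = logging.getLogger("gateway")

def error_response(status: int, code: str, message: str,
                   request_id: str, details: dict | None = None) -> dict:
    body = {"error": {"code": code, "message": message, "request_id": request_id}}
    if details:
        body["error"]["details"] = details
    return {"status": status, "body": body}

def on_upstream_failure(exc: Exception, request_id: str) -> dict:
    # Log the real cause internally; never echo it to the client.
    log.error("upstream failure", exc_info=exc, extra={"request_id": request_id})
    return error_response(502, "UPSTREAM_ERROR",
                          "Service temporarily unavailable.", request_id)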

05

Logging

The gateway sees every request — making it the single best place for access logging. Structured JSON logs with consistent fields enable powerful querying and alerting.

structured-access-log.json
{
  "timestamp": "2024-03-01T12:00:00.123Z",
  "request_id": "req_abc123",
  "method": "POST",
  "path": "/api/v1/orders",
  "status": 201,
  "latency_ms": 145,
  "upstream_latency_ms": 132,
  "client_ip": "203.0.113.42",
  "user_agent": "MyApp/2.1.0",
  "consumer_id": "partner-acme",
  "user_id": "usr_xyz789",
  "route": "orders-create",
  "upstream": "order-service:8080",
  "request_size": 1024,
  "response_size": 256,
  "rate_limit_remaining": 87,
  "tls_version": "TLSv1.3"
}

What NOT to Log

  • Authorization headers — contains tokens/keys (log consumer_id instead)
  • Request/response bodies — PII risk, storage cost (log size only)
  • Passwords or secrets — even in error messages
  • Full query parameters — may contain sensitive data (log path only)
  • Internal IP addresses — security risk if logs are exposed

Log Sampling at Scale

At 100K+ requests/second, logging every request generates terabytes daily. Use sampling: log 100% of errors and slow requests (p99), 10% of normal requests, 1% of health checks. This gives you full visibility into problems while controlling storage costs. Always log 100% of 4xx and 5xx responses.
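
A sketch of that sampling policy; the thresholds are illustrative and would be tuned to your own p99 and traffic mix:

log-sampling-sketch.py
import random

SLOW_MS = 1000  # stand-in for your measured p99; tune per route

def should_log(status: int, latency_ms: float, path: str) -> bool:
    if status >= 400:
        return True                    # always log 4xx/5xx
    if latency_ms >= SLOW_MS:
        return True                    # always log slow requests
    if path == "/health":
        return random.random() < 0.01  # 1% of health checks
    return random.random() < 0.10      # 10% of normal traffic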

06

Metrics

Metrics are aggregated numerical measurements — counters, gauges, and histograms. Unlike logs (per-request), metrics give you system-wide health at a glance. The gateway should expose Prometheus-compatible metrics for dashboarding and alerting.

| Metric | Type | What It Tells You |
|---|---|---|
| request_total | Counter | Total requests (by route, method, status) |
| request_duration_seconds | Histogram | Latency distribution (p50, p95, p99) |
| active_connections | Gauge | Current open connections |
| upstream_request_total | Counter | Requests sent to each upstream |
| upstream_response_time | Histogram | Backend latency (separate from gateway overhead) |
| rate_limit_hits_total | Counter | How often rate limits are triggered |
| circuit_breaker_state | Gauge | Current state per upstream (0=closed, 1=open) |
prometheus-metrics-endpoint.txt
# Gateway exposes /metrics endpoint for Prometheus scraping

# RED metrics (Rate, Errors, Duration) — the golden signals
gateway_requests_total{route="orders-create",method="POST",status="201"} 15234
gateway_requests_total{route="orders-create",method="POST",status="500"} 12
gateway_request_duration_seconds_bucket{route="orders-create",le="0.1"} 14500
gateway_request_duration_seconds_bucket{route="orders-create",le="0.5"} 15100
gateway_request_duration_seconds_bucket{route="orders-create",le="1.0"} 15230

# Upstream health
gateway_upstream_healthy{upstream="order-service"} 3
gateway_upstream_unhealthy{upstream="order-service"} 0

# Rate limiting
gateway_rate_limit_exceeded_total{consumer="partner-acme"} 42

# Connection pool
gateway_upstream_connections_active{upstream="order-service"} 67
gateway_upstream_connections_idle{upstream="order-service"} 33

The Four Golden Signals

Monitor these four signals for every route: (1) Latency — how long requests take (p50, p95, p99). (2) Traffic — requests per second. (3) Errors — percentage of 5xx responses. (4) Saturation — how full your connection pools and rate limit buckets are. Alert when any signal deviates from baseline.
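
A minimal sketch of exposing these signals with the Python prometheus_client library; the metric names mirror the examples above, and the label values are illustrative:

golden-signals-sketch.py
import time
from prometheus_client import Counter, Gauge, Histogram, start_http_server

REQUESTS = Counter("gateway_requests_total", "Total requests",
                   ["route", "method", "status"])
LATENCY = Histogram("gateway_request_duration_seconds", "Request latency",
                    ["route"], buckets=(0.1, 0.5, 1.0, 5.0))
ACTIVE = Gauge("gateway_active_connections", "Current open connections")

def record(route: str, method: str, status: int, duration_s: float) -> None:
    REQUESTS.labels(route=route, method=method, status=str(status)).inc()
    LATENCY.labels(route=route).observe(duration_s)

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics for Prometheus scraping
    record("orders-create", "POST", 201, 0.145)
    time.sleep(60)           # keep the endpoint up long enough to scrape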

07

Distributed Tracing

The gateway is the trace root — it generates the initial trace ID and span for every incoming request. As the request flows through downstream services, each service adds its own span. The result is a complete picture of the request's journey through your system.

| Concept | Description |
|---|---|
| Trace | The complete journey of a request across all services |
| Span | A single operation within a trace (gateway processing, DB query, etc.) |
| Trace ID | Unique identifier for the entire trace (propagated to all services) |
| Span ID | Unique identifier for a single span within the trace |
| Parent Span ID | Links child spans to their parent (creates the tree) |
| W3C Trace Context | Standard headers: traceparent, tracestate |
w3c-trace-context-headers.txt
# W3C Trace Context: standard propagation headers

# Gateway generates trace root:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
#            version-trace_id(32 hex)-parent_span_id(16 hex)-flags

# Gateway forwards to upstream with its span as parent:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-d7ad39c1a2e5b8f2-01

# Each service:
# 1. Extracts trace_id from incoming traceparent
# 2. Creates new span_id for its own work
# 3. Sets parent_span_id to the incoming span_id
# 4. Forwards updated traceparent to next service

# Optional vendor-specific context:
tracestate: kong=s:1234,dd=s:5678
📦

The Package Tracking Number

Distributed tracing is like a package tracking number. When you ship a package (request), it gets a tracking ID at the first facility (gateway). As it moves through sorting centers (services), each facility scans it and adds a timestamp. You can see the complete journey: where it went, how long each stop took, and where it got stuck. The trace ID is your tracking number.

Sampling for Production

Tracing every request in production is expensive (storage + network overhead). Use head-based sampling (decide at the gateway: trace 1% of requests) or tail-based sampling (collect all spans, keep only interesting traces — errors, slow requests). Always trace 100% of error responses.
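
A sketch of generating and propagating traceparent by hand, following the format shown above; a real deployment would use an OpenTelemetry SDK rather than hand-rolling this:

traceparent-sketch.py
import secrets

def new_traceparent(sampled: bool = True) -> str:
    # Gateway as trace root: fresh trace_id (32 hex) plus span_id (16 hex).
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"

def child_traceparent(incoming: str) -> str:
    # Keep the trace_id; replace the span field with this hop's own span_id.
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

root = new_traceparent()             # generated at the gateway
forwarded = child_traceparent(root)  # sent on to the upstream service
print(root, forwarded, sep="\n")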

08

Interview Questions

Q:Walk through what happens when a request hits an API Gateway.

A: 1. TLS termination (decrypt). 2. Parse HTTP request. 3. Generate/extract request ID. 4. IP filtering check. 5. Rate limit check. 6. Authentication (validate token). 7. Authorization (check role/scope). 8. Request validation (schema check). 9. Request transformation (headers, URL rewrite). 10. Route matching. 11. Load balance to upstream instance. 12. Forward request with identity headers. 13. Receive upstream response. 14. Response transformation. 15. Cache if applicable. 16. Record metrics + log. 17. Return to client. Each stage can short-circuit with an error response.

Q:Why should rate limiting come before authentication in the pipeline?

A: Authentication is computationally expensive (JWT signature verification, token introspection). Rate limiting is a cheap counter check. If an attacker floods you with invalid tokens, you want to reject at the rate limiter (O(1) Redis lookup) before spending CPU on cryptographic verification. This prevents auth-based DDoS where the attack vector is expensive token validation.

Q:How do you handle a 504 Gateway Timeout in production?

A: Immediate: return a clear error to the client with Retry-After header. Investigation: check upstream health (is the service overloaded?), check timeout configuration (is 30s too short for this endpoint?), check connection pool exhaustion. Prevention: set appropriate timeouts per route (search: 5s, report generation: 60s), implement circuit breakers, add request queuing for slow endpoints, and ensure health checks detect slow services.

Q:What's the difference between gateway latency and upstream latency, and why does it matter?

A: Gateway latency = total time from request received to response sent. Upstream latency = time spent waiting for the backend service. The difference (gateway overhead) is the time spent on TLS, auth, rate limiting, transformation. If gateway_latency >> upstream_latency, your gateway is the bottleneck (too many plugins, heavy transformation). If upstream_latency is high, the backend service needs optimization. Track both separately.
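
Using the sample access-log fields from section 05, the breakdown is a subtraction; a sketch with an illustrative alert heuristic:

latency-breakdown-sketch.py
entry = {"latency_ms": 145, "upstream_latency_ms": 132}  # from the sample log

gateway_overhead_ms = entry["latency_ms"] - entry["upstream_latency_ms"]
print(gateway_overhead_ms)  # 13 ms spent in the gateway itself

# Illustrative heuristic: flag the gateway as the bottleneck when its own
# overhead exceeds the time spent waiting on the upstream.
if gateway_overhead_ms > entry["upstream_latency_ms"]:
    print("gateway overhead dominates; audit plugins and transformations")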

Q:How does the gateway enable distributed tracing across microservices?

A: The gateway is the trace root: (1) Generate a unique trace ID for each incoming request. (2) Create the first span (gateway processing). (3) Propagate trace context to upstream via W3C traceparent header. (4) Each downstream service extracts the trace ID, creates its own spans, and propagates further. (5) All spans are collected by a tracing backend (Jaeger, Zipkin). The gateway ensures every request has a trace ID — without it, you can't correlate logs across services.

09

Common Mistakes

⚠️

Logging request/response bodies

Logging full request and response bodies for debugging — exposing PII, tokens, and sensitive data in log storage.

Log metadata only: method, path, status, latency, consumer_id, request_size. Never log bodies, Authorization headers, or cookies. For debugging specific issues, enable body logging temporarily for a single consumer with explicit retention policies.

⚠️

Same timeout for all routes

Setting a global 60-second timeout for all upstream calls — fast endpoints wait too long on failure, slow endpoints get killed prematurely.

Set per-route timeouts based on expected latency: health checks (2s), CRUD operations (10s), search (5s), report generation (120s). A fast endpoint with a 60s timeout means clients wait a full minute before learning the service is down.
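
A sketch of a per-route timeout lookup; the routes and values are illustrative, taken from the guidance above:

per-route-timeouts-sketch.py
ROUTE_TIMEOUTS_S = {
    "/health": 2,
    "/api/search": 5,
    "/api/orders": 10,    # CRUD
    "/api/reports": 120,  # long-running generation
}
DEFAULT_TIMEOUT_S = 10

def timeout_for(path: str) -> int:
    # Longest-prefix semantics aren't needed here; first match wins.
    for prefix, timeout in ROUTE_TIMEOUTS_S.items():
        if path.startswith(prefix):
            return timeout
    return DEFAULT_TIMEOUT_S

assert timeout_for("/api/reports/monthly") == 120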

⚠️

Retrying non-idempotent requests

Configuring the gateway to retry all failed requests including POST, PUT, and DELETE — causing duplicate orders, double charges, or data corruption.

Only retry idempotent methods (GET, HEAD, OPTIONS) by default. For POST/PUT, only retry if the client provides an Idempotency-Key header and the upstream supports idempotency. Never retry on 4xx errors — those won't succeed on retry.

⚠️

No request ID correlation

Not generating or propagating a unique request ID — making it impossible to trace a client's issue across gateway logs, service logs, and database queries.

Generate a UUID request ID at the gateway (or accept X-Request-ID from the client). Propagate it to all upstream services. Include it in every log entry, error response, and trace span. When a client reports an issue, the request ID is your search key across all systems.
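
A sketch of that request-ID handling at the gateway edge; the req_ prefix and helper name are illustrative:

request-id-sketch.py
import uuid

def ensure_request_id(headers: dict) -> str:
    # Accept a client-supplied X-Request-ID, otherwise mint one; either
    # way the same ID is forwarded upstream and stamped on every log
    # entry, error response, and trace span.
    request_id = headers.get("X-Request-ID") or f"req_{uuid.uuid4().hex[:12]}"
    headers["X-Request-ID"] = request_id
    return request_id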