
Request Lifecycle & Observability

What actually happens to a request as it flows through a gateway — the inbound pipeline, upstream communication, outbound pipeline, and the observability signals you collect.

01

Inbound Processing Pipeline

Every request passes through a series of processing stages before reaching the upstream service. The order matters — each stage can short-circuit the pipeline (reject the request) without invoking later stages.

| Stage | Action | Can Reject? |
|---|---|---|
| 1. TLS Termination | Decrypt HTTPS, establish connection | Yes — invalid cert, TLS version |
| 2. Parse Request | Parse HTTP method, path, headers, body | Yes — malformed request (400) |
| 3. Request ID | Generate or extract X-Request-ID for correlation | No |
| 4. IP Filtering | Check allowlists/blocklists | Yes — blocked IP (403) |
| 5. Rate Limiting | Check rate limit counters | Yes — exceeded limit (429) |
| 6. Authentication | Validate token/key, extract identity | Yes — invalid credentials (401) |
| 7. Authorization | Check role/scope for endpoint | Yes — insufficient permissions (403) |
| 8. Request Validation | Validate body against schema | Yes — invalid payload (422) |
| 9. Transformation | Modify headers, rewrite URL, transform body | No |
| 10. Route Match | Determine upstream service and instance | Yes — no matching route (404) |
🏭

The Assembly Line

The inbound pipeline is like a factory assembly line with quality checkpoints. Each station inspects one aspect of the product (request). If it fails any checkpoint, it's rejected immediately — no point continuing down the line. The order is deliberate: cheap checks (IP filter, rate limit) come before expensive ones (auth, body validation) to reject bad requests as early as possible.

Order Matters for Performance

Rate limiting should come BEFORE authentication. Why? Auth requires cryptographic verification (expensive). If an attacker is flooding you with requests, you want to reject them at the rate limiter (cheap counter check) before spending CPU on JWT signature verification. The principle: cheapest rejections first.
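
A minimal sketch of this cheapest-first, short-circuiting pipeline, assuming hypothetical stage functions and a dict-based request (not any particular gateway's plugin API):

inbound-pipeline-sketch.py
from typing import Callable, Optional

Request = dict   # stand-in for a parsed HTTP request
Response = dict  # stand-in for an HTTP response

BLOCKLIST = {"198.51.100.7"}

def reject(status: int, message: str) -> Response:
    return {"status": status, "body": {"error": message}}

# Stages ordered cheapest-first: a set lookup and a counter check run
# before any cryptographic work.
def ip_filter(req: Request) -> Optional[Response]:
    return reject(403, "blocked IP") if req["client_ip"] in BLOCKLIST else None

def rate_limit(req: Request) -> Optional[Response]:
    return reject(429, "rate limit exceeded") if req.get("req_count", 0) > 100 else None

def authenticate(req: Request) -> Optional[Response]:
    # Expensive in a real gateway (JWT signature verification); runs last.
    return None if req.get("token") else reject(401, "invalid credentials")

PIPELINE: list[Callable[[Request], Optional[Response]]] = [ip_filter, rate_limit, authenticate]

def run_inbound(req: Request) -> Optional[Response]:
    """Return the first failing stage's error, or None to forward upstream."""
    for stage in PIPELINE:
        error = stage(req)
        if error is not None:
            return error  # short-circuit: later stages never run
    return None

print(run_inbound({"client_ip": "203.0.113.42", "req_count": 150}))
# rejected with 429 before authenticate() ever runs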

02

Upstream Communication

After inbound processing, the gateway forwards the request to the upstream service. This involves connection management, load balancing, timeout handling, and retry logic.

| Concept | Description | Typical Value |
|---|---|---|
| Connection Pool | Reuse TCP connections to upstreams (avoid handshake per request) | 100-500 per upstream |
| Connect Timeout | Max time to establish TCP connection | 3-5 seconds |
| Read Timeout | Max time waiting for response after sending request | 30-60 seconds |
| Idle Timeout | Close pooled connections after inactivity | 60-90 seconds |
| Max Retries | Retry on 502/503/connection error | 1-2 retries |
| Retry Backoff | Wait between retries (exponential) | 100ms, 200ms, 400ms |
nginx-upstream-config.conf
upstream order_service {
    least_conn;
    keepalive 64;  # Connection pool size

    server order-svc-1:8080 max_fails=3 fail_timeout=30s;
    server order-svc-2:8080 max_fails=3 fail_timeout=30s;
    server order-svc-3:8080 max_fails=3 fail_timeout=30s;
}

server {
    location /api/orders {
        proxy_pass http://order_service;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 10s;

        # Retry on failure (idempotent methods only)
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
        proxy_next_upstream_timeout 10s;

        # Forward headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Request-ID $request_id;

        # Connection pooling
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Only Retry Idempotent Requests

Retrying a failed POST /orders could create duplicate orders. Only retry requests that are safe to repeat: GET, HEAD, OPTIONS, and explicitly idempotent operations (with Idempotency-Key header). For non-idempotent requests, return the error to the client and let them decide.
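
A sketch of that policy, assuming a hypothetical send() callable that performs one proxy attempt and returns the HTTP status code:

idempotent-retry-sketch.py
import time

IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS"}
RETRYABLE_STATUSES = {502, 503}

def is_retryable(method: str, headers: dict) -> bool:
    # Safe-to-repeat methods, or requests the client has explicitly
    # marked idempotent via an Idempotency-Key header.
    return method in IDEMPOTENT_METHODS or "Idempotency-Key" in headers

def forward_with_retries(send, method: str, headers: dict,
                         max_retries: int = 2, base_delay: float = 0.1) -> int:
    """Retry with exponential backoff; non-idempotent failures pass through."""
    status = send()
    attempt = 0
    while (status in RETRYABLE_STATUSES and attempt < max_retries
           and is_retryable(method, headers)):
        time.sleep(base_delay * (2 ** attempt))  # 100ms, 200ms, ...
        status = send()
        attempt += 1
    return status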

03

Outbound Processing Pipeline

After receiving the upstream response, the gateway processes it before returning it to the client. This is the reverse pipeline — transformation, caching, and observability recording.

| Stage | Action |
|---|---|
| 1. Response received | Read status code, headers, body from upstream |
| 2. Response transformation | Remove internal headers, rename fields, filter body |
| 3. Cache storage | Store cacheable responses (GET 200 with Cache-Control) |
| 4. Compression | Gzip/Brotli compress response body if client accepts |
| 5. Metrics recording | Record latency, status code, upstream, route |
| 6. Access logging | Write structured log entry with full request/response metadata |
| 7. Return to client | Send final response with appropriate headers |
response-transformation.yaml
# Response transformation: clean up before returning to client
plugins:
  - name: response-transformer
    config:
      remove:
        headers:
          - X-Powered-By        # Don't reveal tech stack
          - Server               # Don't reveal server software
          - X-Internal-Request-ID
          - X-Upstream-Latency
      add:
        headers:
          - "X-Request-ID:$(ctx.request_id)"
          - "X-Response-Time:$(ctx.latency_ms)ms"
          - "Strict-Transport-Security:max-age=31536000; includeSubDomains"
          - "X-Content-Type-Options:nosniff"
          - "X-Frame-Options:DENY"

Security Headers on Every Response

The gateway is the perfect place to inject security headers consistently: HSTS, X-Content-Type-Options, X-Frame-Options, Content-Security-Policy. Adding them at the gateway means every service gets them automatically — no service needs to remember to set them.

04

Error Handling

The gateway must handle errors gracefully — both its own errors and upstream failures. Clients should receive consistent, informative error responses regardless of which service failed or how.

| Status | Meaning | Gateway Action |
|---|---|---|
| 400 | Bad Request — malformed input | Return validation errors in consistent format |
| 401 | Unauthorized — auth failed | Return after auth plugin rejects token/key |
| 403 | Forbidden — insufficient permissions | Return after authorization check fails |
| 404 | Not Found — no matching route | Return when no route matches the path |
| 429 | Too Many Requests — rate limited | Return with Retry-After header |
| 502 | Bad Gateway — upstream returned invalid response | Log upstream error, return generic message |
| 503 | Service Unavailable — upstream down or circuit open | Return with Retry-After, serve cached if possible |
| 504 | Gateway Timeout — upstream didn't respond in time | Log timeout, return with context |
consistent-error-response.json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "You have exceeded the rate limit of 100 requests per minute.",
    "details": {
      "limit": 100,
      "window": "1m",
      "retry_after": 23
    },
    "request_id": "req_abc123xyz",
    "documentation_url": "https://docs.example.com/errors/rate-limiting"
  }
}

// All gateway errors follow this structure:
// - code: machine-readable error code (for client logic)
// - message: human-readable description
// - details: additional context (optional)
// - request_id: for support/debugging correlation
// - documentation_url: link to error documentation

Never Expose Internal Details

Gateway error responses must never expose internal service names, stack traces, or infrastructure details. A 502 should say "Service temporarily unavailable" — not "Connection refused to order-service-v2.internal:8080". Internal details help attackers map your infrastructure.
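
A sketch of how a gateway might build this envelope while keeping internals out of it; the helper names are illustrative, not a specific gateway's API:

error-envelope-sketch.py
import logging

log = logging.getLogger("gateway")

def error_response(status: int, code: str, message: str,
                   request_id: str, details: dict | None = None) -> dict:
    body = {"error": {"code": code, "message": message, "request_id": request_id}}
    if details:
        body["error"]["details"] = details
    return {"status": status, "body": body}

def on_upstream_failure(exc: Exception, request_id: str) -> dict:
    # Log the real cause internally; never echo it to the client.
    log.error("upstream failure", exc_info=exc, extra={"request_id": request_id})
    return error_response(502, "UPSTREAM_ERROR",
                          "Service temporarily unavailable.", request_id)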

05

Logging

The gateway sees every request — making it the single best place for access logging. Structured JSON logs with consistent fields enable powerful querying and alerting.

structured-access-log.json
{
  "timestamp": "2024-03-01T12:00:00.123Z",
  "request_id": "req_abc123",
  "method": "POST",
  "path": "/api/v1/orders",
  "status": 201,
  "latency_ms": 145,
  "upstream_latency_ms": 132,
  "client_ip": "203.0.113.42",
  "user_agent": "MyApp/2.1.0",
  "consumer_id": "partner-acme",
  "user_id": "usr_xyz789",
  "route": "orders-create",
  "upstream": "order-service:8080",
  "request_size": 1024,
  "response_size": 256,
  "rate_limit_remaining": 87,
  "tls_version": "TLSv1.3"
}

What NOT to Log

  • Authorization headers — contains tokens/keys (log consumer_id instead)
  • Request/response bodies — PII risk, storage cost (log size only)
  • Passwords or secrets — even in error messages
  • Full query parameters — may contain sensitive data (log path only)
  • Internal IP addresses — security risk if logs are exposed

Log Sampling at Scale

At 100K+ requests/second, logging every request generates terabytes daily. Use sampling: log 100% of errors and slow requests (p99), 10% of normal requests, 1% of health checks. This gives you full visibility into problems while controlling storage costs. Always log 100% of 4xx and 5xx responses.
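
A sketch of that sampling policy; the thresholds are illustrative and would be tuned to your own p99 and traffic mix:

log-sampling-sketch.py
import random

SLOW_MS = 1000  # stand-in for your measured p99; tune per route

def should_log(status: int, latency_ms: float, path: str) -> bool:
    if status >= 400:
        return True                    # always log 4xx/5xx
    if latency_ms >= SLOW_MS:
        return True                    # always log slow requests
    if path == "/health":
        return random.random() < 0.01  # 1% of health checks
    return random.random() < 0.10      # 10% of normal traffic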

06

Metrics

Metrics are aggregated numerical measurements — counters, gauges, and histograms. Unlike logs (per-request), metrics give you system-wide health at a glance. The gateway should expose Prometheus-compatible metrics for dashboarding and alerting.

| Metric | Type | What It Tells You |
|---|---|---|
| request_total | Counter | Total requests (by route, method, status) |
| request_duration_seconds | Histogram | Latency distribution (p50, p95, p99) |
| active_connections | Gauge | Current open connections |
| upstream_request_total | Counter | Requests sent to each upstream |
| upstream_response_time | Histogram | Backend latency (separate from gateway overhead) |
| rate_limit_hits_total | Counter | How often rate limits are triggered |
| circuit_breaker_state | Gauge | Current state per upstream (0=closed, 1=open) |
prometheus-metrics-endpoint.txt
# Gateway exposes /metrics endpoint for Prometheus scraping

# RED metrics (Rate, Errors, Duration) — the golden signals
gateway_requests_total{route="orders-create",method="POST",status="201"} 15234
gateway_requests_total{route="orders-create",method="POST",status="500"} 12
gateway_request_duration_seconds_bucket{route="orders-create",le="0.1"} 14500
gateway_request_duration_seconds_bucket{route="orders-create",le="0.5"} 15100
gateway_request_duration_seconds_bucket{route="orders-create",le="1.0"} 15230

# Upstream health
gateway_upstream_healthy{upstream="order-service"} 3
gateway_upstream_unhealthy{upstream="order-service"} 0

# Rate limiting
gateway_rate_limit_exceeded_total{consumer="partner-acme"} 42

# Connection pool
gateway_upstream_connections_active{upstream="order-service"} 67
gateway_upstream_connections_idle{upstream="order-service"} 33

The Four Golden Signals

Monitor these four signals for every route: (1) Latency — how long requests take (p50, p95, p99). (2) Traffic — requests per second. (3) Errors — percentage of 5xx responses. (4) Saturation — how full your connection pools and rate limit buckets are. Alert when any signal deviates from baseline.
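
A minimal sketch of exposing these signals with the Python prometheus_client library; the metric names mirror the examples above, and the label values are illustrative:

golden-signals-sketch.py
import time
from prometheus_client import Counter, Gauge, Histogram, start_http_server

REQUESTS = Counter("gateway_requests_total", "Total requests",
                   ["route", "method", "status"])
LATENCY = Histogram("gateway_request_duration_seconds", "Request latency",
                    ["route"], buckets=(0.1, 0.5, 1.0, 5.0))
ACTIVE = Gauge("gateway_active_connections", "Current open connections")

def record(route: str, method: str, status: int, duration_s: float) -> None:
    REQUESTS.labels(route=route, method=method, status=str(status)).inc()
    LATENCY.labels(route=route).observe(duration_s)

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics for Prometheus scraping
    record("orders-create", "POST", 201, 0.145)
    time.sleep(60)           # keep the endpoint up long enough to scrape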

07

Distributed Tracing

The gateway is the trace root — it generates the initial trace ID and span for every incoming request. As the request flows through downstream services, each service adds its own span. The result is a complete picture of the request's journey through your system.

| Concept | Description |
|---|---|
| Trace | The complete journey of a request across all services |
| Span | A single operation within a trace (gateway processing, DB query, etc.) |
| Trace ID | Unique identifier for the entire trace (propagated to all services) |
| Span ID | Unique identifier for a single span within the trace |
| Parent Span ID | Links child spans to their parent (creates the tree) |
| W3C Trace Context | Standard headers: traceparent, tracestate |
w3c-trace-context-headers.txt
# W3C Trace Context: standard propagation headers

# Gateway generates trace root:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
#            version-trace_id(32 hex)-parent_span_id(16 hex)-flags

# Gateway forwards to upstream with its span as parent:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-d7ad39c1a2e5b8f2-01

# Each service:
# 1. Extracts trace_id from incoming traceparent
# 2. Creates new span_id for its own work
# 3. Sets parent_span_id to the incoming span_id
# 4. Forwards updated traceparent to next service

# Optional vendor-specific context:
tracestate: kong=s:1234,dd=s:5678
📦

The Package Tracking Number

Distributed tracing is like a package tracking number. When you ship a package (request), it gets a tracking ID at the first facility (gateway). As it moves through sorting centers (services), each facility scans it and adds a timestamp. You can see the complete journey: where it went, how long each stop took, and where it got stuck. The trace ID is your tracking number.

Sampling for Production

Tracing every request in production is expensive (storage + network overhead). Use head-based sampling (decide at the gateway: trace 1% of requests) or tail-based sampling (collect all spans, keep only interesting traces — errors, slow requests). Always trace 100% of error responses.
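
A sketch of generating and propagating traceparent by hand, following the format shown above; a real deployment would use an OpenTelemetry SDK rather than hand-rolling this:

traceparent-sketch.py
import secrets

def new_traceparent(sampled: bool = True) -> str:
    # Gateway as trace root: fresh trace_id (32 hex) plus span_id (16 hex).
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"

def child_traceparent(incoming: str) -> str:
    # Keep the trace_id; replace the span field with this hop's own span_id.
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

root = new_traceparent()             # generated at the gateway
forwarded = child_traceparent(root)  # sent on to the upstream service
print(root, forwarded, sep="\n")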

08

Interview Questions

Q:Walk through what happens when a request hits an API Gateway.

A: 1. TLS termination (decrypt). 2. Parse HTTP request. 3. Generate/extract request ID. 4. IP filtering check. 5. Rate limit check. 6. Authentication (validate token). 7. Authorization (check role/scope). 8. Request validation (schema check). 9. Request transformation (headers, URL rewrite). 10. Route matching. 11. Load balance to upstream instance. 12. Forward request with identity headers. 13. Receive upstream response. 14. Response transformation. 15. Cache if applicable. 16. Record metrics + log. 17. Return to client. Each stage can short-circuit with an error response.

Q:Why should rate limiting come before authentication in the pipeline?

A: Authentication is computationally expensive (JWT signature verification, token introspection). Rate limiting is a cheap counter check. If an attacker floods you with invalid tokens, you want to reject at the rate limiter (O(1) Redis lookup) before spending CPU on cryptographic verification. This prevents auth-based DDoS where the attack vector is expensive token validation.

Q:How do you handle a 504 Gateway Timeout in production?

A: Immediate: return a clear error to the client with Retry-After header. Investigation: check upstream health (is the service overloaded?), check timeout configuration (is 30s too short for this endpoint?), check connection pool exhaustion. Prevention: set appropriate timeouts per route (search: 5s, report generation: 60s), implement circuit breakers, add request queuing for slow endpoints, and ensure health checks detect slow services.

Q:What's the difference between gateway latency and upstream latency, and why does it matter?

A: Gateway latency = total time from request received to response sent. Upstream latency = time spent waiting for the backend service. The difference (gateway overhead) is the time spent on TLS, auth, rate limiting, transformation. If gateway_latency >> upstream_latency, your gateway is the bottleneck (too many plugins, heavy transformation). If upstream_latency is high, the backend service needs optimization. Track both separately.
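
Using the sample access-log fields from section 05, the breakdown is a subtraction; a sketch with an illustrative alert heuristic:

latency-breakdown-sketch.py
entry = {"latency_ms": 145, "upstream_latency_ms": 132}  # from the sample log

gateway_overhead_ms = entry["latency_ms"] - entry["upstream_latency_ms"]
print(gateway_overhead_ms)  # 13 ms spent in the gateway itself

# Illustrative heuristic: flag the gateway as the bottleneck when its own
# overhead exceeds the time spent waiting on the upstream.
if gateway_overhead_ms > entry["upstream_latency_ms"]:
    print("gateway overhead dominates; audit plugins and transformations")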

Q:How does the gateway enable distributed tracing across microservices?

A: The gateway is the trace root: (1) Generate a unique trace ID for each incoming request. (2) Create the first span (gateway processing). (3) Propagate trace context to upstream via W3C traceparent header. (4) Each downstream service extracts the trace ID, creates its own spans, and propagates further. (5) All spans are collected by a tracing backend (Jaeger, Zipkin). The gateway ensures every request has a trace ID — without it, you can't correlate logs across services.

09

Common Mistakes

⚠️

Logging request/response bodies

Logging full request and response bodies for debugging — exposing PII, tokens, and sensitive data in log storage.

Log metadata only: method, path, status, latency, consumer_id, request_size. Never log bodies, Authorization headers, or cookies. For debugging specific issues, enable body logging temporarily for a single consumer with explicit retention policies.

⚠️

Same timeout for all routes

Setting a global 60-second timeout for all upstream calls — fast endpoints wait too long on failure, slow endpoints get killed prematurely.

Set per-route timeouts based on expected latency: health checks (2s), CRUD operations (10s), search (5s), report generation (120s). A fast endpoint with a 60s timeout means clients wait a full minute before learning the service is down.
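
A sketch of a per-route timeout lookup; the routes and values are illustrative, taken from the guidance above:

per-route-timeouts-sketch.py
ROUTE_TIMEOUTS_S = {
    "/health": 2,
    "/api/search": 5,
    "/api/orders": 10,    # CRUD
    "/api/reports": 120,  # long-running generation
}
DEFAULT_TIMEOUT_S = 10

def timeout_for(path: str) -> int:
    # Longest-prefix semantics aren't needed here; first match wins.
    for prefix, timeout in ROUTE_TIMEOUTS_S.items():
        if path.startswith(prefix):
            return timeout
    return DEFAULT_TIMEOUT_S

assert timeout_for("/api/reports/monthly") == 120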

⚠️

Retrying non-idempotent requests

Configuring the gateway to retry all failed requests including POST, PUT, and DELETE — causing duplicate orders, double charges, or data corruption.

Only retry idempotent methods (GET, HEAD, OPTIONS) by default. For POST/PUT, only retry if the client provides an Idempotency-Key header and the upstream supports idempotency. Never retry on 4xx errors — those won't succeed on retry.

⚠️

No request ID correlation

Not generating or propagating a unique request ID — making it impossible to trace a client's issue across gateway logs, service logs, and database queries.

Generate a UUID request ID at the gateway (or accept X-Request-ID from the client). Propagate it to all upstream services. Include it in every log entry, error response, and trace span. When a client reports an issue, the request ID is your search key across all systems.
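
A sketch of that request-ID handling at the gateway edge; the req_ prefix and helper name are illustrative:

request-id-sketch.py
import uuid

def ensure_request_id(headers: dict) -> str:
    # Accept a client-supplied X-Request-ID, otherwise mint one; either
    # way the same ID is forwarded upstream and stamped on every log
    # entry, error response, and trace span.
    request_id = headers.get("X-Request-ID") or f"req_{uuid.uuid4().hex[:12]}"
    headers["X-Request-ID"] = request_id
    return request_id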