Core Responsibilities
The capabilities that justify the gateway's existence — routing, load balancing, SSL termination, protocol translation, and request/response transformation.
What is an API Gateway
An API Gateway is a single entry point that sits between external clients and your internal services. It accepts all incoming API requests, applies cross-cutting policies (auth, rate limiting, logging), and routes them to the appropriate backend service. Clients never talk directly to your microservices — they talk to the gateway.
The Hotel Front Desk
An API Gateway is like a hotel front desk. Guests (clients) don't wander the hallways looking for housekeeping, room service, or maintenance. They go to the front desk, which verifies their identity (auth), checks if they're allowed (authorization), and routes their request to the right department. The departments don't need to handle check-in or verify room keys — the front desk already did that.
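The flow described above (apply cross-cutting policies, then route) can be sketched in a few lines. This is a minimal illustration, not how any particular gateway is implemented; the `Request`, `Response`, `require_api_key`, and `handle` names are hypothetical, and the API key check accepts any non-empty header purely for demonstration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


# A policy returns a Response to short-circuit, or None to let the request continue.
Policy = Callable[[Request], Optional[Response]]
Backend = Callable[[Request], Response]


def require_api_key(request: Request) -> Optional[Response]:
    # Hypothetical check: any non-empty X-API-Key header is accepted.
    if not request.headers.get("X-API-Key"):
        return Response(401, "missing API key")
    return None


def handle(request: Request, policies: List[Policy], routes: Dict[str, Backend]) -> Response:
    # 1. Cross-cutting policies run first, in order.
    for policy in policies:
        rejection = policy(request)
        if rejection is not None:
            return rejection
    # 2. Only requests that pass every policy are routed to a backend.
    for prefix, backend in routes.items():
        if request.path.startswith(prefix):
            return backend(request)
    return Response(404, "no route")
```

Because the backends are only called after every policy has passed, they never need to repeat the check themselves, which is the point of the front desk analogy.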
The Three Things a Gateway Does Well
Core Gateway Functions
- ✅ Traffic management: routing, load balancing, rate limiting, circuit breaking
- ✅ Security boundary: authentication, authorization, TLS termination, IP filtering
- ✅ API lifecycle: versioning, transformation, documentation, monitoring
What a Gateway is NOT
What Does Not Belong in a Gateway
- ❌ Business logic: never put domain rules in the gateway
- ❌ Data storage: the gateway should be stateless (except caching)
- ❌ Service-to-service communication: that's a service mesh concern (east-west)
- ❌ Heavy computation: transformation should be lightweight, not CPU-intensive
North-South vs East-West Traffic
The gateway handles north-south traffic — requests from external clients entering your system. East-west traffic (service-to-service communication within your cluster) is handled by a service mesh (Istio, Linkerd) or direct calls. Conflating the two leads to the gateway becoming a bottleneck for internal communication.
| Concept | Gateway (North-South) | Service Mesh (East-West) |
|---|---|---|
| Traffic direction | External → Internal | Internal → Internal |
| Clients | Mobile apps, browsers, partners | Microservices talking to each other |
| Auth model | API keys, JWT, OAuth | mTLS, SPIFFE identities |
| Typical tool | Kong, AWS API Gateway | Istio, Linkerd, Consul Connect |
Request Routing
Routing is the gateway's most fundamental job — matching an incoming request to the correct upstream service. Modern gateways support multiple routing dimensions that can be combined.
| Routing Type | Match On | Example |
|---|---|---|
| Path-based | URL path prefix or exact match | /api/users/* → user-service |
| Host-based | Request Host header | api.example.com → api-service |
| Header-based | Custom header value | X-Version: v2 → service-v2 |
| Method-based | HTTP method | GET /orders → read-replica, POST /orders → primary |
| Weighted | Percentage split | 90% → stable, 10% → canary |
| Query param | URL query parameters | ?region=eu → eu-cluster |
```yaml
# Kong declarative routing configuration
_format_version: "3.0"

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: users-route
        paths:
          - /api/v1/users
        methods:
          - GET
          - POST
        strip_path: true

  - name: order-service
    url: http://order-service:8080
    routes:
      - name: orders-route
        paths:
          - /api/v1/orders
        headers:
          X-Region:
            - us-east
            - us-west
```
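The weighted row in the routing table (90% stable, 10% canary) can be approximated with a stable hash, so that a given client keeps landing on the same side of the split. The sketch below is one possible approach under that assumption; `pick_upstream` and the upstream names are hypothetical, and real gateways may instead split per request at random.

```python
import hashlib
from typing import Dict


def pick_upstream(client_id: str, weights: Dict[str, int]) -> str:
    """Map a client to an upstream in proportion to the configured weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the client id into a stable bucket in [0, total).
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk the cumulative weights until the bucket falls inside a range.
    cumulative = 0
    for upstream, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return upstream
    raise AssertionError("unreachable: bucket is always below total")
```

Hashing on a client identifier keeps a user's experience consistent during a canary rollout; hashing on a per-request value would spread each user's requests across both versions.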
Dynamic Routing
Static routing maps paths to services at configuration time. Dynamic routing resolves the target at request time — based on JWT claims, database lookups, or service registry queries. This enables tenant-specific routing, feature flags, and A/B testing without redeploying the gateway.
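As a rough illustration of tenant-specific dynamic routing, the sketch below picks an upstream at request time from JWT claims. It assumes the token has already been verified elsewhere; the `tenant_id` claim name, the upstream URLs, and the lookup table (which in practice might come from a database or service registry) are all hypothetical.

```python
from typing import Mapping

DEFAULT_UPSTREAM = "http://shared-cluster:8080"  # hypothetical fallback


def resolve_upstream(claims: Mapping[str, str], tenant_upstreams: Mapping[str, str]) -> str:
    """Pick the upstream at request time from already-verified JWT claims."""
    tenant = claims.get("tenant_id")
    if tenant is None:
        return DEFAULT_UPSTREAM
    # Tenants without a dedicated entry fall back to the shared cluster.
    return tenant_upstreams.get(tenant, DEFAULT_UPSTREAM)
```

Changing the lookup table moves a tenant to a different cluster on the next request, with no gateway redeploy.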
Route Priority
When multiple routes could match, gateways use priority rules: exact path beats prefix, longer prefix beats shorter, specific headers beat wildcards. Understand your gateway's priority model — ambiguous routing is a common source of production incidents.
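The path-related rules (exact beats prefix, longer prefix beats shorter) can be sketched as follows. This is a simplified model, not any specific gateway's algorithm: the `Route` type and `match_route` function are hypothetical, and header specificity is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    path: str
    upstream: str
    exact: bool = False


def match_route(path: str, routes: List[Route]) -> Optional[Route]:
    candidates = []
    for route in routes:
        if route.exact:
            if path == route.path:
                candidates.append(route)
        # A prefix matches on segment boundaries, so /api does not match /apix.
        elif path == route.path or path.startswith(route.path.rstrip("/") + "/"):
            candidates.append(route)
    if not candidates:
        return None
    # Exact matches sort ahead of prefixes; among prefixes the longest wins.
    return max(candidates, key=lambda r: (r.exact, len(r.path)))
```

With routes for `/api`, `/api/users`, and an exact `/api/users/me`, a request to `/api/users/42` goes to the `/api/users` route, while `/api/users/me` goes to the exact route even though both prefixes also match.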
Load Balancing
Once the gateway knows which service to route to, it must choose which instance of that service receives the request. The load balancing algorithm determines fairness, latency, and resilience.
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Rotate through instances sequentially | Homogeneous instances, simple workloads |
| Weighted Round Robin | Rotate but send more to higher-weight instances | Mixed instance sizes (2x CPU = 2x weight) |
| Least Connections | Send to instance with fewest active requests | Variable request duration (long-running queries) |
| IP Hash | Hash client IP to pick consistent instance | Session affinity without cookies |
| Random | Pick a random instance | Large pools where simplicity wins |
| Least Latency | Send to instance with lowest response time | Latency-sensitive APIs |
```nginx
upstream user_service {
    # Least connections: best for variable response times
    least_conn;

    server user-svc-1:8080 weight=3;
    server user-svc-2:8080 weight=2;
    server user-svc-3:8080 weight=1;

    # Health check: mark as down after 3 failures
    server user-svc-4:8080 max_fails=3 fail_timeout=30s;
}

server {
    location /api/users {
        proxy_pass http://user_service;
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }
}
```
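To make the algorithms in the table concrete, here is a minimal sketch of three of them. The function names are hypothetical, and the weighted version is deliberately naive: it repeats each instance `weight` times per cycle, whereas production balancers typically interleave picks more smoothly.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(instances: List[str]) -> Iterator[str]:
    """Rotate through instances sequentially, forever."""
    return itertools.cycle(instances)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Naive weighted rotation: repeat each instance `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the instance with the fewest in-flight requests."""
    return min(active, key=active.get)
```

Round robin needs no knowledge of the instances' state, which is why it suits homogeneous pools. Least connections requires the gateway to track in-flight requests per instance, and that bookkeeping is what lets it adapt to long-running requests.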
Health-Check-Aware Balancing
A load balancer that doesn't health-check is a traffic distributor that sends requests to dead instances. Active health checks (periodic pings) detect failures before clients hit them. Passive health checks (monitoring response codes) react after failures. Use both — active for fast detection, passive for catching intermittent issues.
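A passive health check can be modeled as a small state machine per instance. The sketch below borrows the `max_fails` and `fail_timeout` names from the NGINX example but simplifies the semantics (consecutive failures only, no sliding window), so treat it as an illustration and not a description of NGINX's exact behavior. The `PassiveHealth` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PassiveHealth:
    max_fails: int = 3
    fail_timeout: float = 30.0  # seconds an instance stays out of rotation
    fails: int = 0
    down_until: float = 0.0

    def record(self, ok: bool, now: float) -> None:
        """Feed in the outcome of a real client request."""
        if ok:
            self.fails = 0
            return
        self.fails += 1
        if self.fails >= self.max_fails:
            # Too many consecutive failures: take the instance out of rotation.
            self.down_until = now + self.fail_timeout
            self.fails = 0

    def is_available(self, now: float) -> bool:
        return now >= self.down_until
```

Note that the first three failing requests here are real client requests. That is the cost of passive checking, and the reason to pair it with active probes.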
SSL/TLS Termination
TLS termination at the gateway means the gateway handles the expensive TLS handshake and decryption. Backend services receive plain HTTP, eliminating certificate management from every service and offloading CPU-intensive cryptographic operations.
| Mode | Description | Use When |
|---|---|---|
| TLS Termination | Gateway decrypts, forwards plain HTTP to backends | Most common — internal network is trusted |
| TLS Passthrough | Gateway forwards encrypted traffic without decrypting | End-to-end encryption is required and the gateway does not need to inspect traffic |
| Re-encryption | Gateway decrypts, inspects, re-encrypts to backend | Need inspection + backend encryption (compliance) |
| SNI Routing | Route based on the TLS SNI extension (the server name in the ClientHello) without decrypting | Multi-tenant with passthrough per domain |
```nginx
server {
    listen 443 ssl http2;
    server_name api.example.com;

    # Certificate management
    ssl_certificate     /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;

    # Modern TLS configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # OCSP stapling for faster handshakes
    ssl_stapling on;
    ssl_stapling_verify on;

    # Forward to backend as plain HTTP
    location / {
        proxy_pass http://backend_service;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```
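For the SNI routing mode in the table, the only input available to the gateway is the hostname the client announces during the handshake. The following sketch shows what a lookup on that hostname might look like; `route_by_sni`, the backend addresses, and the single-label wildcard convention are assumptions for illustration.

```python
from typing import Mapping, Optional


def route_by_sni(server_name: str, backends: Mapping[str, str]) -> Optional[str]:
    """Choose a backend from the SNI hostname alone, with no decryption."""
    name = server_name.lower().rstrip(".")
    if name in backends:
        return backends[name]
    # Fall back to a single-label wildcard such as *.example.com.
    _, _, parent = name.partition(".")
    if parent:
        return backends.get("*." + parent)
    return None
```

Nothing in this function can see a path, header, or body, which is why passthrough deployments cannot do path-based routing or request transformation.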
Certificate Management at Scale
With dozens of domains, manual cert management is unsustainable. Use automated certificate management: Let's Encrypt with ACME protocol, or cloud-managed certificates (AWS ACM, GCP managed certs). The gateway should auto-renew certificates without downtime — Traefik and Caddy do this natively.
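The core decision inside any auto-renewal loop is small: renew once the certificate enters a window before its expiry. The sketch below shows that check only. The 30-day window is an assumption, and `needs_renewal` is a hypothetical helper, not part of any ACME client.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(
    not_after: datetime,
    now: datetime,
    window: timedelta = timedelta(days=30),  # assumed renewal window
) -> bool:
    """Return True once the certificate is inside the renewal window."""
    return now >= not_after - window


expiry = datetime(2025, 6, 30, tzinfo=timezone.utc)
print(needs_renewal(expiry, datetime(2025, 5, 1, tzinfo=timezone.utc)))
print(needs_renewal(expiry, datetime(2025, 6, 1, tzinfo=timezone.utc)))
```

Renewing well before expiry leaves room for retries if the certificate authority or DNS validation is temporarily unavailable.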
Protocol Translation
The gateway can translate between protocols — exposing a REST API to external clients while backends communicate via gRPC, or upgrading HTTP/1.1 connections to HTTP/2 for multiplexed backend communication.
| Translation | Client Sees | Backend Uses | Why |
|---|---|---|---|
| HTTP/1.1 → HTTP/2 | Standard HTTP/1.1 | Multiplexed HTTP/2 | Connection efficiency, header compression |
| REST → gRPC | JSON over HTTP | Protobuf over gRPC | Performance for internal services |
| WebSocket Upgrade | HTTP → WS handshake | WebSocket connection | Bidirectional real-time communication |
| HTTP → AMQP | Synchronous HTTP POST | Async message queue | Decouple client from async processing |
| gRPC-Web → gRPC | gRPC-Web (browser) | Native gRPC | Browser clients can't use native gRPC |
```yaml
# Envoy gRPC-JSON transcoding filter
http_filters:
  - name: envoy.filters.http.grpc_json_transcoder
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
      proto_descriptor: "/etc/envoy/proto.pb"
      services:
        - user.UserService
      print_options:
        add_whitespace: true
        always_print_primitive_fields: true

# Client sends: POST /v1/users {"name": "Alice"}
# Gateway translates to: gRPC UserService.CreateUser(CreateUserRequest{name: "Alice"})
# Backend responds with protobuf, gateway translates back to JSON
```
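Conceptually, REST-to-gRPC transcoding maps an HTTP method and path onto an RPC method, then merges the JSON body and path variables into the request message. The sketch below imitates that mapping with plain dictionaries. It does not use a gRPC library or encode protobuf, and the `BINDINGS` table and `transcode` function are hypothetical stand-ins for the HTTP annotations a real transcoder reads from the proto descriptor.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Hypothetical HTTP bindings: (HTTP method, path template, RPC method name).
BINDINGS: List[Tuple[str, str, str]] = [
    ("POST", "/v1/users", "user.UserService/CreateUser"),
    ("GET", "/v1/users/{id}", "user.UserService/GetUser"),
]


def _template_to_regex(template: str):
    # Turn /v1/users/{id} into ^/v1/users/(?P<id>[^/]+)$
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile("^" + pattern + "$")


def transcode(method: str, path: str, body: str = "") -> Tuple[str, Dict[str, Any]]:
    """Map an HTTP request to (rpc_method, request_fields)."""
    for http_method, template, rpc in BINDINGS:
        match = _template_to_regex(template).match(path)
        if http_method == method and match:
            fields: Dict[str, Any] = json.loads(body) if body else {}
            fields.update(match.groupdict())  # path variables become message fields
            return rpc, fields
    raise LookupError(f"no RPC bound to {method} {path}")
```

For example, `transcode("POST", "/v1/users", '{"name": "Alice"}')` returns the `CreateUser` method name together with `{"name": "Alice"}`, mirroring the comment at the end of the Envoy configuration.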
The Universal Translator
Protocol translation makes the gateway like a UN interpreter. External clients speak REST (the common language), but internally your services speak gRPC (faster, typed). The gateway translates in both directions so neither side needs to learn the other's language. Clients get simplicity, services get performance.
Request/Response Transformation
Transformation modifies requests before they reach backends and responses before they reach clients. This decouples the external API contract from internal service interfaces.
| Transformation | Direction | Example |
|---|---|---|
| Header injection | Request | Add X-Request-ID, X-User-ID from JWT |
| Header removal | Response | Strip internal headers (X-Powered-By, Server) |
| Body transformation | Both | Rename fields, flatten nested objects |
| URL rewriting | Request | /api/v1/users → /users (strip prefix) |
| Response filtering | Response | Remove internal fields from public API |
| Aggregation | Response | Merge responses from multiple services |
```yaml
plugins:
  - name: request-transformer
    config:
      add:
        headers:
          - "X-Request-ID:$(uuid)"
          - "X-Forwarded-Host:$(headers.host)"
      remove:
        headers:
          - Cookie
      rename:
        headers:
          - "Authorization:X-Original-Auth"

  - name: response-transformer
    config:
      # A YAML mapping may define each key once, so both removals share one `remove` block
      remove:
        headers:
          - X-Powered-By
          - Server
          - X-Internal-Trace
        json:
          - internal_id
          - debug_info
      add:
        headers:
          - "X-Response-Time:$(latency)"
```
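The same kinds of transformation can be expressed as two small pure functions, which makes it easier to see how lightweight they are meant to be. This is an illustrative sketch and not how the Kong plugins are implemented; the function names are hypothetical, and only top-level JSON fields are filtered.

```python
import uuid
from typing import Any, Dict, Iterable, Tuple

INTERNAL_RESPONSE_HEADERS = {"x-powered-by", "server", "x-internal-trace"}


def transform_request_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop cookies and make sure every request carries a request ID."""
    out = {k: v for k, v in headers.items() if k.lower() != "cookie"}
    # Header names are case-insensitive, so check before injecting a new ID.
    if not any(k.lower() == "x-request-id" for k in out):
        out["X-Request-ID"] = str(uuid.uuid4())
    return out


def transform_response(
    headers: Dict[str, str],
    body: Dict[str, Any],
    hidden_fields: Iterable[str] = ("internal_id", "debug_info"),
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Strip internal headers and top-level internal fields from a response."""
    hidden = set(hidden_fields)
    public_headers = {
        k: v for k, v in headers.items() if k.lower() not in INTERNAL_RESPONSE_HEADERS
    }
    public_body = {k: v for k, v in body.items() if k not in hidden}
    return public_headers, public_body
```

Both functions run in time proportional to the number of headers or top-level fields, which is the kind of cost that is reasonable to pay on every request.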
Keep Transformations Lightweight
The gateway processes every request. Heavy transformations (large body rewrites, complex JSON manipulation) add latency to every call. If you need significant transformation, consider a dedicated BFF service behind the gateway. The gateway should handle header manipulation and simple field filtering — not complex business logic reshaping.
Gateway vs Related Concepts
The API Gateway overlaps with several other infrastructure components. Understanding the boundaries prevents architectural confusion.
| Component | Primary Role | Overlap with Gateway | Key Difference |
|---|---|---|---|
| Reverse Proxy | Forward requests to backends | Routing, TLS, load balancing | No API-specific features (auth, rate limiting, versioning) |
| Load Balancer | Distribute traffic across instances | Traffic distribution | A classic Layer 4 (TCP) balancer does no request inspection; the gateway works at Layer 7 (HTTP) |
| Service Mesh | Service-to-service communication | mTLS, observability, retries | East-west traffic, sidecar model, not client-facing |
| CDN | Cache and serve static content at edge | Caching, TLS termination | Optimized for static assets, not API logic |
| WAF | Block malicious requests | Security filtering | Focused on attack patterns (SQLi, XSS), not API management |
| Ingress Controller | Route external traffic into Kubernetes | Routing, TLS | Kubernetes-specific, often backed by a proxy or gateway (NGINX, Envoy) |
They're Complementary, Not Competing
In production, you often use several together: CDN → WAF → API Gateway → Service Mesh. The CDN handles static content and edge caching. The WAF blocks attacks. The gateway handles API-specific concerns. The service mesh handles internal communication. Each layer does what it does best.
Interview Questions
Q: What's the difference between an API Gateway and a reverse proxy?
A: A reverse proxy forwards requests to backend servers and handles basic concerns like TLS and load balancing. An API Gateway does everything a reverse proxy does PLUS API-specific features: authentication, rate limiting, request/response transformation, API versioning, developer portal, and analytics. NGINX is a reverse proxy; Kong (built on NGINX) is an API Gateway. The gateway is a superset.
Q: Why terminate TLS at the gateway instead of at each service?
A: Three reasons: (1) Certificate management — manage certs in one place instead of N services. (2) Performance — TLS handshakes are CPU-intensive; offload to dedicated gateway hardware. (3) Inspection — the gateway needs to read request headers/body for routing, auth, and rate limiting. With TLS passthrough, the gateway can't inspect traffic. The trade-off: internal traffic is unencrypted unless you re-encrypt.
Q: How does an API Gateway handle the single point of failure problem?
A: Deploy multiple gateway instances behind a network load balancer (L4). The gateway itself should be stateless: all shared state (rate limit counters, sessions) lives in external stores such as Redis. This allows horizontal scaling and fast failover. Use active-active deployment across availability zones. If one gateway instance dies, the NLB's health checks take it out of rotation and traffic continues on the healthy instances.
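To illustrate what "stateless with external state" means in practice, here is a fixed-window rate limiter whose only state lives behind a store interface. The `InMemoryStore` class is a stand-in for a shared store such as Redis and is not shared across processes; all names are hypothetical, and a real deployment would also expire old keys.

```python
from typing import Dict, Protocol


class CounterStore(Protocol):
    def incr(self, key: str) -> int: ...


class InMemoryStore:
    """Stand-in for a shared store such as Redis; NOT shared across processes."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


def allow(
    store: CounterStore,
    client_id: str,
    now: float,
    limit: int = 100,
    window_seconds: int = 60,
) -> bool:
    """Fixed-window rate limit whose only state lives in `store`."""
    window = int(now // window_seconds)
    return store.incr(f"ratelimit:{client_id}:{window}") <= limit
```

Because `allow` keeps nothing in local variables between calls, any gateway instance pointed at the same store enforces the same limit, and losing an instance loses no counters.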
Q: When would you use TLS passthrough instead of termination?
A: Use passthrough when: (1) Compliance requires end-to-end encryption with no intermediary decryption. (2) The backend needs to verify the client certificate directly (mTLS where the backend is the trust anchor). (3) You don't need the gateway to inspect request content. The trade-off: the gateway can only route based on SNI (hostname) — it can't do path-based routing, header inspection, or body transformation.
Q: Explain how protocol translation at the gateway benefits a microservices architecture.
A: External clients use REST/JSON (simple, universal, browser-friendly). Internal services use gRPC (fast, typed, streaming). The gateway translates between them — clients get a simple API, services get performance. This also means you can change internal protocols without breaking external clients. The gateway absorbs the protocol mismatch, acting as an anti-corruption layer between external and internal contracts.
Common Mistakes
Putting business logic in the gateway
Adding domain-specific validation, data enrichment, or workflow orchestration to gateway plugins.
✅ The gateway handles cross-cutting concerns only: auth, rate limiting, routing, transformation. Business logic belongs in services. If you need request enrichment, use a BFF service behind the gateway.
Single gateway instance with no redundancy
Running one gateway instance because 'it's just a proxy' — creating a single point of failure for all traffic.
✅ Deploy at least two instances across availability zones behind a network load balancer. The gateway is the most critical piece of infrastructure: if it goes down, everything goes down.
Using the gateway for east-west traffic
Routing all service-to-service calls through the API Gateway, creating a bottleneck and unnecessary hop.
✅ The gateway handles north-south (external → internal) traffic only. For east-west (service-to-service), use direct calls or a service mesh. Internal services don't need the gateway's auth or rate limiting because they have their own trust model.
Heavy body transformation in the gateway
Performing complex JSON restructuring, data aggregation from multiple sources, or XML-to-JSON conversion for every request in the gateway layer.
✅ Keep gateway transformations lightweight: header manipulation, field filtering, simple renames. For complex transformations, use a dedicated BFF or transformation service. The gateway processes every request, so heavy computation here adds latency to everything.