Implementations & Operations
Popular API Gateway implementations compared — Kong, AWS API Gateway, Envoy, NGINX, Traefik — plus deployment, HA, and production operations.
Kong
Kong is one of the most popular open-source API Gateways. Built on NGINX and OpenResty (LuaJIT), it combines NGINX's raw performance with a plugin architecture that adds API management features. It can run with a PostgreSQL or Cassandra database (traditional mode) or DB-less with declarative YAML configuration.
| Aspect | Details |
|---|---|
| Core | NGINX + OpenResty (LuaJIT) — handles millions of requests/sec |
| Plugin system | 100+ plugins: auth, rate limiting, logging, transformation |
| Configuration | DB-backed (PostgreSQL/Cassandra) or DB-less (declarative YAML) |
| Admin API | RESTful API for dynamic configuration changes |
| Kubernetes | Kong Ingress Controller (KIC) — native K8s integration |
| Enterprise | Kong Enterprise adds: Dev Portal, RBAC, OIDC, Vitals analytics |
```yaml
# Kong DB-less declarative configuration
_format_version: "3.0"

services:
  - name: user-service
    url: http://user-service:8080
    connect_timeout: 5000
    read_timeout: 30000
    retries: 2
    routes:
      - name: users-route
        paths:
          - /api/v1/users
        methods:
          - GET
          - POST
          - PUT
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          policy: redis
          redis_host: redis
      - name: jwt
        config:
          claims_to_verify:
            - exp

consumers:
  - username: mobile-app
    plugins:
      - name: rate-limiting
        config:
          minute: 5000  # Higher limit for mobile app
```
DB-less Mode for Kubernetes
In Kubernetes, DB-less mode is preferred. Configuration lives in declarative YAML (stored in Git), applied via Kong Ingress Controller or deck sync. No database dependency means simpler operations, faster startup, and GitOps-friendly workflows. The trade-off: no Admin API for dynamic changes — all changes go through config files.
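As an illustration, a minimal Docker Compose fragment for running Kong in DB-less mode might look like this (the image tag and file paths are assumptions, not a recommended setup):

```yaml
# Sketch: Kong in DB-less mode via Docker Compose (illustrative tag/paths)
services:
  kong:
    image: kong:3.6
    environment:
      KONG_DATABASE: "off"                      # no database — DB-less mode
      KONG_DECLARATIVE_CONFIG: /kong/kong.yaml  # declarative config file
      KONG_PROXY_LISTEN: "0.0.0.0:8000"
    volumes:
      - ./kong.yaml:/kong/kong.yaml:ro          # config lives in Git, mounted read-only
    ports:
      - "8000:8000"
```

Because the config file is mounted read-only and versioned in Git, every change is a commit plus a redeploy, which is exactly the GitOps trade-off described above.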
AWS API Gateway
AWS API Gateway is a fully managed service — no infrastructure to operate. It comes in two flavors: REST API (full-featured, more expensive) and HTTP API (simpler, cheaper, faster). Both integrate deeply with AWS services (Lambda, IAM, Cognito).
| Aspect | REST API | HTTP API |
|---|---|---|
| Price | $3.50/million requests | $1.00/million requests |
| Latency | Higher (more features) | Lower (optimized path) |
| Auth | IAM, Cognito, Lambda authorizer, API keys | IAM, Cognito, JWT authorizer |
| Throttling | Per-method, per-stage, per-key | Per-route, per-stage |
| WebSocket | Separate WebSocket API type | Not supported |
| Caching | Built-in response caching | Not available |
| Transformation | VTL request/response mapping | Simple parameter mapping |
| Best for | Full API management, complex transformations | Simple proxy, Lambda backends, cost-sensitive |
```jsonc
// CloudFormation (ApiGatewayV2) — HTTP API with CORS configuration
{
  "Type": "AWS::ApiGatewayV2::Api",
  "Properties": {
    "Name": "order-api",
    "ProtocolType": "HTTP",
    "CorsConfiguration": {
      "AllowOrigins": ["https://app.example.com"],
      "AllowMethods": ["GET", "POST", "PUT", "DELETE"],
      "AllowHeaders": ["Authorization", "Content-Type"],
      "MaxAge": 86400
    }
  }
}

// Route with JWT authorizer
{
  "Type": "AWS::ApiGatewayV2::Route",
  "Properties": {
    "ApiId": {"Ref": "OrderApi"},
    "RouteKey": "POST /orders",
    "AuthorizationType": "JWT",
    "AuthorizerId": {"Ref": "CognitoAuthorizer"},
    "Target": {"Fn::Join": ["/", ["integrations", {"Ref": "OrderLambdaIntegration"}]]}
  }
}
```
AWS API Gateway Limitations
- ❌ 29-second timeout maximum — not suitable for long-running operations
- ❌ 10 MB payload limit — large file uploads need presigned S3 URLs
- ❌ No WebSocket support on HTTP API type
- ❌ Vendor lock-in — deeply tied to AWS ecosystem
- ❌ Cold start latency when backed by Lambda
- ❌ Limited plugin/extension model compared to Kong or Envoy
Envoy & NGINX
Envoy Proxy
Envoy is a modern, high-performance proxy designed for cloud-native architectures. It's the data plane for Istio and the foundation of many API gateways (Ambassador/Emissary, Gloo). Its killer feature is dynamic configuration via xDS APIs — no restarts needed for config changes.
| Envoy Feature | Description |
|---|---|
| xDS API | Dynamic configuration — routes, clusters, listeners updated without restart |
| gRPC-native | First-class gRPC support including streaming and transcoding |
| Observability | Built-in stats, tracing (Zipkin/Jaeger), access logging |
| Filters | Extensible filter chain — Lua, Wasm, external processing |
| HTTP/2 & HTTP/3 | Full HTTP/2 support, experimental HTTP/3 (QUIC) |
| Service mesh | Foundation of Istio, used as sidecar proxy |
NGINX
NGINX is the battle-tested workhorse and one of the most widely deployed reverse proxies in the world. As a gateway it's extremely fast and stable, but configuration-driven: static config files, with a reload required for changes. NGINX Plus adds dynamic configuration, active health checks, and a monitoring dashboard.
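For contrast with the dynamic gateways, here is a minimal NGINX gateway sketch (upstream, zone, and host names are illustrative); every change to a file like this requires an `nginx -s reload`:

```nginx
# Minimal gateway config sketch — static files, graceful reload on change
upstream user_service {
    server user-service-1:8080;
    server user-service-2:8080;
    keepalive 32;                     # connection pool to upstreams
}

server {
    listen 80;
    server_name api.example.com;

    location /api/v1/users {
        limit_req zone=api burst=20;  # assumes a limit_req_zone named "api" in http {}
        proxy_pass http://user_service;
        proxy_http_version 1.1;
        proxy_set_header Connection "";   # required for upstream keepalive
    }
}
```

The reload model is graceful (old workers drain, new workers pick up the config), but routes and upstream membership cannot change without touching files, which is the core operational difference from Envoy's xDS.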
```yaml
# Envoy bootstrap — connects to xDS control plane for dynamic config
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

dynamic_resources:
  lds_config:  # Listener Discovery Service
    api_config_source:
      api_type: GRPC
      grpc_services:
        - envoy_grpc:
            cluster_name: xds-cluster
  cds_config:  # Cluster Discovery Service
    api_config_source:
      api_type: GRPC
      grpc_services:
        - envoy_grpc:
            cluster_name: xds-cluster

static_resources:
  clusters:
    - name: xds-cluster
      connect_timeout: 5s
      type: STRICT_DNS
      load_assignment:
        cluster_name: xds-cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: control-plane
                      port_value: 18000
```
Envoy vs NGINX — When to Choose
Choose Envoy when: you need dynamic configuration (xDS), gRPC support, service mesh integration, or Wasm extensibility. Choose NGINX when: you need raw performance for simple proxying, have existing NGINX expertise, or want the simplest possible configuration. Envoy is more capable but more complex; NGINX is simpler but less dynamic.
Traefik & Others
Traefik
Traefik is a Kubernetes-native reverse proxy and API gateway. Its standout features are automatic service discovery and built-in Let's Encrypt certificate management. It watches Kubernetes Ingress resources and configures itself — zero manual route configuration.
| Gateway | Key Strength | Best For |
|---|---|---|
| Traefik | Auto-discovery, auto-TLS, Kubernetes-native | Kubernetes clusters, small-medium APIs |
| Ambassador/Emissary | Envoy-based, Kubernetes CRDs, developer-friendly | Kubernetes with Envoy features |
| Azure API Management | Full lifecycle management, Azure integration | Azure-native architectures |
| Apigee (Google) | Enterprise API management, analytics, monetization | Large enterprises, API-as-product |
| Tyk | Open-source, Go-based, GraphQL-native | GraphQL APIs, open-source preference |
| KrakenD | Ultra-high performance, stateless, no DB | Performance-critical, simple routing |
```yaml
# Traefik auto-discovers routes from Kubernetes Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: default-rate-limit@kubernetescrd,default-auth@kubernetescrd
spec:
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls  # Auto-managed by cert-manager
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /api/users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 8080
          - path: /api/orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 8080
# Traefik automatically picks this up — no restart needed
```
Choosing a Gateway
For most teams: Kong (full API management, large plugin ecosystem) or AWS API Gateway (fully managed, serverless). For Kubernetes-native: Traefik (simplest) or Ambassador (Envoy-powered). For service mesh: Envoy (Istio data plane). For enterprise API-as-product: Apigee or Azure APIM. Start simple — you can always migrate to a more complex solution when you outgrow the simple one.
Comparison Table
A feature comparison across the major API Gateway implementations to help you choose the right tool for your requirements.
| Feature | Kong | AWS API GW | Envoy | NGINX | Traefik |
|---|---|---|---|---|---|
| Open Source | ✅ (Apache 2.0) | ❌ (Managed) | ✅ (Apache 2.0) | ✅ (BSD) / Plus | ✅ (MIT) |
| Path Routing | ✅ | ✅ | ✅ | ✅ | ✅ |
| Header Routing | ✅ | ✅ | ✅ | ✅ | ✅ |
| JWT Auth | ✅ Plugin | ✅ Built-in | ✅ Filter | ⚠️ Plus only | ✅ Middleware |
| Rate Limiting | ✅ Plugin | ✅ Built-in | ✅ Filter | ✅ Built-in | ✅ Middleware |
| gRPC | ✅ | ⚠️ Limited | ✅ Native | ✅ | ✅ |
| WebSocket | ✅ | ✅ (REST API) | ✅ | ✅ | ✅ |
| Dynamic Config | ✅ Admin API | ✅ (Managed) | ✅ xDS API | ❌ Reload | ✅ Auto-discovery |
| Kubernetes | ✅ KIC | ⚠️ External | ✅ (Istio) | ✅ Ingress | ✅ Native |
| Scaling Model | Horizontal | Auto (managed) | Horizontal | Horizontal | Horizontal |
Decision Framework
Ask these questions: (1) Do you need full API management (portal, plans, analytics)? → Kong Enterprise or Apigee. (2) Do you want zero ops? → AWS API Gateway. (3) Do you need gRPC-native + service mesh? → Envoy. (4) Do you want simplest Kubernetes setup? → Traefik. (5) Do you need maximum raw performance with minimal features? → NGINX or KrakenD.
High Availability & Scaling
The gateway is the most critical infrastructure component — if it goes down, everything goes down. High availability is non-negotiable.
| Principle | Implementation |
|---|---|
| No single point of failure | Minimum 2 instances across availability zones |
| Stateless gateway | All state in external stores (Redis, DB) — any instance can serve any request |
| Active-active | All instances serve traffic simultaneously (not active-passive) |
| Health-checked | Network load balancer health-checks gateway instances |
| Auto-scaling | Scale gateway instances based on CPU/connections/request rate |
| Graceful shutdown | Drain connections before terminating an instance |
```yaml
# Kubernetes deployment for HA gateway
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  replicas: 3  # Minimum 3 for HA
  selector:
    matchLabels:
      app: api-gateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Zero downtime during updates
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      # Spread across availability zones
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-gateway
      # Anti-affinity — don't schedule on same node
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api-gateway
              topologyKey: kubernetes.io/hostname
      containers:
        - name: gateway
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 8001
            initialDelaySeconds: 5
            periodSeconds: 5
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]  # Drain connections
```
The Hospital Emergency Room
Your gateway should be like a hospital ER — always open, always staffed, with backup generators and redundant systems. You don't have one doctor on call; you have a team across shifts. If one doctor gets sick, others cover. The gateway needs the same resilience: multiple instances, across zones, with automatic failover. Downtime is not an option for the front door of your system.
Capacity Planning
Size your gateway for 3x normal peak traffic. Why 3x? (1) Normal peak handles expected load. (2) 2x handles a traffic spike or one AZ going down. (3) 3x handles a spike during a partial outage. If your gateway can't absorb unexpected load, it becomes the bottleneck that causes the outage instead of preventing it.
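The 3x sizing rule reduces to a quick calculation. In this sketch, the per-instance throughput and utilization target are assumptions; replace them with numbers from your own benchmarks:

```python
import math

def gateway_instances(peak_rps: float, per_instance_rps: float,
                      headroom: float = 3.0, target_util: float = 0.6) -> int:
    """Instances needed so that headroom * peak fits at the target utilization."""
    required_capacity = peak_rps * headroom          # e.g. 3x normal peak
    usable_per_instance = per_instance_rps * target_util  # don't run instances hot
    return math.ceil(required_capacity / usable_per_instance)

# Assumed numbers: 100K req/s peak, ~40K req/s per instance, 60% CPU target
print(gateway_instances(100_000, 40_000))  # → 13
```

With no headroom and 100% utilization the same inputs give 3 instances, which illustrates why "it handles peak in the load test" is not a capacity plan.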
Configuration & Zero-Downtime Deployments
Gateway configuration changes (new routes, updated rate limits, plugin changes) must be applied without dropping requests. Zero-downtime configuration updates are essential for a component that handles all traffic.
| Approach | How It Works | Downtime |
|---|---|---|
| Hot reload | Gateway reloads config without restarting (NGINX: nginx -s reload) | Zero — existing connections maintained |
| Dynamic API | Push changes via Admin API (Kong, Envoy xDS) | Zero — applied immediately |
| Rolling update | Deploy new config to instances one at a time | Zero — if done correctly with drain |
| GitOps | Config in Git, CI/CD applies changes automatically | Zero — uses rolling update underneath |
| Blue-green config | Deploy new config to green, switch traffic | Zero — atomic switch |
```yaml
# GitOps workflow for gateway configuration
# 1. Developer pushes config change to Git
# 2. CI validates config (syntax, schema, dry-run)
# 3. CD applies to staging gateway
# 4. Automated tests verify staging
# 5. CD applies to production gateway (rolling)

# CI validation step
validate:
  script:
    # Kong deck validates declarative config
    - deck validate --state kong.yaml
    # Dry-run against staging
    - deck diff --state kong.yaml --kong-addr http://staging-gateway:8001

# CD deployment step
deploy:
  script:
    # Apply config with zero downtime
    - deck sync --state kong.yaml --kong-addr http://gateway:8001
    # Or for Kubernetes:
    - kubectl apply -f gateway-config.yaml  # Rolling update handles the rest
```
Graceful Shutdown
```text
# Graceful shutdown sequence for gateway instance:
# 1. Remove from load balancer (stop receiving new connections)
#    - Kubernetes: pod enters Terminating state
#    - NLB: health check fails → deregisters target
# 2. Wait for in-flight requests to complete
#    - preStop hook: sleep 15 (allow LB to deregister)
#    - Gateway drains: finish active requests (up to 30s)
# 3. Close idle connections
#    - Send Connection: close on keep-alive connections
# 4. Terminate process
#    - SIGTERM → graceful shutdown
#    - SIGKILL after grace period (30s) if still running

# Key: the sleep in preStop gives the load balancer time to
# stop sending new requests before the gateway starts draining.
```
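The SIGTERM half of this sequence can be sketched in application code. The handler and probe names below are illustrative, not any specific gateway's API:

```python
import signal
import threading

# Set once SIGTERM arrives; checked by the readiness probe and accept loop.
shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    # Step 1: flip readiness to failing so the LB deregisters this instance.
    # Step 2 (not shown): stop accepting connections, drain in-flight requests.
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def is_ready() -> bool:
    """Readiness probe handler: report unhealthy once draining starts."""
    return not shutting_down.is_set()
```

The key design choice is that SIGTERM does not exit; it only starts the drain. The process exits on its own once in-flight work finishes, or the orchestrator sends SIGKILL after the grace period.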
Config Validation in CI
Never apply gateway config directly to production. Always: (1) Validate syntax in CI. (2) Dry-run against staging (deck diff, envoy validate). (3) Apply to staging and run integration tests. (4) Apply to production with rolling update. A bad gateway config (invalid route, broken plugin) can take down all traffic instantly. Treat gateway config with the same rigor as application code.
Interview Questions
Q: Compare Kong and AWS API Gateway. When would you choose each?
A: Kong: open-source, self-managed, runs anywhere (cloud, on-prem, Kubernetes), extensive plugin ecosystem, full control over configuration and scaling. AWS API Gateway: fully managed (zero ops), deep AWS integration (Lambda, IAM, Cognito), pay-per-request pricing, but vendor lock-in and limited customization. Choose Kong when: you need portability, custom plugins, or run on-prem. Choose AWS when: you're all-in on AWS, want zero ops, and your API patterns fit within its limitations (29s timeout, 10MB payload).
Q: How do you achieve zero-downtime gateway deployments?
A: Multiple layers: (1) Stateless gateway — no local state, all shared state in Redis/DB. (2) Rolling updates — update one instance at a time, never all simultaneously. (3) Graceful shutdown — drain in-flight requests before terminating (preStop hook + SIGTERM handling). (4) Health check integration — NLB stops sending traffic to terminating instances. (5) maxUnavailable: 0 in Kubernetes — never reduce below desired replica count during update. (6) Config validation in CI — catch bad configs before they reach production.
Q: Why is Envoy's xDS protocol significant for API Gateways?
A: xDS (discovery services) allows Envoy to receive configuration updates dynamically via gRPC — without restarts or config file reloads. This means: (1) New routes added instantly when services deploy. (2) Upstream endpoints updated in real-time as pods scale. (3) Rate limits and policies changed without touching the proxy. (4) A control plane (Istio, custom) manages configuration centrally and pushes to all Envoy instances. This is fundamentally different from NGINX's 'edit file, reload' model and enables true GitOps and automation at scale.
Q: How would you size and scale an API Gateway for 100K requests/second?
A: Sizing: (1) Benchmark single instance throughput (Kong on modern hardware: ~30-50K req/s). (2) Need 3-4 instances for 100K req/s at normal load. (3) Plan for 3x headroom: 9-12 instances. (4) Each instance: 4 CPU cores, 8GB RAM minimum. Scaling: (1) Horizontal auto-scaling on CPU utilization (target 60%). (2) Spread across 3 AZs. (3) Network load balancer in front (L4, not L7 — avoid double processing). (4) Connection pooling to upstreams (avoid connection storms). (5) Monitor: request rate, latency p99, connection count, CPU.
Q: What's the operational difference between DB-backed and DB-less gateway modes?
A: DB-backed (Kong + PostgreSQL): Admin API for dynamic changes, multiple nodes sync via DB, good for teams that need runtime flexibility. Operational cost: must manage and HA the database. DB-less (declarative YAML): config in Git, applied via CI/CD, no database dependency, faster startup, GitOps-friendly. Operational cost: no dynamic changes — all updates go through Git + deploy pipeline. Choose DB-less for Kubernetes (GitOps natural fit). Choose DB-backed for teams that need to make quick runtime changes without a deploy.
Common Mistakes
No graceful shutdown during deployments
Gateway instances are killed immediately during rolling updates — dropping in-flight requests and returning 502 errors to clients.
✅ Implement graceful shutdown: (1) preStop hook with sleep (15s) to allow LB deregistration. (2) Handle SIGTERM by stopping new connections and draining existing ones. (3) Set terminationGracePeriodSeconds high enough for long requests to complete. (4) Set maxUnavailable: 0 to never reduce capacity during updates.
Gateway config applied directly to production
Pushing gateway configuration changes directly to production without validation — a typo in a route regex takes down all traffic.
✅ Treat gateway config like code: validate in CI (syntax + schema), dry-run against staging, apply to staging with integration tests, then rolling deploy to production. A single bad config line can cause a total outage. Never skip validation.
Single availability zone deployment
All gateway instances run in one AZ — an AZ outage takes down the entire API.
✅ Deploy gateway instances across at minimum 2 (preferably 3) availability zones. Use topology spread constraints in Kubernetes or multi-AZ target groups in AWS. The gateway must survive a full AZ failure without degradation.
Choosing a gateway based on features alone
Selecting the most feature-rich gateway without considering operational complexity, team expertise, or actual requirements.
✅ Choose based on: (1) What you actually need today (not might need someday). (2) Team expertise — a gateway your team can't operate is worse than a simpler one they can. (3) Operational model — managed (AWS) vs self-managed (Kong). (4) Ecosystem fit — Kubernetes-native if you're on K8s. Start simple, migrate when you outgrow it.