Gateway Patterns & Multi-Tenancy
How the gateway is used in practice — BFF pattern, API composition, sidecar vs centralized, developer portals, and multi-tenant API management.
Backend for Frontend (BFF)
The BFF pattern creates a separate gateway (or gateway layer) for each client type — mobile, web, partner API. Each BFF is tailored to its client's specific needs: different response shapes, different aggregation logic, different performance characteristics.
| Client | BFF Optimizations | Example |
|---|---|---|
| Mobile app | Smaller payloads, fewer round trips, offline-friendly | Aggregate user + orders in one call |
| Web SPA | Rich responses, pagination, real-time updates | Full product details with reviews |
| Partner API | Stable contract, webhook support, batch operations | Bulk order creation endpoint |
| Internal tools | Admin-level access, verbose responses, debug info | Full audit trail in responses |
The Restaurant with Multiple Menus
A single generic gateway is like a restaurant with one menu for everyone — the kids get confused by the wine list, and adults don't want the coloring page. BFF is like having a kids' menu, an adult menu, and a catering menu. Same kitchen (backend services), but each menu (BFF) is designed for its audience. The mobile BFF sends compact responses; the web BFF sends rich, nested data.
```yaml
# Multiple BFF gateways — each tailored to its client
gateways:
  mobile-bff:
    host: mobile-api.example.com
    optimizations:
      - Aggregate multiple service calls into single response
      - Compress images to mobile-friendly sizes
      - Return minimal fields (id, name, thumbnail)
      - Support offline sync endpoints
    routes:
      - path: /home-feed
        aggregates: [user-service, order-service, recommendation-service]
  web-bff:
    host: api.example.com
    optimizations:
      - Rich responses with nested relationships
      - Server-sent events for real-time updates
      - Pagination with cursor-based navigation
    routes:
      - path: /products/:id
        aggregates: [product-service, review-service, inventory-service]
  partner-bff:
    host: partner-api.example.com
    optimizations:
      - Stable versioned contract (v1 only)
      - Webhook delivery for async events
      - Batch endpoints for bulk operations
      - Strict rate limiting per partner
```
When One Gateway Serves All Poorly
A single generic gateway forces compromises: mobile clients download fields they don't need, web clients make multiple round trips for data that could be aggregated, and partner APIs get response shapes designed for your internal frontend. BFF eliminates these compromises — each client gets exactly what it needs. The cost: maintaining multiple gateway layers.
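The shaping difference can be sketched in a few lines of Python; the payload, field names, and helper names below are hypothetical stand-ins for real backend responses:

```python
# One shared backend payload, two client-specific shapes (illustrative data).
FULL_PRODUCT = {
    "id": "p1",
    "name": "Widget",
    "thumbnail": "widget_small.png",
    "description": "A long marketing description...",
    "reviews": [{"rating": 5, "text": "Great!"}],
    "inventory": {"warehouse_a": 12, "warehouse_b": 3},
}

def mobile_bff_shape(product: dict) -> dict:
    """Mobile BFF: strip down to the minimal fields the app renders."""
    return {k: product[k] for k in ("id", "name", "thumbnail")}

def web_bff_shape(product: dict) -> dict:
    """Web BFF: rich nested response, plus a derived field for the SPA."""
    return {**product, "total_stock": sum(product["inventory"].values())}
```

Same kitchen, two menus: both functions read the identical backend payload, and only the response shape changes per client.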
API Composition / Aggregation
API composition lets the gateway call multiple backend services and merge their responses into a single response for the client. This reduces client-side round trips and simplifies frontend logic.
| Pattern | Description | Latency Impact |
|---|---|---|
| Sequential | Call service A, then use result to call service B | Sum of all call latencies |
| Parallel | Call services A, B, C simultaneously, merge results | Max of all call latencies |
| Partial failure | Return available data, mark failed parts as null | Max of successful calls |
| Fallback | If primary fails, call fallback service | Primary latency + fallback if needed |
```
// Client calls: GET /api/dashboard
// Gateway composes from 3 services in parallel:
//   1. user-service:         GET /users/123
//   2. order-service:        GET /users/123/orders?limit=5
//   3. notification-service: GET /users/123/notifications?unread=true
// Gateway merges into single response:
{
  "user": { "id": "123", "name": "Alice", "plan": "pro" },
  "recent_orders": [
    {"id": "ord_1", "total": 49.99, "status": "delivered"},
    {"id": "ord_2", "total": 29.99, "status": "shipped"}
  ],
  "notifications": { "unread_count": 3, "items": [...] },
  "_meta": {
    "services_called": 3,
    "services_succeeded": 3,
    "total_latency_ms": 145
  }
}
```
Partial Failure Handling
When composing from multiple services, some may fail while others succeed. Don't fail the entire request because one service is down. Return available data with null/empty for failed parts and include metadata about which services failed. The client can render what's available and show a degraded experience for the missing parts.
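A minimal sketch of parallel composition with partial-failure handling, using Python's asyncio. The service stubs, response keys, and failure are invented for illustration; a real gateway would make HTTP calls with per-service timeouts:

```python
import asyncio

async def fetch_user(uid: str) -> dict:
    # Stand-in for a call to user-service
    return {"id": uid, "name": "Alice"}

async def fetch_orders(uid: str) -> list:
    # Stand-in for order-service, simulated as down
    raise ConnectionError("order-service unreachable")

async def compose_dashboard(uid: str) -> dict:
    # Parallel fan-out; return_exceptions=True keeps one failed
    # service from failing the whole composition.
    results = await asyncio.gather(
        fetch_user(uid), fetch_orders(uid), return_exceptions=True
    )
    keys = ["user", "recent_orders"]
    response, failed = {}, []
    for key, result in zip(keys, results):
        if isinstance(result, Exception):
            response[key] = None       # degraded section, not a 500
            failed.append(key)
        else:
            response[key] = result
    response["_meta"] = {"failed": failed}
    return response

resp = asyncio.run(compose_dashboard("123"))
```

The client sees `user` populated, `recent_orders` as null, and `_meta.failed` telling it which section to render in a degraded state.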
When to Use API Composition
- ✅ Dashboard pages that need data from 3+ services
- ✅ Mobile clients where round-trip latency is expensive
- ✅ Reducing N+1 API calls from the frontend
- ✅ Aggregating data that doesn't change together (user + orders + notifications)
When NOT to Use API Composition
- ❌ When composition requires business logic (use a dedicated service instead)
- ❌ When the aggregated data must be transactionally consistent
- ❌ When composition adds significant latency (sequential dependent calls)
- ❌ When it makes the gateway a single point of coupling to all services
Request Hedging
Request hedging sends the same request to multiple upstream instances simultaneously and returns the first successful response. The other in-flight requests are cancelled. This trades extra backend load for lower tail latency.
| Aspect | Retry | Hedging |
|---|---|---|
| Timing | Send second request AFTER first fails/times out | Send multiple requests simultaneously |
| Latency impact | Adds full retry latency on failure | Returns fastest response (p99 → p50) |
| Backend load | 1x normal, 2x on failure | 2-3x always (every request is duplicated) |
| Use case | Handling failures | Reducing tail latency for critical paths |
| Idempotency | Required for safety | Required — multiple instances process same request |
```yaml
# Envoy request hedging — send to 2 instances, return fastest
routes:
  - match:
      prefix: "/api/search"
    route:
      cluster: search-service
      hedge_policy:
        # With hedge_on_per_try_timeout, a hedge request is sent when a
        # try exceeds per_try_timeout (200ms below) without responding
        hedge_on_per_try_timeout: true
        initial_requests: 1
        additional_request_chance:
          numerator: 100       # Always hedge (100%)
          denominator: HUNDRED
        # Alternative: initial_requests: 2 for immediate hedging
        # (send to 2 simultaneously) — useful for latency-critical reads
      retry_policy:
        retry_on: "5xx,reset,connect-failure"
        num_retries: 1
        per_try_timeout: 200ms  # Hedge fires after 200ms
```
Only Hedge Reads
Hedging is only safe for idempotent, read-only operations. Hedging a write request means multiple instances process the same mutation — creating duplicates. Use hedging for: search queries, product lookups, user profile reads. Never hedge: order creation, payment processing, any state-changing operation.
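The delayed-hedge idea can be sketched with asyncio: launch a hedge only if the primary is still pending after a cutoff, take whichever task finishes first, and cancel the loser. The replica names and delays here are invented for illustration:

```python
import asyncio

async def query_replica(name: str, delay: float) -> str:
    # Stand-in for a read-only upstream call with a given latency
    await asyncio.sleep(delay)
    return f"result from {name}"

async def hedged_read(delay_before_hedge: float = 0.1) -> str:
    primary = asyncio.create_task(query_replica("primary", 0.5))
    # Give the primary a head start; most requests never hedge.
    done, _ = await asyncio.wait({primary}, timeout=delay_before_hedge)
    if done:
        return primary.result()
    # Primary is slow: launch a hedge and take whichever finishes first.
    hedge = asyncio.create_task(query_replica("replica", 0.05))
    done, pending = await asyncio.wait(
        {primary, hedge}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # abandon the slower in-flight request
    return done.pop().result()

result = asyncio.run(hedged_read())
```

Because the primary here takes 500ms and the cutoff is 100ms, the hedge fires and its faster response wins; a fast primary would have returned before any hedge was sent.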
Sidecar vs Centralized Gateway
The gateway can be deployed as a centralized cluster (all traffic flows through shared instances) or as a sidecar (one gateway instance per service). Each model has different trade-offs for latency, isolation, and operational complexity.
| Aspect | Centralized Gateway | Sidecar (per-service) |
|---|---|---|
| Deployment | Shared cluster of gateway instances | One proxy per service pod |
| Network hop | Extra hop through gateway cluster | Localhost — no network hop |
| Failure blast radius | Gateway down = everything down | Sidecar down = one service down |
| Configuration | Central config for all routes | Per-service config (distributed) |
| Resource usage | Efficient — shared pool | Higher — one proxy per pod |
| Typical tool | Kong, AWS API Gateway | Envoy sidecar, Istio, Linkerd |
| Best for | North-south (external) traffic | East-west (internal) traffic |
```yaml
# Hybrid: centralized gateway for north-south + service mesh for east-west
#
# External traffic path:
#   Client → Centralized API Gateway → Service A
#
# Internal traffic path:
#   Service A → Envoy Sidecar → Envoy Sidecar → Service B
infrastructure:
  api_gateway:
    type: centralized
    tool: Kong
    handles:
      - External client authentication
      - Rate limiting per API key
      - Request routing to services
      - API versioning and deprecation
      - Developer portal
  service_mesh:
    type: sidecar
    tool: Istio (Envoy sidecars)
    handles:
      - Service-to-service mTLS
      - Internal load balancing
      - Circuit breaking between services
      - Internal observability (traces, metrics)
      - Retry policies for internal calls
```
The Hybrid is the Standard
Most production architectures use both: a centralized gateway for external traffic (API management, auth, rate limiting) and a service mesh for internal traffic (mTLS, retries, observability). They're complementary, not competing. The gateway handles client-facing concerns; the mesh handles service-to-service concerns.
Gateway Offloading
Gateway offloading moves cross-cutting concerns from individual services to the gateway. Services become simpler — they trust the gateway to have already handled auth, rate limiting, CORS, and logging.
| Concern | Without Gateway | With Gateway Offloading |
|---|---|---|
| Authentication | Every service validates JWT | Gateway validates, services trust X-User-ID |
| Rate limiting | Each service implements its own | Gateway enforces globally |
| CORS | Each service sets CORS headers | Gateway handles all CORS |
| Logging | Each service logs access | Gateway logs all requests centrally |
| TLS | Each service manages certificates | Gateway terminates TLS once |
| Compression | Each service compresses responses | Gateway compresses all responses |
What to Offload to the Gateway
- ✅ Authentication and coarse authorization
- ✅ Rate limiting and throttling
- ✅ TLS termination and certificate management
- ✅ CORS header management
- ✅ Request/response compression
- ✅ Access logging and metrics collection
- ✅ Security headers (HSTS, CSP, X-Frame-Options)
- ✅ Request ID generation and propagation
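The authentication item can be sketched as gateway middleware that validates a credential once at the edge and injects a trusted identity header for downstream services. The token table stands in for real JWT validation, and the request shape is invented:

```python
# Hypothetical token -> user lookup; a real gateway would verify a JWT
# signature and expiry instead.
VALID_TOKENS = {"tok_abc": "user_123"}

def gateway_auth_middleware(request: dict) -> dict:
    """Validate once at the gateway; downstream services read the
    injected X-User-ID header instead of re-validating the token."""
    token = request.get("headers", {}).get("Authorization", "")
    token = token.removeprefix("Bearer ")
    user_id = VALID_TOKENS.get(token)
    if user_id is None:
        return {"status": 401}
    request["headers"]["X-User-ID"] = user_id
    # Strip the raw credential so it never reaches backend services.
    request["headers"].pop("Authorization", None)
    return {"status": 200, "forward": request}
```

Services behind the gateway simply trust `X-User-ID`, which is exactly why the bypass risk below matters: that trust is only safe if nothing can reach the services directly.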
What to Keep in Services
- ✅ Fine-grained authorization (resource-level access control)
- ✅ Business logic and domain validation
- ✅ Data transformation specific to the domain
- ✅ Service-specific rate limits (e.g., expensive operations)
- ✅ Database access and caching of domain data
The Bypass Risk
Gateway offloading only works if services are NEVER directly accessible from outside. If a service can be reached without going through the gateway, all offloaded security (auth, rate limiting) is bypassed. Enforce network isolation: services only accept traffic from the gateway's IP range or use mTLS to verify the caller is the gateway.
Multi-Tenant API Management
In a multi-tenant system, the gateway must isolate tenants — ensuring one tenant's traffic doesn't affect another's performance or access another's data.
| Isolation Level | Description | Implementation |
|---|---|---|
| Rate limit isolation | Each tenant has independent limits | Per-tenant rate limit counters in Redis |
| Routing isolation | Tenants route to different backends | Tenant-specific upstream pools |
| Data isolation | Tenants can't access each other's data | X-Tenant-ID header, enforced in services |
| Performance isolation | One tenant can't starve others | Per-tenant connection pools, priority queues |
| Configuration isolation | Tenants have different API access | Per-tenant plugin configuration |
```yaml
# Multi-tenant configuration
tenants:
  - id: tenant_acme
    plan: enterprise
    config:
      rate_limit:
        minute: 50000
        burst: 1000
      allowed_endpoints:
        - /api/v1/*
        - /api/v2/*
      custom_domain: api.acme.com
      dedicated_upstream: true   # Isolated backend pool
      priority: high
  - id: tenant_startup
    plan: free
    config:
      rate_limit:
        minute: 1000
        burst: 50
      allowed_endpoints:
        - /api/v1/*
      custom_domain: null        # Uses shared domain
      dedicated_upstream: false  # Shared backend pool
      priority: low

# Gateway extracts tenant from API key → applies tenant config
# Injects X-Tenant-ID header for downstream isolation
```
Noisy Neighbor Prevention
The "noisy neighbor" problem: one tenant sends a traffic spike that degrades performance for all tenants. Prevention: (1) Per-tenant rate limits (hard ceiling). (2) Per-tenant connection pools to backends (one tenant can't exhaust all connections). (3) Priority queuing (enterprise tenants processed first during overload). (4) Dedicated infrastructure for large tenants (separate upstream pools).
Developer Portal & Plans
A developer portal is the self-service interface for API consumers — documentation, key management, usage dashboards, and plan selection. The gateway enforces the plans; the portal lets developers manage them.
| Portal Feature | Description | Gateway Integration |
|---|---|---|
| API Documentation | Interactive docs (OpenAPI/Swagger) | Auto-generated from gateway routes |
| Key Management | Create, rotate, revoke API keys | Keys stored in gateway's consumer DB |
| Usage Dashboard | Request counts, error rates, latency | Metrics from gateway's access logs |
| Plan Selection | Free, Pro, Enterprise tiers | Gateway enforces plan-specific limits |
| Sandbox/Testing | Test API calls without production impact | Gateway routes sandbox keys to test env |
| Webhook Management | Configure event subscriptions | Gateway delivers events to registered URLs |
{ "plans": [ { "name": "Free", "price": 0, "limits": { "requests_per_minute": 100, "requests_per_day": 10000, "max_payload_size_kb": 256, "rate_limit_burst": 20, "endpoints": ["read-only"], "support": "community" } }, { "name": "Pro", "price": 49, "limits": { "requests_per_minute": 5000, "requests_per_day": 1000000, "max_payload_size_kb": 5120, "rate_limit_burst": 500, "endpoints": ["all"], "support": "email", "sla": "99.9%" } }, { "name": "Enterprise", "price": "custom", "limits": { "requests_per_minute": "custom", "requests_per_day": "unlimited", "max_payload_size_kb": 51200, "rate_limit_burst": "custom", "endpoints": ["all"], "support": "dedicated", "sla": "99.99%", "dedicated_infrastructure": true } } ] }
Monetization Patterns
- ✅ Tiered plans — fixed price for a request quota (most common)
- ✅ Pay-per-use — charge per request or per compute unit
- ✅ Freemium — free tier with limits, paid for higher limits
- ✅ Overage billing — allow exceeding limits, charge extra per request
- ✅ Feature gating — different plans unlock different endpoints
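As a tiny worked example of the overage-billing pattern, assuming a hypothetical per-request overage rate on top of a fixed plan fee:

```python
def monthly_bill(plan_price: float, included_requests: int,
                 overage_rate: float, used: int) -> float:
    """Overage billing: fixed plan fee, plus a per-request charge for
    every request past the included quota."""
    overage = max(0, used - included_requests)
    return plan_price + overage * overage_rate

# e.g. a $49 plan with 1M included requests and a made-up $0.0001
# per extra request: 1.2M used -> $49 + 200,000 * $0.0001 = $69
```

The gateway's per-tenant usage counters are what make this billable at all: metering and enforcement share the same counters.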
Interview Questions
Q: When would you use the BFF pattern instead of a single API Gateway?
A: Use BFF when: (1) Different clients need fundamentally different response shapes (mobile needs compact, web needs rich). (2) Different clients have different performance requirements (mobile: minimize round trips; web: support real-time). (3) Different teams own different client experiences and need independent deployment. A single gateway works when all clients consume the same API shape. BFF adds complexity (multiple gateways to maintain) but eliminates the 'one size fits none' problem.
Q: How do you handle partial failures in API composition?
A: Strategy: (1) Call services in parallel with individual timeouts. (2) If a non-critical service fails (recommendations), return null for that section and include metadata about the failure. (3) If a critical service fails (user auth), fail the entire request. (4) Define criticality per composition: for a dashboard, user data is critical but notifications are optional. (5) Consider serving cached/stale data for failed non-critical services. The client renders what's available and shows degraded UI for missing parts.
Q: What's the difference between a centralized gateway and a service mesh sidecar?
A: Centralized gateway: shared cluster handling north-south (external) traffic — API management, client auth, rate limiting, versioning. Sidecar (service mesh): per-pod proxy handling east-west (internal) traffic — mTLS between services, internal retries, circuit breaking, observability. They solve different problems: the gateway is client-facing infrastructure; the mesh is internal infrastructure. Most production systems use both together.
Q: How would you prevent the noisy neighbor problem in a multi-tenant API?
A: Layered isolation: (1) Per-tenant rate limits — hard ceiling regardless of other tenants' usage. (2) Per-tenant connection pools — one tenant can't exhaust backend connections. (3) Priority queuing — during overload, process high-tier tenants first. (4) Request costing — expensive operations (search, export) consume more of the quota. (5) For enterprise tenants: dedicated upstream pools (complete isolation). (6) Monitoring: alert when any tenant consistently hits limits — they may need an upgrade or there's abuse.
Q: When is request hedging appropriate and what are the risks?
A: Appropriate when: (1) Tail latency matters more than throughput (user-facing search, real-time bidding). (2) The operation is idempotent and read-only. (3) Backend has spare capacity to handle 2-3x load. Risks: (1) Doubles/triples backend load — can cause cascading overload. (2) Only safe for reads — hedging writes creates duplicates. (3) Wastes resources when the primary would have responded quickly. Mitigation: only hedge after a delay (e.g., hedge after 100ms if primary hasn't responded), not immediately.
Common Mistakes
Gateway becomes a monolithic orchestrator
The gateway calls 10 services sequentially, applies business logic to merge results, and becomes the most complex and fragile component in the system.
✅ Keep gateway composition simple: parallel calls, merge responses, handle partial failure. If composition requires business logic (conditional calls, data transformation based on domain rules), create a dedicated composition/BFF service behind the gateway. The gateway should be dumb infrastructure, not smart business logic.
No tenant isolation in rate limiting
All tenants share a global rate limit pool — one tenant's traffic spike exhausts the limit for everyone.
✅ Implement per-tenant rate limit counters. Each tenant gets their own bucket (based on plan tier). A tenant hitting their limit only affects themselves. Use separate Redis keys per tenant: `ratelimit:{tenant_id}:{endpoint}`.
BFF per team instead of per client type
Creating a BFF for each backend team (orders-bff, users-bff, payments-bff) instead of per client type (mobile-bff, web-bff).
✅ BFF should be organized by client type, not by backend domain. A mobile-bff aggregates data from multiple backend services into mobile-optimized responses. A per-team BFF is just another microservice with 'BFF' in the name — it doesn't solve the client-specific optimization problem.
Services directly accessible bypassing the gateway
Backend services are reachable on public IPs or without network restrictions — attackers can bypass all gateway security (auth, rate limiting).
✅ Enforce network isolation: services only accept traffic from the gateway's IP range (security groups, NetworkPolicy). Or use mTLS where services verify the caller's certificate is the gateway. If any service is directly accessible, all gateway security is theater.