Gateway Patterns & Multi-Tenancy
How the gateway is used in practice — BFF pattern, API composition, sidecar vs centralized, developer portals, and multi-tenant API management.
Backend for Frontend (BFF)
The BFF pattern creates a separate gateway (or gateway layer) for each client type — mobile, web, partner API. Each BFF is tailored to its client's specific needs: different response shapes, different aggregation logic, different performance characteristics.
| Client | BFF Optimizations | Example |
|---|---|---|
| Mobile app | Smaller payloads, fewer round trips, offline-friendly | Aggregate user + orders in one call |
| Web SPA | Rich responses, pagination, real-time updates | Full product details with reviews |
| Partner API | Stable contract, webhook support, batch operations | Bulk order creation endpoint |
| Internal tools | Admin-level access, verbose responses, debug info | Full audit trail in responses |
The Restaurant with Multiple Menus
A single generic gateway is like a restaurant with one menu for everyone — the kids get confused by the wine list, and adults don't want the coloring page. BFF is like having a kids' menu, an adult menu, and a catering menu. Same kitchen (backend services), but each menu (BFF) is designed for its audience. The mobile BFF sends compact responses; the web BFF sends rich, nested data.
```yaml
# Multiple BFF gateways — each tailored to its client
gateways:
  mobile-bff:
    host: mobile-api.example.com
    optimizations:
      - Aggregate multiple service calls into single response
      - Compress images to mobile-friendly sizes
      - Return minimal fields (id, name, thumbnail)
      - Support offline sync endpoints
    routes:
      - path: /home-feed
        aggregates: [user-service, order-service, recommendation-service]
  web-bff:
    host: api.example.com
    optimizations:
      - Rich responses with nested relationships
      - Server-sent events for real-time updates
      - Pagination with cursor-based navigation
    routes:
      - path: /products/:id
        aggregates: [product-service, review-service, inventory-service]
  partner-bff:
    host: partner-api.example.com
    optimizations:
      - Stable versioned contract (v1 only)
      - Webhook delivery for async events
      - Batch endpoints for bulk operations
      - Strict rate limiting per partner
```
When One Gateway Serves All Poorly
A single generic gateway forces compromises: mobile clients download fields they don't need, web clients make multiple round trips for data that could be aggregated, and partner APIs get response shapes designed for your internal frontend. BFF eliminates these compromises — each client gets exactly what it needs. The cost: maintaining multiple gateway layers.
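The shaping difference can be sketched in a few lines of Python; the payload, field names, and helper names below are hypothetical stand-ins for real backend responses:

```python
# One shared backend payload, two client-specific shapes (illustrative data).
FULL_PRODUCT = {
    "id": "p1",
    "name": "Widget",
    "thumbnail": "widget_small.png",
    "description": "A long marketing description...",
    "reviews": [{"rating": 5, "text": "Great!"}],
    "inventory": {"warehouse_a": 12, "warehouse_b": 3},
}

def mobile_bff_shape(product: dict) -> dict:
    """Mobile BFF: strip down to the minimal fields the app renders."""
    return {k: product[k] for k in ("id", "name", "thumbnail")}

def web_bff_shape(product: dict) -> dict:
    """Web BFF: rich nested response, plus a derived field for the SPA."""
    return {**product, "total_stock": sum(product["inventory"].values())}
```

Same kitchen, two menus: both functions read the identical backend payload, and only the response shape changes per client.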
API Composition / Aggregation
API composition lets the gateway call multiple backend services and merge their responses into a single response for the client. This reduces client-side round trips and simplifies frontend logic.
| Pattern | Description | Latency Impact |
|---|---|---|
| Sequential | Call service A, then use result to call service B | Sum of all call latencies |
| Parallel | Call services A, B, C simultaneously, merge results | Max of all call latencies |
| Partial failure | Return available data, mark failed parts as null | Max of successful calls |
| Fallback | If primary fails, call fallback service | Primary latency + fallback if needed |
```
// Client calls: GET /api/dashboard
// Gateway composes from 3 services in parallel:
//   1. user-service:         GET /users/123
//   2. order-service:        GET /users/123/orders?limit=5
//   3. notification-service: GET /users/123/notifications?unread=true
// Gateway merges into single response:
{
  "user": { "id": "123", "name": "Alice", "plan": "pro" },
  "recent_orders": [
    {"id": "ord_1", "total": 49.99, "status": "delivered"},
    {"id": "ord_2", "total": 29.99, "status": "shipped"}
  ],
  "notifications": { "unread_count": 3, "items": [...] },
  "_meta": {
    "services_called": 3,
    "services_succeeded": 3,
    "total_latency_ms": 145
  }
}
```
Partial Failure Handling
When composing from multiple services, some may fail while others succeed. Don't fail the entire request because one service is down. Return available data with null/empty for failed parts and include metadata about which services failed. The client can render what's available and show a degraded experience for the missing parts.
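A minimal sketch of parallel composition with partial-failure handling, using Python's asyncio. The service stubs, response keys, and failure are invented for illustration; a real gateway would make HTTP calls with per-service timeouts:

```python
import asyncio

async def fetch_user(uid: str) -> dict:
    # Stand-in for a call to user-service
    return {"id": uid, "name": "Alice"}

async def fetch_orders(uid: str) -> list:
    # Stand-in for order-service, simulated as down
    raise ConnectionError("order-service unreachable")

async def compose_dashboard(uid: str) -> dict:
    # Parallel fan-out; return_exceptions=True keeps one failed
    # service from failing the whole composition.
    results = await asyncio.gather(
        fetch_user(uid), fetch_orders(uid), return_exceptions=True
    )
    keys = ["user", "recent_orders"]
    response, failed = {}, []
    for key, result in zip(keys, results):
        if isinstance(result, Exception):
            response[key] = None       # degraded section, not a 500
            failed.append(key)
        else:
            response[key] = result
    response["_meta"] = {"failed": failed}
    return response

resp = asyncio.run(compose_dashboard("123"))
```

The client sees `user` populated, `recent_orders` as null, and `_meta.failed` telling it which section to render in a degraded state.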
When to Use API Composition
- ✅ Dashboard pages that need data from 3+ services
- ✅ Mobile clients where round-trip latency is expensive
- ✅ Reducing N+1 API calls from the frontend
- ✅ Aggregating data that doesn't change together (user + orders + notifications)
When NOT to Use API Composition
- ❌ When composition requires business logic (use a dedicated service instead)
- ❌ When the aggregated data must be transactionally consistent
- ❌ When composition adds significant latency (sequential dependent calls)
- ❌ When it makes the gateway a single point of coupling to all services
Request Hedging
Request hedging sends the same request to multiple upstream instances simultaneously and returns the first successful response. The other in-flight requests are cancelled. This trades extra backend load for lower tail latency.
| Aspect | Retry | Hedging |
|---|---|---|
| Timing | Send second request AFTER first fails/times out | Send multiple requests simultaneously |
| Latency impact | Adds full retry latency on failure | Returns fastest response (p99 → p50) |
| Backend load | 1x normal, 2x on failure | 2-3x always (every request is duplicated) |
| Use case | Handling failures | Reducing tail latency for critical paths |
| Idempotency | Required for safety | Required — multiple instances process same request |
```yaml
# Envoy request hedging — send to 2 instances, return fastest
routes:
  - match:
      prefix: "/api/search"
    route:
      cluster: search-service
      hedge_policy:
        # With hedge_on_per_try_timeout, a hedge request is sent when a
        # try exceeds per_try_timeout (200ms below) without responding
        hedge_on_per_try_timeout: true
        initial_requests: 1
        additional_request_chance:
          numerator: 100       # Always hedge (100%)
          denominator: HUNDRED
        # Alternative: initial_requests: 2 for immediate hedging
        # (send to 2 simultaneously) — useful for latency-critical reads
      retry_policy:
        retry_on: "5xx,reset,connect-failure"
        num_retries: 1
        per_try_timeout: 200ms  # Hedge fires after 200ms
```
Only Hedge Reads
Hedging is only safe for idempotent, read-only operations. Hedging a write request means multiple instances process the same mutation — creating duplicates. Use hedging for: search queries, product lookups, user profile reads. Never hedge: order creation, payment processing, any state-changing operation.
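The delayed-hedge idea can be sketched with asyncio: launch a hedge only if the primary is still pending after a cutoff, take whichever task finishes first, and cancel the loser. The replica names and delays here are invented for illustration:

```python
import asyncio

async def query_replica(name: str, delay: float) -> str:
    # Stand-in for a read-only upstream call with a given latency
    await asyncio.sleep(delay)
    return f"result from {name}"

async def hedged_read(delay_before_hedge: float = 0.1) -> str:
    primary = asyncio.create_task(query_replica("primary", 0.5))
    # Give the primary a head start; most requests never hedge.
    done, _ = await asyncio.wait({primary}, timeout=delay_before_hedge)
    if done:
        return primary.result()
    # Primary is slow: launch a hedge and take whichever finishes first.
    hedge = asyncio.create_task(query_replica("replica", 0.05))
    done, pending = await asyncio.wait(
        {primary, hedge}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # abandon the slower in-flight request
    return done.pop().result()

result = asyncio.run(hedged_read())
```

Because the primary here takes 500ms and the cutoff is 100ms, the hedge fires and its faster response wins; a fast primary would have returned before any hedge was sent.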
Sidecar vs Centralized Gateway
The gateway can be deployed as a centralized cluster (all traffic flows through shared instances) or as a sidecar (one gateway instance per service). Each model has different trade-offs for latency, isolation, and operational complexity.
| Aspect | Centralized Gateway | Sidecar (per-service) |
|---|---|---|
| Deployment | Shared cluster of gateway instances | One proxy per service pod |
| Network hop | Extra hop through gateway cluster | Localhost — no network hop |
| Failure blast radius | Gateway down = everything down | Sidecar down = one service down |
| Configuration | Central config for all routes | Per-service config (distributed) |
| Resource usage | Efficient — shared pool | Higher — one proxy per pod |
| Typical tool | Kong, AWS API Gateway | Envoy sidecar, Istio, Linkerd |
| Best for | North-south (external) traffic | East-west (internal) traffic |
```yaml
# Hybrid: centralized gateway for north-south + service mesh for east-west
#
# External traffic path:
#   Client → Centralized API Gateway → Service A
#
# Internal traffic path:
#   Service A → Envoy Sidecar → Envoy Sidecar → Service B
infrastructure:
  api_gateway:
    type: centralized
    tool: Kong
    handles:
      - External client authentication
      - Rate limiting per API key
      - Request routing to services
      - API versioning and deprecation
      - Developer portal
  service_mesh:
    type: sidecar
    tool: Istio (Envoy sidecars)
    handles:
      - Service-to-service mTLS
      - Internal load balancing
      - Circuit breaking between services
      - Internal observability (traces, metrics)
      - Retry policies for internal calls
```
The Hybrid is the Standard
Most production architectures use both: a centralized gateway for external traffic (API management, auth, rate limiting) and a service mesh for internal traffic (mTLS, retries, observability). They're complementary, not competing. The gateway handles client-facing concerns; the mesh handles service-to-service concerns.
Gateway Offloading
Gateway offloading moves cross-cutting concerns from individual services to the gateway. Services become simpler — they trust the gateway to have already handled auth, rate limiting, CORS, and logging.
| Concern | Without Gateway | With Gateway Offloading |
|---|---|---|
| Authentication | Every service validates JWT | Gateway validates, services trust X-User-ID |
| Rate limiting | Each service implements its own | Gateway enforces globally |
| CORS | Each service sets CORS headers | Gateway handles all CORS |
| Logging | Each service logs access | Gateway logs all requests centrally |
| TLS | Each service manages certificates | Gateway terminates TLS once |
| Compression | Each service compresses responses | Gateway compresses all responses |
What to Offload to the Gateway
- ✅ Authentication and coarse authorization
- ✅ Rate limiting and throttling
- ✅ TLS termination and certificate management
- ✅ CORS header management
- ✅ Request/response compression
- ✅ Access logging and metrics collection
- ✅ Security headers (HSTS, CSP, X-Frame-Options)
- ✅ Request ID generation and propagation
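The authentication item can be sketched as gateway middleware that validates a credential once at the edge and injects a trusted identity header for downstream services. The token table stands in for real JWT validation, and the request shape is invented:

```python
# Hypothetical token -> user lookup; a real gateway would verify a JWT
# signature and expiry instead.
VALID_TOKENS = {"tok_abc": "user_123"}

def gateway_auth_middleware(request: dict) -> dict:
    """Validate once at the gateway; downstream services read the
    injected X-User-ID header instead of re-validating the token."""
    token = request.get("headers", {}).get("Authorization", "")
    token = token.removeprefix("Bearer ")
    user_id = VALID_TOKENS.get(token)
    if user_id is None:
        return {"status": 401}
    request["headers"]["X-User-ID"] = user_id
    # Strip the raw credential so it never reaches backend services.
    request["headers"].pop("Authorization", None)
    return {"status": 200, "forward": request}
```

Services behind the gateway simply trust `X-User-ID`, which is exactly why the bypass risk below matters: that trust is only safe if nothing can reach the services directly.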
What to Keep in Services
- ✅ Fine-grained authorization (resource-level access control)
- ✅ Business logic and domain validation
- ✅ Data transformation specific to the domain
- ✅ Service-specific rate limits (e.g., expensive operations)
- ✅ Database access and caching of domain data
The Bypass Risk
Gateway offloading only works if services are NEVER directly accessible from outside. If a service can be reached without going through the gateway, all offloaded security (auth, rate limiting) is bypassed. Enforce network isolation: services only accept traffic from the gateway's IP range or use mTLS to verify the caller is the gateway.
Multi-Tenant API Management
In a multi-tenant system, the gateway must isolate tenants — ensuring one tenant's traffic doesn't affect another's performance or access another's data.
| Isolation Level | Description | Implementation |
|---|---|---|
| Rate limit isolation | Each tenant has independent limits | Per-tenant rate limit counters in Redis |
| Routing isolation | Tenants route to different backends | Tenant-specific upstream pools |
| Data isolation | Tenants can't access each other's data | X-Tenant-ID header, enforced in services |
| Performance isolation | One tenant can't starve others | Per-tenant connection pools, priority queues |
| Configuration isolation | Tenants have different API access | Per-tenant plugin configuration |
```yaml
# Multi-tenant configuration
tenants:
  - id: tenant_acme
    plan: enterprise
    config:
      rate_limit:
        minute: 50000
        burst: 1000
      allowed_endpoints:
        - /api/v1/*
        - /api/v2/*
      custom_domain: api.acme.com
      dedicated_upstream: true   # Isolated backend pool
      priority: high
  - id: tenant_startup
    plan: free
    config:
      rate_limit:
        minute: 1000
        burst: 50
      allowed_endpoints:
        - /api/v1/*
      custom_domain: null        # Uses shared domain
      dedicated_upstream: false  # Shared backend pool
      priority: low

# Gateway extracts tenant from API key → applies tenant config
# Injects X-Tenant-ID header for downstream isolation
```
Noisy Neighbor Prevention
The "noisy neighbor" problem: one tenant sends a traffic spike that degrades performance for all tenants. Prevention: (1) Per-tenant rate limits (hard ceiling). (2) Per-tenant connection pools to backends (one tenant can't exhaust all connections). (3) Priority queuing (enterprise tenants processed first during overload). (4) Dedicated infrastructure for large tenants (separate upstream pools).
Developer Portal & Plans
A developer portal is the self-service interface for API consumers — documentation, key management, usage dashboards, and plan selection. The gateway enforces the plans; the portal lets developers manage them.
| Portal Feature | Description | Gateway Integration |
|---|---|---|
| API Documentation | Interactive docs (OpenAPI/Swagger) | Auto-generated from gateway routes |
| Key Management | Create, rotate, revoke API keys | Keys stored in gateway's consumer DB |
| Usage Dashboard | Request counts, error rates, latency | Metrics from gateway's access logs |
| Plan Selection | Free, Pro, Enterprise tiers | Gateway enforces plan-specific limits |
| Sandbox/Testing | Test API calls without production impact | Gateway routes sandbox keys to test env |
| Webhook Management | Configure event subscriptions | Gateway delivers events to registered URLs |
{ "plans": [ { "name": "Free", "price": 0, "limits": { "requests_per_minute": 100, "requests_per_day": 10000, "max_payload_size_kb": 256, "rate_limit_burst": 20, "endpoints": ["read-only"], "support": "community" } }, { "name": "Pro", "price": 49, "limits": { "requests_per_minute": 5000, "requests_per_day": 1000000, "max_payload_size_kb": 5120, "rate_limit_burst": 500, "endpoints": ["all"], "support": "email", "sla": "99.9%" } }, { "name": "Enterprise", "price": "custom", "limits": { "requests_per_minute": "custom", "requests_per_day": "unlimited", "max_payload_size_kb": 51200, "rate_limit_burst": "custom", "endpoints": ["all"], "support": "dedicated", "sla": "99.99%", "dedicated_infrastructure": true } } ] }
Monetization Patterns
- ✅ Tiered plans — fixed price for a request quota (most common)
- ✅ Pay-per-use — charge per request or per compute unit
- ✅ Freemium — free tier with limits, paid for higher limits
- ✅ Overage billing — allow exceeding limits, charge extra per request
- ✅ Feature gating — different plans unlock different endpoints
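As a tiny worked example of the overage-billing pattern, assuming a hypothetical per-request overage rate on top of a fixed plan fee:

```python
def monthly_bill(plan_price: float, included_requests: int,
                 overage_rate: float, used: int) -> float:
    """Overage billing: fixed plan fee, plus a per-request charge for
    every request past the included quota."""
    overage = max(0, used - included_requests)
    return plan_price + overage * overage_rate

# e.g. a $49 plan with 1M included requests and a made-up $0.0001
# per extra request: 1.2M used -> $49 + 200,000 * $0.0001 = $69
```

The gateway's per-tenant usage counters are what make this billable at all: metering and enforcement share the same counters.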
Interview Questions
Q: When would you use the BFF pattern instead of a single API Gateway?
A: Use BFF when: (1) Different clients need fundamentally different response shapes (mobile needs compact, web needs rich). (2) Different clients have different performance requirements (mobile: minimize round trips; web: support real-time). (3) Different teams own different client experiences and need independent deployment. A single gateway works when all clients consume the same API shape. BFF adds complexity (multiple gateways to maintain) but eliminates the 'one size fits none' problem.
Q: How do you handle partial failures in API composition?
A: Strategy: (1) Call services in parallel with individual timeouts. (2) If a non-critical service fails (recommendations), return null for that section and include metadata about the failure. (3) If a critical service fails (user auth), fail the entire request. (4) Define criticality per composition: for a dashboard, user data is critical but notifications are optional. (5) Consider serving cached/stale data for failed non-critical services. The client renders what's available and shows degraded UI for missing parts.
Q: What's the difference between a centralized gateway and a service mesh sidecar?
A: Centralized gateway: shared cluster handling north-south (external) traffic — API management, client auth, rate limiting, versioning. Sidecar (service mesh): per-pod proxy handling east-west (internal) traffic — mTLS between services, internal retries, circuit breaking, observability. They solve different problems: the gateway is client-facing infrastructure; the mesh is internal infrastructure. Most production systems use both together.
Q: How would you prevent the noisy neighbor problem in a multi-tenant API?
A: Layered isolation: (1) Per-tenant rate limits — hard ceiling regardless of other tenants' usage. (2) Per-tenant connection pools — one tenant can't exhaust backend connections. (3) Priority queuing — during overload, process high-tier tenants first. (4) Request costing — expensive operations (search, export) consume more of the quota. (5) For enterprise tenants: dedicated upstream pools (complete isolation). (6) Monitoring: alert when any tenant consistently hits limits — they may need an upgrade or there's abuse.
Q: When is request hedging appropriate and what are the risks?
A: Appropriate when: (1) Tail latency matters more than throughput (user-facing search, real-time bidding). (2) The operation is idempotent and read-only. (3) Backend has spare capacity to handle 2-3x load. Risks: (1) Doubles/triples backend load — can cause cascading overload. (2) Only safe for reads — hedging writes creates duplicates. (3) Wastes resources when the primary would have responded quickly. Mitigation: only hedge after a delay (e.g., hedge after 100ms if primary hasn't responded), not immediately.
Common Mistakes
Gateway becomes a monolithic orchestrator
The gateway calls 10 services sequentially, applies business logic to merge results, and becomes the most complex and fragile component in the system.
✅ Keep gateway composition simple: parallel calls, merge responses, handle partial failure. If composition requires business logic (conditional calls, data transformation based on domain rules), create a dedicated composition/BFF service behind the gateway. The gateway should be dumb infrastructure, not smart business logic.
No tenant isolation in rate limiting
All tenants share a global rate limit pool — one tenant's traffic spike exhausts the limit for everyone.
✅ Implement per-tenant rate limit counters. Each tenant gets their own bucket (based on plan tier). A tenant hitting their limit only affects themselves. Use separate Redis keys per tenant: `ratelimit:{tenant_id}:{endpoint}`.
BFF per team instead of per client type
Creating a BFF for each backend team (orders-bff, users-bff, payments-bff) instead of per client type (mobile-bff, web-bff).
✅ BFF should be organized by client type, not by backend domain. A mobile-bff aggregates data from multiple backend services into mobile-optimized responses. A per-team BFF is just another microservice with 'BFF' in the name — it doesn't solve the client-specific optimization problem.
Services directly accessible bypassing the gateway
Backend services are reachable on public IPs or without network restrictions — attackers can bypass all gateway security (auth, rate limiting).
✅ Enforce network isolation: services only accept traffic from the gateway's IP range (security groups, NetworkPolicy). Or use mTLS where services verify the caller's certificate is the gateway. If any service is directly accessible, all gateway security is theater.