
Gateway Patterns & Multi-Tenancy

How the gateway is used in practice — BFF pattern, API composition, sidecar vs centralized, developer portals, and multi-tenant API management.

01

Backend for Frontend (BFF)

The BFF pattern creates a separate gateway (or gateway layer) for each client type — mobile, web, partner API. Each BFF is tailored to its client's specific needs: different response shapes, different aggregation logic, different performance characteristics.

Client | BFF Optimizations | Example
Mobile app | Smaller payloads, fewer round trips, offline-friendly | Aggregate user + orders in one call
Web SPA | Rich responses, pagination, real-time updates | Full product details with reviews
Partner API | Stable contract, webhook support, batch operations | Bulk order creation endpoint
Internal tools | Admin-level access, verbose responses, debug info | Full audit trail in responses
🍽️

The Restaurant with Multiple Menus

A single generic gateway is like a restaurant with one menu for everyone — the kids get confused by the wine list, and adults don't want the coloring page. BFF is like having a kids' menu, an adult menu, and a catering menu. Same kitchen (backend services), but each menu (BFF) is designed for its audience. The mobile BFF sends compact responses; the web BFF sends rich, nested data.

bff-gateway-architecture.yaml
# Multiple BFF gateways, each tailored to its client
gateways:
  mobile-bff:
    host: mobile-api.example.com
    optimizations:
      - Aggregate multiple service calls into single response
      - Compress images to mobile-friendly sizes
      - Return minimal fields (id, name, thumbnail)
      - Support offline sync endpoints
    routes:
      - path: /home-feed
        aggregates: [user-service, order-service, recommendation-service]

  web-bff:
    host: api.example.com
    optimizations:
      - Rich responses with nested relationships
      - Server-sent events for real-time updates
      - Pagination with cursor-based navigation
    routes:
      - path: /products/:id
        aggregates: [product-service, review-service, inventory-service]

  partner-bff:
    host: partner-api.example.com
    optimizations:
      - Stable versioned contract (v1 only)
      - Webhook delivery for async events
      - Batch endpoints for bulk operations
      - Strict rate limiting per partner

When One Gateway Serves All Poorly

A single generic gateway forces compromises: mobile clients download fields they don't need, web clients make multiple round trips for data that could be aggregated, and partner APIs get response shapes designed for your internal frontend. BFF eliminates these compromises — each client gets exactly what it needs. The cost: maintaining multiple gateway layers.
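
As a concrete sketch of the mobile BFF's /home-feed route above: the handler fans out to the three backend services in parallel and trims the merged result down to mobile-sized fields. This assumes an Express-based BFF; the filename, service hostnames, and field names are illustrative.

mobile-bff-home-feed.ts
// Hypothetical mobile BFF route: one request from the phone, parallel fan-out
// to three backend services, compact response with only the fields mobile needs.
import express from "express";

const app = express();

app.get("/home-feed", async (req, res) => {
  const userId = req.header("X-User-ID"); // injected upstream by the edge gateway

  // Parallel calls: the phone pays one round trip instead of three
  const [user, orders, recs] = await Promise.all([
    fetch(`http://user-service/users/${userId}`).then(r => r.json()),
    fetch(`http://order-service/users/${userId}/orders?limit=5`).then(r => r.json()),
    fetch(`http://recommendation-service/users/${userId}/recs?limit=3`).then(r => r.json()),
  ]);

  // Mobile-optimized shape: ids, names, thumbnails only
  res.json({
    user: { id: user.id, name: user.name, thumbnail: user.thumbnailUrl },
    recent_orders: orders.map((o: any) => ({ id: o.id, status: o.status })),
    recommendations: recs.map((r: any) => ({ id: r.id, title: r.title })),
  });
});

app.listen(3000);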

02

API Composition / Aggregation

API composition lets the gateway call multiple backend services and merge their responses into a single response for the client. This reduces client-side round trips and simplifies frontend logic.

Pattern | Description | Latency Impact
Sequential | Call service A, then use result to call service B | Sum of all call latencies
Parallel | Call services A, B, C simultaneously, merge results | Max of all call latencies
Partial failure | Return available data, mark failed parts as null | Max of successful calls
Fallback | If primary fails, call fallback service | Primary latency + fallback if needed
composition-example.json
// Client calls: GET /api/dashboard
// Gateway composes from 3 services in parallel:

// 1. user-service: GET /users/123
// 2. order-service: GET /users/123/orders?limit=5
// 3. notification-service: GET /users/123/notifications?unread=true

// Gateway merges into single response:
{
  "user": {
    "id": "123",
    "name": "Alice",
    "plan": "pro"
  },
  "recent_orders": [
    {"id": "ord_1", "total": 49.99, "status": "delivered"},
    {"id": "ord_2", "total": 29.99, "status": "shipped"}
  ],
  "notifications": {
    "unread_count": 3,
    "items": [...]
  },
  "_meta": {
    "services_called": 3,
    "services_succeeded": 3,
    "total_latency_ms": 145
  }
}

Partial Failure Handling

When composing from multiple services, some may fail while others succeed. Don't fail the entire request because one service is down. Return available data with null/empty for failed parts and include metadata about which services failed. The client can render what's available and show a degraded experience for the missing parts.
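
A minimal sketch of this partial-failure strategy for the /api/dashboard composition above, assuming a Node/TypeScript gateway; the service URLs, timeout, and criticality rules are illustrative rather than a prescribed implementation.

dashboard-composition.ts
// Sketch of the /api/dashboard composition with partial-failure handling.
type Settled<T> = { ok: true; data: T } | { ok: false; data: null };

async function callService<T>(url: string, timeoutMs = 500): Promise<Settled<T>> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!res.ok) throw new Error(`status ${res.status}`);
    return { ok: true, data: (await res.json()) as T };
  } catch {
    return { ok: false, data: null }; // swallow the failure; the caller decides criticality
  }
}

async function getDashboard(userId: string) {
  const started = Date.now();

  // All three calls run in parallel, each with its own timeout
  const [user, orders, notifications] = await Promise.all([
    callService(`http://user-service/users/${userId}`),
    callService(`http://order-service/users/${userId}/orders?limit=5`),
    callService(`http://notification-service/users/${userId}/notifications?unread=true`),
  ]);

  // User data is critical: fail the whole request if it is missing
  if (!user.ok) throw new Error("dashboard unavailable: user-service failed");

  const parts = [user, orders, notifications];
  return {
    user: user.data,
    recent_orders: orders.data ?? [],          // degraded: empty list
    notifications: notifications.data ?? null, // degraded: client hides the badge
    _meta: {
      services_called: parts.length,
      services_succeeded: parts.filter(p => p.ok).length,
      total_latency_ms: Date.now() - started,
    },
  };
}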

When to Use API Composition

  • Dashboard pages that need data from 3+ services
  • Mobile clients where round-trip latency is expensive
  • Reducing N+1 API calls from the frontend
  • Aggregating data that doesn't change together (user + orders + notifications)

When NOT to Use API Composition

  • When composition requires business logic (use a dedicated service instead)
  • When the aggregated data must be transactionally consistent
  • When composition adds significant latency (sequential dependent calls)
  • When it makes the gateway a single point of coupling to all services
03

Request Hedging

Request hedging sends the same request to multiple upstream instances simultaneously and returns the first successful response. The other in-flight requests are cancelled. This trades extra backend load for lower tail latency.

Aspect | Retry | Hedging
Timing | Send second request AFTER first fails/times out | Send multiple requests simultaneously
Latency impact | Adds full retry latency on failure | Returns fastest response (p99 → p50)
Backend load | 1x normal, 2x on failure | 2-3x always (every request is duplicated)
Use case | Handling failures | Reducing tail latency for critical paths
Idempotency | Required for safety | Required — multiple instances process same request
envoy-hedging-config.yaml
# Envoy request hedging: send slow requests to a second host, return the fastest
routes:
  - match:
      prefix: "/api/search"
    route:
      cluster: search-service
      hedge_policy:
        # Delayed hedging: issue a second request when a try exceeds per_try_timeout (below)
        hedge_on_per_try_timeout: true
        initial_requests: 1
        # Immediate hedging knob: chance of sending an extra request up front,
        # on top of initial_requests (100% = always)
        additional_request_chance:
          numerator: 100
          denominator: HUNDRED

      # The per-try timeout below is what triggers the delayed hedge: if the
      # primary hasn't responded within 200ms, Envoy sends a second request
      # to another host without cancelling the first
      retry_policy:
        retry_on: "5xx,reset,connect-failure"
        num_retries: 1
        per_try_timeout: 200ms

Only Hedge Reads

Hedging is only safe for idempotent, read-only operations. Hedging a write request means multiple instances process the same mutation — creating duplicates. Use hedging for: search queries, product lookups, user profile reads. Never hedge: order creation, payment processing, any state-changing operation.
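
A minimal sketch of delayed hedging for such read-only lookups, assuming a Node/TypeScript gateway; the instance URLs, helper names, and the 100ms hedge delay are illustrative.

hedged-read.ts
// Delayed hedging: fire the primary call immediately, and only if it hasn't
// answered within hedgeAfterMs, fire a second copy to another instance.
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function hedgedGet(path: string, instances: string[], hedgeAfterMs = 100): Promise<Response> {
  // Primary request goes out right away
  const primary = fetch(`${instances[0]}${path}`);

  // The hedge only fires after the delay, so the extra backend load applies
  // mostly to the slow tail rather than to every request
  const hedge = sleep(hedgeAfterMs).then(() => fetch(`${instances[1]}${path}`));

  // First fulfilled response wins; a production hedger would also cancel the
  // losing request (and the pending hedge timer) with an AbortController
  return Promise.any([primary, hedge]);
}

// Usage: only for idempotent reads such as search or product lookups.
// hedgedGet("/products/42", ["http://search-1:8080", "http://search-2:8080"]);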

04

Sidecar vs Centralized Gateway

The gateway can be deployed as a centralized cluster (all traffic flows through shared instances) or as a sidecar (one gateway instance per service). Each model has different trade-offs for latency, isolation, and operational complexity.

Aspect | Centralized Gateway | Sidecar (per-service)
Deployment | Shared cluster of gateway instances | One proxy per service pod
Network hop | Extra hop through gateway cluster | Localhost — no network hop
Failure blast radius | Gateway down = everything down | Sidecar down = one service down
Configuration | Central config for all routes | Per-service config (distributed)
Resource usage | Efficient — shared pool | Higher — one proxy per pod
Typical tool | Kong, AWS API Gateway | Envoy sidecar, Istio, Linkerd
Best for | North-south (external) traffic | East-west (internal) traffic
hybrid-architecture.yaml
# Hybrid: Centralized gateway for north-south + service mesh for east-west

# External traffic path:
# Client → Centralized API Gateway → Service A
#
# Internal traffic path:
# Service A → Envoy Sidecar → Envoy Sidecar → Service B

infrastructure:
  api_gateway:
    type: centralized
    tool: Kong
    handles:
      - External client authentication
      - Rate limiting per API key
      - Request routing to services
      - API versioning and deprecation
      - Developer portal

  service_mesh:
    type: sidecar
    tool: Istio (Envoy sidecars)
    handles:
      - Service-to-service mTLS
      - Internal load balancing
      - Circuit breaking between services
      - Internal observability (traces, metrics)
      - Retry policies for internal calls

The Hybrid is the Standard

Most production architectures use both: a centralized gateway for external traffic (API management, auth, rate limiting) and a service mesh for internal traffic (mTLS, retries, observability). They're complementary, not competing. The gateway handles client-facing concerns; the mesh handles service-to-service concerns.

05

Gateway Offloading

Gateway offloading moves cross-cutting concerns from individual services to the gateway. Services become simpler — they trust the gateway to have already handled auth, rate limiting, CORS, and logging.

Concern | Without Gateway | With Gateway Offloading
Authentication | Every service validates JWT | Gateway validates, services trust X-User-ID
Rate limiting | Each service implements its own | Gateway enforces globally
CORS | Each service sets CORS headers | Gateway handles all CORS
Logging | Each service logs access | Gateway logs all requests centrally
TLS | Each service manages certificates | Gateway terminates TLS once
Compression | Each service compresses responses | Gateway compresses all responses
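
The authentication row above implies a trust handoff: the gateway validates the token once, then forwards identity headers that services simply read. The sketch below shows the service side of that handoff; the header names (X-User-ID, X-User-Scopes) and the Express setup are conventions assumed for illustration, not a standard.

service-trusts-gateway.ts
// Service-side middleware that trusts gateway-injected identity headers.
// This is only safe if the service is unreachable except through the gateway.
import express from "express";

const app = express();

app.use((req, res, next) => {
  const userId = req.header("X-User-ID");
  if (!userId) {
    res.status(401).json({ error: "missing gateway identity" });
    return;
  }
  (req as any).userId = userId;
  (req as any).scopes = (req.header("X-User-Scopes") ?? "").split(",");
  next(); // no JWT parsing here: authentication was offloaded to the gateway
});

app.get("/orders", (req, res) => {
  // Fine-grained authorization (does this user own this order?) stays in the
  // service, even though authentication was offloaded
  res.json({ owner: (req as any).userId, orders: [] });
});

app.listen(8080);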

What to Offload to the Gateway

  • Authentication and coarse authorization
  • Rate limiting and throttling
  • TLS termination and certificate management
  • CORS header management
  • Request/response compression
  • Access logging and metrics collection
  • Security headers (HSTS, CSP, X-Frame-Options)
  • Request ID generation and propagation

What to Keep in Services

  • Fine-grained authorization (resource-level access control)
  • Business logic and domain validation
  • Data transformation specific to the domain
  • Service-specific rate limits (e.g., expensive operations)
  • Database access and caching of domain data

The Bypass Risk

Gateway offloading only works if services are NEVER directly accessible from outside. If a service can be reached without going through the gateway, all offloaded security (auth, rate limiting) is bypassed. Enforce network isolation: services only accept traffic from the gateway's IP range or use mTLS to verify the caller is the gateway.

06

Multi-Tenant API Management

In a multi-tenant system, the gateway must isolate tenants — ensuring one tenant's traffic doesn't affect another's performance or access another's data.

Isolation Level | Description | Implementation
Rate limit isolation | Each tenant has independent limits | Per-tenant rate limit counters in Redis
Routing isolation | Tenants route to different backends | Tenant-specific upstream pools
Data isolation | Tenants can't access each other's data | X-Tenant-ID header, enforced in services
Performance isolation | One tenant can't starve others | Per-tenant connection pools, priority queues
Configuration isolation | Tenants have different API access | Per-tenant plugin configuration
multi-tenant-gateway.yaml
# Multi-tenant configuration
tenants:
  - id: tenant_acme
    plan: enterprise
    config:
      rate_limit:
        minute: 50000
        burst: 1000
      allowed_endpoints:
        - /api/v1/*
        - /api/v2/*
      custom_domain: api.acme.com
      dedicated_upstream: true  # Isolated backend pool
      priority: high

  - id: tenant_startup
    plan: free
    config:
      rate_limit:
        minute: 1000
        burst: 50
      allowed_endpoints:
        - /api/v1/*
      custom_domain: null  # Uses shared domain
      dedicated_upstream: false  # Shared backend pool
      priority: low

# Gateway extracts tenant from API key → applies tenant config
# Injects X-Tenant-ID header for downstream isolation

Noisy Neighbor Prevention

The "noisy neighbor" problem: one tenant sends a traffic spike that degrades performance for all tenants. Prevention: (1) Per-tenant rate limits (hard ceiling). (2) Per-tenant connection pools to backends (one tenant can't exhaust all connections). (3) Priority queuing (enterprise tenants processed first during overload). (4) Dedicated infrastructure for large tenants (separate upstream pools).

07

Developer Portal & Plans

A developer portal is the self-service interface for API consumers — documentation, key management, usage dashboards, and plan selection. The gateway enforces the plans; the portal lets developers manage them.

Portal Feature | Description | Gateway Integration
API Documentation | Interactive docs (OpenAPI/Swagger) | Auto-generated from gateway routes
Key Management | Create, rotate, revoke API keys | Keys stored in gateway's consumer DB
Usage Dashboard | Request counts, error rates, latency | Metrics from gateway's access logs
Plan Selection | Free, Pro, Enterprise tiers | Gateway enforces plan-specific limits
Sandbox/Testing | Test API calls without production impact | Gateway routes sandbox keys to test env
Webhook Management | Configure event subscriptions | Gateway delivers events to registered URLs
api-plans.json
{
  "plans": [
    {
      "name": "Free",
      "price": 0,
      "limits": {
        "requests_per_minute": 100,
        "requests_per_day": 10000,
        "max_payload_size_kb": 256,
        "rate_limit_burst": 20,
        "endpoints": ["read-only"],
        "support": "community"
      }
    },
    {
      "name": "Pro",
      "price": 49,
      "limits": {
        "requests_per_minute": 5000,
        "requests_per_day": 1000000,
        "max_payload_size_kb": 5120,
        "rate_limit_burst": 500,
        "endpoints": ["all"],
        "support": "email",
        "sla": "99.9%"
      }
    },
    {
      "name": "Enterprise",
      "price": "custom",
      "limits": {
        "requests_per_minute": "custom",
        "requests_per_day": "unlimited",
        "max_payload_size_kb": 51200,
        "rate_limit_burst": "custom",
        "endpoints": ["all"],
        "support": "dedicated",
        "sla": "99.99%",
        "dedicated_infrastructure": true
      }
    }
  ]
}
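
A sketch of how a gateway might enforce these tiers at request time, assuming the plan has already been resolved from the caller's API key; the plan shapes and helper names are illustrative (rate limiting itself happens separately, as in the per-tenant counter sketch above).

plan-enforcement.ts
// Map a resolved plan to request-time checks: endpoint access and payload size.
interface PlanLimits {
  requestsPerMinute: number;
  maxPayloadKb: number;
  allowedEndpoints: "read-only" | "all";
}

const plans: Record<string, PlanLimits> = {
  free: { requestsPerMinute: 100, maxPayloadKb: 256, allowedEndpoints: "read-only" },
  pro: { requestsPerMinute: 5000, maxPayloadKb: 5120, allowedEndpoints: "all" },
};

// Returns a rejection reason, or null if the request is allowed by the plan
function checkRequest(plan: PlanLimits, method: string, payloadKb: number): string | null {
  if (plan.allowedEndpoints === "read-only" && method !== "GET") {
    return "plan does not allow write endpoints";
  }
  if (payloadKb > plan.maxPayloadKb) {
    return "payload exceeds plan limit";
  }
  return null;
}

// Usage: the plan is looked up from the API key before this check runs.
// const reason = checkRequest(plans["free"], "POST", 10); // → rejected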

Monetization Patterns

  • Tiered plans — fixed price for a request quota (most common)
  • Pay-per-use — charge per request or per compute unit
  • Freemium — free tier with limits, paid for higher limits
  • Overage billing — allow exceeding limits, charge extra per request
  • Feature gating — different plans unlock different endpoints
08

Interview Questions

Q: When would you use the BFF pattern instead of a single API Gateway?

A: Use BFF when: (1) Different clients need fundamentally different response shapes (mobile needs compact, web needs rich). (2) Different clients have different performance requirements (mobile: minimize round trips; web: support real-time). (3) Different teams own different client experiences and need independent deployment. A single gateway works when all clients consume the same API shape. BFF adds complexity (multiple gateways to maintain) but eliminates the 'one size fits none' problem.

Q: How do you handle partial failures in API composition?

A: Strategy: (1) Call services in parallel with individual timeouts. (2) If a non-critical service fails (recommendations), return null for that section and include metadata about the failure. (3) If a critical service fails (user auth), fail the entire request. (4) Define criticality per composition: for a dashboard, user data is critical but notifications are optional. (5) Consider serving cached/stale data for failed non-critical services. The client renders what's available and shows degraded UI for missing parts.

Q: What's the difference between a centralized gateway and a service mesh sidecar?

A: Centralized gateway: shared cluster handling north-south (external) traffic — API management, client auth, rate limiting, versioning. Sidecar (service mesh): per-pod proxy handling east-west (internal) traffic — mTLS between services, internal retries, circuit breaking, observability. They solve different problems: the gateway is client-facing infrastructure; the mesh is internal infrastructure. Most production systems use both together.

Q: How would you prevent the noisy neighbor problem in a multi-tenant API?

A: Layered isolation: (1) Per-tenant rate limits — hard ceiling regardless of other tenants' usage. (2) Per-tenant connection pools — one tenant can't exhaust backend connections. (3) Priority queuing — during overload, process high-tier tenants first. (4) Request costing — expensive operations (search, export) consume more of the quota. (5) For enterprise tenants: dedicated upstream pools (complete isolation). (6) Monitoring: alert when any tenant consistently hits limits — they may need an upgrade or there's abuse.

Q: When is request hedging appropriate and what are the risks?

A: Appropriate when: (1) Tail latency matters more than throughput (user-facing search, real-time bidding). (2) The operation is idempotent and read-only. (3) Backend has spare capacity to handle 2-3x load. Risks: (1) Doubles/triples backend load — can cause cascading overload. (2) Only safe for reads — hedging writes creates duplicates. (3) Wastes resources when the primary would have responded quickly. Mitigation: only hedge after a delay (e.g., hedge after 100ms if primary hasn't responded), not immediately.

09

Common Mistakes

⚠️

Gateway becomes a monolithic orchestrator

The gateway calls 10 services sequentially, applies business logic to merge results, and becomes the most complex and fragile component in the system.

Keep gateway composition simple: parallel calls, merge responses, handle partial failure. If composition requires business logic (conditional calls, data transformation based on domain rules), create a dedicated composition/BFF service behind the gateway. The gateway should be dumb infrastructure, not smart business logic.

⚠️

No tenant isolation in rate limiting

All tenants share a global rate limit pool — one tenant's traffic spike exhausts the limit for everyone.

Implement per-tenant rate limit counters. Each tenant gets their own bucket (based on plan tier). A tenant hitting their limit only affects themselves. Use separate Redis keys per tenant: ratelimit:{tenant_id}:{endpoint}.

⚠️

BFF per team instead of per client type

Creating a BFF for each backend team (orders-bff, users-bff, payments-bff) instead of per client type (mobile-bff, web-bff).

BFF should be organized by client type, not by backend domain. A mobile-bff aggregates data from multiple backend services into mobile-optimized responses. A per-team BFF is just another microservice with 'BFF' in the name — it doesn't solve the client-specific optimization problem.

⚠️

Services directly accessible bypassing the gateway

Backend services are reachable on public IPs or without network restrictions — attackers can bypass all gateway security (auth, rate limiting).

Enforce network isolation: services only accept traffic from the gateway's IP range (security groups, NetworkPolicy). Or use mTLS where services verify the caller's certificate is the gateway. If any service is directly accessible, all gateway security is theater.