The API gateway decision record: why the gateway you chose determines your authentication surface and your rate limiting model

API gateway selection looks like an infrastructure detail until a security researcher finds four unprotected admin routes that had been accepting POST requests to modify the database for four months — because the authentication model was per-route, the default in AWS API Gateway is open unless explicitly protected, and the policy that said "all state-modifying routes must have the JWT authorizer" had never been written down. The gateway you chose determines your authentication surface, your rate limiting scope for unauthenticated traffic, and your per-request cost floor at scale — and all three are set at selection time, before you have any of the production data that would make the structural constraints obvious.

A 12-person API-first SaaS startup chose AWS API Gateway in year one. The CTO found a tutorial via ChatGPT — "what API gateway should I use for a Node.js backend on AWS?" — and within an afternoon had a working REST API with Cognito JWT authentication and Lambda proxy integration. The tutorial was thorough. It covered how to create a user pool, how to attach a JWT authorizer to a route, how to wire the Lambda function. By the end of the tutorial, the first authenticated endpoint returned a 200. The session closed. Nobody asked about the pricing model at scale. Nobody asked about what the default authentication state was for a route that didn't have an authorizer attached. Nobody asked about how unauthenticated public endpoints fit into the rate limiting model. The session solved the immediate problem, and the structural constraints of the solution were never externalized into a document.

By month 18, the API had grown to 47 routes across 12 Lambda functions. A new backend developer joined and was tasked with integrating a payment webhook from Stripe. Stripe webhooks use their own HMAC signature verification — they cannot carry a Cognito JWT, so the webhook route legitimately required no authorizer. The developer correctly set authorization_type = "NONE" for the Stripe webhook path in the Terraform configuration. That same week, the developer added four admin routes for internal operations: a bulk data export endpoint, a user impersonation endpoint for the support team, an account deletion endpoint, and an audit log query endpoint. The developer built these routes by copying the Terraform resource block they had most recently written — which happened to be the Stripe webhook block with authorization_type = "NONE". All four admin routes were deployed to production with no authorizer, no API key requirement, and no IAM constraint.

The copy-paste error was not visible in code review. The Terraform plan diff showed four new aws_apigatewayv2_route resources being added, the reviewer confirmed the Lambda functions existed and had the right permissions, and the routes were merged. There was no automated check that verified authorization_type on each route. There was no policy document — no ADR, no team convention guide, no README section — that stated the rule: all state-modifying routes must have authorization_type = JWT with the Cognito authorizer ARN. The policy was implicit knowledge, derived from "look at the existing routes and copy what they do." The existing route the developer copied did not have auth. The policy failed silently.

Four months later, a security researcher running automated scanning found the admin routes. They appeared in the scanning results because they returned structured JSON error messages to GET requests — a fingerprint for an API endpoint that processes requests without authentication. The researcher sent POST requests and found the user impersonation endpoint accepted arbitrary account IDs and returned session tokens for those accounts without any credential check. The account deletion endpoint processed deletion requests for any account ID provided. The audit log query endpoint returned internal system events including IP addresses, user agent strings, and action logs for every account on the platform.

The researcher disclosed responsibly. No data was exfiltrated beyond the researcher's own test account. The routes were disabled within two hours of the disclosure email. The postmortem produced a new Terraform policy check — a custom Sentinel rule that rejected plans with authorization_type = "NONE" on routes whose paths matched the /admin/* pattern. But the more important artifact — the one that would have prevented the incident — was never written. The API gateway ADR that documented the default authentication behavior of AWS API Gateway, the policy for new route authentication, the enforcement mechanism, and the bypass surface (internal Lambda-to-Lambda invocations that skipped the gateway entirely) still did not exist after the incident.

The second incident happened independently, six weeks before the security disclosure. The API served two classes of endpoints: authenticated routes (requiring the Cognito JWT, rate-limited via REST API usage plans at 1,000 requests per API key per minute) and unauthenticated public routes (product catalog, public pricing tiers, API status page — no auth required, no API key required). The usage plan rate limits covered the authenticated routes exclusively. The public routes were outside the rate limiting boundary — no usage plan applied to a request that carried no API key.

A competitor's data extraction bot found the public product catalog endpoint and began hitting it at 80,000 requests per minute over a Saturday and Sunday. The catalog Lambda function ran into the account-level Lambda concurrency ceiling of 1,000 concurrent executions, the default quota shared by every Lambda function in the AWS account. Lambda throttling cascaded, because the authenticated API endpoints drew from the same concurrency pool. Users of the paid authenticated API began receiving 429 errors from Lambda throttling, not from their usage plan limits. Their usage plan counters were well below the rate limit. The throttling had nothing to do with their request rates. It came from a public endpoint, invisible to them, that was consuming all available concurrency.

The on-call engineer spent two hours diagnosing why paid users were being throttled. The API Gateway metrics dashboard showed usage plan request counts well below limits. The Lambda metrics showed concurrency exhausted on the catalog function. The connection between the catalog Lambda's concurrency exhaustion and the paid API throttling was not visible from the API Gateway layer — it required correlating API Gateway metrics with Lambda concurrency metrics separately, which was not a pre-built dashboard and had never been considered as a failure mode to monitor. Adding WAF to rate-limit the public endpoint took four additional hours on a weekend. The 5-minute WAF rate-limiting window meant the first 80,000 requests in each window always passed before the counter engaged. The bot rate was high enough that significant damage to paid-tier concurrency continued during each new 5-minute window.

Both incidents were downstream of the same upstream omission: the authentication surface — which routes have auth, what the default is, how the policy is enforced — and the rate limiting architecture — which endpoints are in scope, what the model is for unauthenticated traffic, whether Lambda concurrency boundaries separate public from authenticated traffic — were never documented. An API gateway ADR that captured these structural decisions would not have changed the vendor choice. It would have prevented both incidents.

The three structural properties that gateway selection determines

When teams evaluate API gateways, the conversation centers on managed versus self-hosted, AWS ecosystem integration, protocol support (REST, WebSocket, gRPC), and the time required to get the first authenticated endpoint running. These are real evaluation criteria. The structural properties that determine whether the gateway selection ages well — whether incidents occur, whether costs scale predictably, whether new developers can add routes correctly without tribal knowledge — are different, and they are set at selection time.

Authentication surface: default state and bypass paths

The authentication surface of an API gateway is not just the authorizer configuration on existing routes. It is the combination of three properties: what the default authentication state is for a new route (open by default versus protected by default), how auth claims are propagated to downstream services, and what endpoints exist that bypass the gateway entirely.

AWS API Gateway — both REST API and HTTP API — defaults to open. A new route has authorization_type = "NONE" unless the developer explicitly sets a JWT authorizer, Lambda authorizer, or IAM auth. This default is the opposite of a deny-by-default security model. Every new route requires an affirmative act to be protected. In a codebase where route resources are added by copy-paste from existing Terraform, the copied block's authorization type becomes the de facto default for the developer doing the copy. If the most recently modified block happens to be an exception (a public webhook), the exception propagates. There is no enforcement mechanism in API Gateway itself that rejects routes without an authorizer on specified path patterns. The enforcement requires a policy outside the gateway — a Terraform check, a code review standard, an automated scanner — and that policy must be written down to exist.

The second authentication surface issue is the bypass path. API Gateway sits in front of Lambda functions, but Lambda functions can be invoked directly — by other Lambda functions, by Step Functions, by EventBridge rules, by internal services that hold the Lambda ARN and IAM permissions. A service that invokes a Lambda function directly bypasses the API Gateway authorizer entirely. If the Lambda function's authentication logic is "the gateway validates the JWT before I'm called, so I don't need to validate it myself," then direct invocation from an internal service produces an unauthenticated call that the Lambda function accepts. The authentication guarantee is provided by the gateway, but the gateway is not in the call path for internal invocations. The bypass surface is a structural property of the gateway placement and must be documented: which Lambda functions are callable only via the gateway (and rely on gateway-level auth) versus which implement their own auth validation independent of the gateway.
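To make the bypass concrete, here is a minimal Python sketch. It assumes the HTTP API payload format 2.0, where claims validated by a JWT authorizer arrive under requestContext.authorizer.jwt.claims. The handler name and the event contents are hypothetical; the point is only that a function which trusts that field cannot distinguish a gateway call from a direct invocation.

```python
def trusting_handler(event, context=None):
    """Hypothetical handler that relies on the gateway having validated the JWT."""
    # Safe only if API Gateway is the sole caller, because the gateway is
    # what populates this field after validating the token.
    claims = (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("jwt", {})
        .get("claims", {})
    )
    if not claims.get("sub"):
        return {"statusCode": 401, "body": "unauthenticated"}
    return {"statusCode": 200, "body": f"acting as {claims['sub']}"}


# Any principal with permission to invoke the function chooses the entire
# payload, so it can fabricate the field the handler trusts.
forged_event = {
    "requestContext": {"authorizer": {"jwt": {"claims": {"sub": "any-account-id"}}}}
}
```

Calling trusting_handler(forged_event) is accepted exactly as a gateway-validated request would be. That is why the ADR should list, per function, whether it validates tokens itself or depends on IAM to restrict who may invoke it.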

This surface is distinct from the authentication strategy decision record, which covers the token type, session management model, and login flow. The gateway ADR covers the enforcement surface — where validation physically happens and what paths exist that skip validation. A team can make correct authentication strategy decisions and still have an open bypass surface if the gateway placement is not documented.

Rate limiting model: keys, scope, and unauthenticated traffic

The rate limiting model of an API gateway has three components: the key used to identify and bucket requests (IP address, API key, user ID, custom header), the scope of endpoints that the rate limiting covers, and where the rate limit state lives (in-memory per gateway instance versus distributed across instances).

AWS API Gateway REST API's usage plan rate limiting uses API keys as the rate limiting key. This model is well-suited for a B2B API where each customer has an API key and rate limits are per-customer. It is poorly suited for consumer-facing APIs where requests come from end users without API keys, or for public endpoints where no authentication is required. Usage plan limits simply do not apply to requests that carry no API key — there is no "unauthenticated request bucket" in the usage plan model. Unauthenticated traffic is unlimited from the usage plan perspective.

The consequence is that unauthenticated endpoints — public pages, public catalog endpoints, status checks, public pricing queries — are outside the rate limiting architecture that the gateway provides. Whether unauthenticated endpoints share Lambda concurrency with authenticated endpoints (and can therefore starve authenticated traffic by exhausting the concurrency pool) is a function of Lambda function organization, not of the gateway itself. If all endpoints proxy to Lambda functions in the same AWS account with the same default concurrency limit, a flooded public endpoint consumes concurrency that authenticated traffic depends on. Reserving concurrency — setting reserved_concurrent_executions on the authenticated Lambda functions — is the isolation mechanism, but it must be deliberately configured and must be documented as a requirement in the gateway ADR. Without the documentation, the next Lambda function added for a public endpoint inherits the unreserved-concurrency default and reintroduces the contamination risk.
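The starvation effect can be modeled in a few lines. The sketch below is a deliberate simplification of Lambda's behavior (demand is served in dictionary order rather than first come, first served), and the function names "catalog" and "api" are hypothetical. It shows how a flooded public function throttles an authenticated one when both draw from the unreserved pool, and how reservations isolate them.

```python
def throttled_by_function(account_limit, reserved, demand):
    """Return {function: throttled executions} under a simplified Lambda model.

    reserved: {function: reserved concurrency}. Functions absent from it
              share the unreserved pool (account limit minus reservations).
    demand:   {function: concurrent executions requested}.
    """
    unreserved_pool = account_limit - sum(reserved.values())
    throttled = {}
    for fn, wanted in demand.items():
        if fn in reserved:
            # A reservation is both a guarantee and a cap for that function.
            granted = min(wanted, reserved[fn])
        else:
            granted = min(wanted, unreserved_pool)
            unreserved_pool -= granted
        throttled[fn] = wanted - granted
    return throttled


flood = {"catalog": 5000, "api": 300}
shared_pool = throttled_by_function(1000, {}, flood)
isolated = throttled_by_function(1000, {"catalog": 200, "api": 500}, flood)
```

With no reservations, the catalog flood takes the whole pool and every authenticated request is throttled. With reservations, the flood is capped at its own limit and the authenticated function is untouched.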

The API rate limiting decision record covers the rate limiting strategy — the algorithm (token bucket, fixed window, sliding window), the granularity (global versus per-user versus per-endpoint), and the behavior under saturation. The gateway ADR covers the enforcement surface: which endpoints are inside the rate limiting perimeter, which are outside, and what happens to traffic that is outside. These are separate decisions that reference each other — the rate limiting strategy cannot be evaluated in isolation from the gateway that enforces it.

Cost model: per-request pricing at scale

API gateway costs compound with traffic growth in ways that are not obvious at selection time, because the selection happens at low traffic when the per-request cost is negligible. AWS API Gateway REST API costs $3.50 per million requests for the first 333 million monthly requests, then decreases at volume tiers. AWS API Gateway HTTP API costs $1.00 per million requests for the first 300 million, roughly 71% less for the same traffic. Teams that adopted API Gateway from tutorials published before 2020 (when HTTP API was released) often continue running REST API at 3.5 times the HTTP API rate without having evaluated whether HTTP API covers their feature requirements.

At 100 million monthly requests, the cost differential is $250 per month ($350 for REST API versus $100 for HTTP API). At 1 billion monthly requests, the differential at those first-tier rates is $2,500 per month or $30,000 per year; volume-tier discounts on both products narrow it somewhat. For teams that do not use the REST API-specific features (mapping templates, per-route throttle overrides, usage plans with API keys), the migration to HTTP API is straightforward. For teams that rely on usage plans for per-customer rate limiting, the migration requires adopting WAF for rate limiting, which adds its own cost and changes the rate limiting model (from per-API-key to per-IP with a 5-minute window). The decision to stay on REST API or migrate to HTTP API cannot be made intelligently without documenting what REST API-specific features are actually used, and that documentation is what the ADR provides.
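The arithmetic is easy to reproduce with tiers included. The sketch below is not an official calculator: the tier tables are assumed us-east-1 list prices in USD per million requests and should be checked against the current AWS pricing page before being relied on.

```python
# Assumed list prices: (tier size in millions of requests, USD per million).
REST_TIERS = [(333, 3.50), (667, 2.80), (19_000, 2.38), (float("inf"), 1.51)]
HTTP_TIERS = [(300, 1.00), (float("inf"), 0.90)]


def monthly_cost(millions, tiers):
    """Cost of `millions` million requests under a tiered price table."""
    cost, remaining = 0.0, millions
    for size, price in tiers:
        used = min(remaining, size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


def differential(millions):
    """Monthly REST API cost minus HTTP API cost for the same traffic."""
    return monthly_cost(millions, REST_TIERS) - monthly_cost(millions, HTTP_TIERS)
```

Under these assumed tiers, differential(100) is 250, matching the figure above, while differential(1000) comes out near 2,100 rather than the flat-rate 2,500. The gap is still large, but the tiered figure is the one the ADR should record.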

If WAF is added for rate limiting or security policies (IP allowlisting, geo-blocking, bot detection rules), its cost is separate from the API Gateway cost: $5 per month per web ACL, $1 per month per rule, and $0.60 per million requests evaluated, plus any additional charges for managed rule groups. WAF costs are frequently not included in API Gateway cost modeling at selection time because WAF is an optional add-on that is typically added reactively (after an incident or a rate limiting gap is discovered) rather than proactively. The first time the team needs IP-based rate limiting on a public endpoint, they discover both the WAF product and its cost simultaneously under incident pressure. Documenting the expected WAF configuration and its cost at selection time allows the team to model the full gateway cost from the start rather than discovering it during an incident.
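A rough WAF line item can sit next to the gateway cost in the ADR. The sketch below assumes list prices of $5 per web ACL, $1 per rule, and $0.60 per million requests evaluated, and it ignores managed rule group fees. The defaults are assumptions to verify against current AWS WAF pricing.

```python
def waf_monthly_cost(millions_of_requests, custom_rules=1,
                     acl_fee=5.00, rule_fee=1.00, per_million=0.60):
    """Approximate monthly USD for one web ACL, excluding managed rule groups."""
    return acl_fee + custom_rules * rule_fee + millions_of_requests * per_million
```

At 100 million evaluated requests and a single rate-based rule, this comes to about $66 per month. That is the amount to subtract from any per-request savings attributed to a gateway migration that depends on WAF.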

The options and their structural tradeoffs

AWS API Gateway REST API (v1)

REST API is the original AWS API Gateway product, launched in 2015. It is the most feature-complete API Gateway option for AWS-native architectures: usage plans and API keys for per-key rate limiting, Lambda authorizers for custom authentication logic beyond JWT validation, mapping templates for request and response transformation using the Velocity Template Language, stage variables for environment-specific configuration, canary deployments for routing a percentage of traffic to a new stage, and VPC Link for integrating with backend services in a private VPC without public internet exposure. The integration catalog supports Lambda proxy, HTTP, AWS service integrations (calling DynamoDB, SQS, Step Functions directly from the gateway), and mock integrations for testing.

The pricing model ($3.50 per million requests for the first 333 million, scaling to lower rates at higher volumes) makes REST API the more expensive option compared to HTTP API for identical traffic on equivalent features. The 10 MB payload size limit accommodates most API use cases but creates a constraint for file upload or large JSON payload endpoints, which require a pre-signed S3 URL pattern to work around it. The per-route invocation overhead (approximately 10ms for REST API versus 2–3ms for HTTP API) is noticeable for latency-sensitive use cases where the gateway overhead is a significant fraction of total request latency.

REST API's usage plan model is the primary reason teams stay on it rather than migrating to HTTP API. Usage plans allow per-API-key rate limits and quota enforcement — each customer's API key is associated with a usage plan that defines the maximum requests per second and the maximum requests per month. If a specific customer exceeds their quota, their requests are rejected with 429 before reaching the Lambda function. This model is well-suited for B2B APIs with per-customer billing and contractual rate limit guarantees. It is less suited for consumer APIs where users authenticate with session tokens rather than API keys, or for public endpoints where no key is presented.

The authentication default — authorization_type = "NONE" on every new route unless explicitly set — is the structural security risk that the ADR must document as a policy with an enforcement mechanism. Teams using Terraform for gateway configuration should consider a Terraform check or Sentinel policy that requires authorization_type to be explicitly set to a non-NONE value for routes whose paths match specified patterns (admin paths, write operations, user-specific data endpoints). The check does not prevent public endpoints — it requires an explicit authorization_type = "NONE" annotation that signals a deliberate decision to leave the route unprotected, rather than an accidental default.
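One way such a check could look, as an alternative to a Sentinel policy, is a small CI script over the JSON form of the plan (the output of terraform show -json). The sketch below assumes routes are declared as aws_apigatewayv2_route resources. The path patterns and the allowlist of deliberately public routes are hypothetical placeholders for whatever the ADR specifies.

```python
import fnmatch

PROTECTED_PATTERNS = ["/admin/*", "/user/*"]   # hypothetical policy from the ADR
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
ALLOWED_PUBLIC = {"POST /webhooks/stripe"}     # hypothetical reviewed exceptions


def unprotected_routes(plan):
    """Return route keys in a Terraform JSON plan that violate the auth policy."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_apigatewayv2_route":
            continue
        # "after" is null for resources being destroyed.
        after = (rc.get("change") or {}).get("after") or {}
        route_key = after.get("route_key", "")
        if (after.get("authorization_type") or "NONE") != "NONE":
            continue  # an authorizer is attached
        if route_key in ALLOWED_PUBLIC:
            continue  # deliberately public and recorded as such
        method, _, path = route_key.partition(" ")
        protected = any(fnmatch.fnmatch(path, p) for p in PROTECTED_PATTERNS)
        if protected or method in WRITE_METHODS:
            violations.append(route_key)
    return violations
```

A CI step would load the plan with json.load and fail the build when the returned list is non-empty. Keeping the public exceptions in an explicit allowlist means a copied webhook block no longer passes silently: a new public route requires an edit that a reviewer will see.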

AWS API Gateway HTTP API (v2)

HTTP API launched in 2020 as a simpler, cheaper, lower-latency alternative to REST API. The pricing model — $1.00 per million requests for the first 300 million — is 71% cheaper than REST API. The invocation overhead is approximately 2–3ms compared to REST API's 10ms, which matters for request flows where the gateway overhead is significant relative to Lambda execution time. The native JWT authorizer validates JWTs (verifying signature, expiration, and claim values) without requiring a Lambda authorizer function, which reduces the per-request cost of authentication by eliminating the Lambda invocation for auth.

The structural limitations of HTTP API are load-bearing for some teams. There are no usage plans and no API key model: the per-API-key rate limiting mechanism that REST API provides does not exist in HTTP API. Per-client rate limiting for HTTP API requires AWS WAF, which changes the rate limiting key from API key to IP address (or custom rule logic) and introduces the 5-minute evaluation window behavior. A WAF web ACL cannot be associated with an HTTP API directly, so it has to be attached to a CloudFront distribution placed in front of the API. Teams that need per-customer rate limiting with per-customer quota enforcement (the B2B API use case) cannot implement it natively in HTTP API. Teams that need request transformation (modifying the request body or headers before forwarding to the Lambda function) must implement the transformation in the Lambda function itself, because HTTP API does not support mapping templates. The payload limit is 10 MB, the same as REST API's, so the same constraint applies to endpoints that receive large payloads.

For teams that do not use API keys, mapping templates, or per-route throttle overrides — which covers most consumer-facing APIs and internal service APIs — HTTP API provides the same authentication capability (JWT authorizer), the same routing flexibility, and the same Lambda integration at 70% lower cost. The migration from REST API to HTTP API requires replacing usage plan configuration with WAF rules if IP-based rate limiting is acceptable, replacing Lambda authorizers with the native JWT authorizer if the auth logic is standard JWT validation, and removing mapping templates if they exist by moving the transformation logic into Lambda functions. The migration is not free, but the annual cost saving for high-traffic APIs is often larger than the migration engineering cost after the first year. The ADR should document whether the REST API features in use are actually required — many teams that have used REST API since before HTTP API existed have never audited which features they rely on.

Kong Gateway

Kong Gateway is an open-source API gateway built on nginx with a Lua plugin model. The plugin architecture allows authentication, rate limiting, request transformation, logging, and security policies to be applied per-route, per-consumer, or globally without modifying application code. Kong's rate-limiting plugin can store its counters in Redis (the plugin's redis policy, as opposed to its local per-node policy), which means the rate limit is enforced across all Kong Gateway instances collectively: a distributed rate limit rather than a per-instance limit. This matters for multi-instance deployments. If three Kong instances each enforce a 1,000 req/min limit independently (without shared state), the effective cluster limit is 3,000 req/min; with Redis-backed distributed state, the cluster enforces 1,000 req/min regardless of how many Kong instances serve the traffic.
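The difference between per-instance and shared counters can be shown with a toy model. The sketch below is not Kong code: a single in-process counter stands in for Redis, and requests are assumed to be spread round-robin across instances within one rate limit window.

```python
from itertools import cycle


def accepted(total_requests, instances, limit, shared):
    """Count requests accepted in one window by a round-robin gateway cluster."""
    # One counter for the whole cluster when state is shared, else one per instance.
    counters = [0] if shared else [0] * instances
    ok = 0
    for _, instance in zip(range(total_requests), cycle(range(instances))):
        slot = 0 if shared else instance
        if counters[slot] < limit:
            counters[slot] += 1
            ok += 1
    return ok


local_total = accepted(5000, instances=3, limit=1000, shared=False)
shared_total = accepted(5000, instances=3, limit=1000, shared=True)
```

With independent counters the cluster admits three times the configured limit. With a shared counter it admits exactly the limit, which is the behavior a contractual per-customer rate limit needs.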

Kong's authentication surface is policy-based rather than per-route default. The JWT plugin, key-auth plugin, and OAuth plugin can be configured globally (applying to all routes unless explicitly exempted) or per-route. A global JWT plugin enforces JWT authentication on every route; individual routes can be exempted using Kong's plugin configuration model. Once an auth plugin is applied globally, the result is a default-protected model that is the inverse of AWS API Gateway's default-open model: a new route is protected unless explicitly exempted, rather than unprotected unless explicitly configured. The structural security implication is significant. In a Kong deployment with a global auth plugin, a developer who adds a new route without thinking about auth gets a protected route by default (the request is rejected until authentication is provided). In AWS API Gateway, a developer who adds a new route without thinking about auth gets an open route by default (the request succeeds without any credential).

The operational cost of Kong is the self-hosting burden. Kong's data plane (the gateway instances that process API traffic) must be run, scaled, and maintained by the team. In Kubernetes environments, the Kong Ingress Controller manages Kong configuration declaratively using Kubernetes Custom Resources (KongPlugin, KongConsumer, KongIngress — or the newer Gateway API resources), which integrates gateway configuration into the same IaC model as the rest of the Kubernetes cluster. Outside Kubernetes, Kong's declarative configuration tool (decK) manages gateway configuration as YAML files synced to the Kong admin API, which fits a GitOps workflow but requires operational familiarity with decK and the Kong admin API. The build-vs-buy decision for Kong versus AWS API Gateway is primarily the question of whether the flexibility of Kong's plugin model and its default-protected auth model justify the operational cost of self-hosting a Kong cluster. For teams already operating Kubernetes infrastructure, the incremental operational cost of the Kong Ingress Controller is small relative to the gateway capability it provides.

nginx / nginx-plus

nginx is the foundational HTTP server and reverse proxy that underlies many higher-level gateway products, including Kong. Used directly as an API gateway, nginx provides the limit_req_zone and limit_req directives for rate limiting using a leaky bucket algorithm. The rate limit state lives in shared memory on a single nginx instance; nginx does not natively share rate limit state across multiple instances. Multi-instance nginx clusters each enforce their own rate limits independently, which means the effective cluster rate limit for N instances is N × the per-instance limit unless external state sharing is implemented via a Lua module and Redis.
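For readers who have not worked with limit_req, the following is a simplified Python model of how a leaky bucket decides on one key, loosely following the way nginx tracks "excess" requests above the configured rate. It is an illustration only: it models the accept or reject decision and ignores the delaying behavior nginx applies to burst requests when nodelay is not set.

```python
class LeakyBucket:
    """Simplified leaky bucket for a single key; time is in integer milliseconds."""

    def __init__(self, rate_per_sec, burst=0):
        self.rate = rate_per_sec
        self.burst_milli = burst * 1000   # burst allowance in milli-requests
        self.excess = 0                   # milli-requests above the sustained rate
        self.last_ms = None

    def allow(self, now_ms):
        if self.last_ms is None:          # first request seen for this key
            self.last_ms = now_ms
            return True
        # The bucket drains at `rate` requests/sec; each request adds 1000 milli-requests.
        leaked = self.rate * (now_ms - self.last_ms)
        excess = max(self.excess - leaked + 1000, 0)
        if excess > self.burst_milli:
            return False                  # rejected; stored state is left unchanged
        self.excess, self.last_ms = excess, now_ms
        return True
```

At 10 requests per second with no burst, requests spaced 50 ms apart are accepted alternately. With burst=2, three back-to-back requests pass and the fourth is rejected. Because this state lives inside one process, every additional instance gets its own independent bucket.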

nginx does not natively validate JWTs. The commercial nginx-plus product includes the ngx_http_auth_jwt_module for JWT validation at the proxy level. Open-source nginx requires either the OpenResty ecosystem (which embeds LuaJIT into nginx and provides the lua-resty-jwt library) or delegating auth validation to an auth service called via auth_request. Both approaches require engineering investment beyond a standard nginx configuration. For teams that need both JWT validation and distributed rate limiting in an nginx-based gateway, the dependency chain (nginx + Lua + LuaJIT for JWT, nginx + Lua + Redis for distributed rate limiting) approaches the complexity of self-hosting Kong without the plugin management model that makes Kong's Lua plugins composable and auditable.

nginx is the appropriate gateway choice when the team needs extreme throughput at minimal latency overhead, is operating simple routing rules (path prefix → backend service), does not need JWT validation at the gateway layer (because tokens are validated in each backend service instead), and has in-house nginx expertise. nginx processes millions of requests per second on commodity hardware at single-digit milliseconds of overhead, which makes it the right choice for high-throughput proxying scenarios where the gateway itself must never be the bottleneck. For API gateway use cases that require per-user rate limiting, JWT validation, plugin extensibility, or audit logs of authentication decisions, nginx's configuration model requires substantial custom engineering to reach feature parity with Kong or managed gateway products.

The AI chat sessions that produced undocumented decisions

API gateway decisions are made across a cluster of sessions that feel like configuration work rather than architecture work. The initial setup session selects the vendor and gets the first endpoint running. Subsequent sessions add features — authentication, rate limiting, public endpoints, admin endpoints — each solving an immediate problem without revisiting the structural decisions established in earlier sessions. The decisions accumulate silently across sessions, each individually reasonable, until an incident makes the structural gaps visible.

The initial platform selection session — "what API gateway should I use for a Node.js backend on AWS?" — produces the vendor choice without any cost modeling at scale. In 2019, AWS API Gateway REST API was the only AWS option. A tutorial-following session in 2019 would have produced REST API. In 2021, the same question might have produced HTTP API from a more current tutorial. In 2024, the same question might have produced a hybrid recommendation comparing REST API and HTTP API. The session from the year the product was started determines the gateway choice, and teams rarely revisit it because the gateway is working, the cost is not yet alarming, and the developer who set it up is the only person who knows what was decided and why. See the structural pattern in decisions never written down — the session closes when the first endpoint works, and the constraints of the solution are never externalized.

The authentication configuration session — "how do I add JWT authentication to API Gateway?" — correctly configures a JWT authorizer on the existing routes. The ChatGPT response explains how to create a Cognito user pool, how to create an authorizer resource, and how to attach it to a route. The session ends when authentication is working on the existing routes. The session does not cover what happens to future routes added by developers who copy Terraform resource blocks. It does not cover the default state of new routes. It does not cover the policy that ensures every non-public route has an authorizer. Writing down "the authorizer configuration for existing routes" is not the same as writing down "the policy for all future routes" — the first is a configuration state, the second is an architecture decision with enforcement. The configuration state is derivable from Terraform. The policy is not derivable from anything unless it was written down.

The rate limiting configuration session — "how do I implement rate limiting for my API?" — configures usage plan rate limits on API keys for authenticated endpoints. The ChatGPT response correctly explains that usage plans let you set requests-per-second limits per API key, and that Terraform can manage usage plans and API keys declaratively. The session does not cover public endpoints that don't use API keys. The session does not cover what happens when a public endpoint consumes Lambda concurrency that authenticated endpoints depend on. The session solves the authenticated-traffic rate limiting problem and closes. The unauthenticated-traffic rate limiting problem is not in scope because nobody has experienced it yet. The failure mode — flooded public endpoint starves authenticated traffic via shared Lambda concurrency pool — requires the intersection of a scraper hitting a public endpoint and shared concurrency, which has not happened yet at the time of the rate limiting session. The rate limiting decision record for the API covers the strategy; the gateway ADR must cover which endpoints that strategy actually protects.

The cost optimization session — "our AWS API Gateway costs are higher than expected, how do we reduce them?" — produces the discovery that HTTP API is 70% cheaper than REST API. The session surfaces the price differential, the feature comparison, and the migration steps. What it does not produce is a record of why REST API was chosen originally (based on a 2019 tutorial before HTTP API existed), which REST API features are currently in use (mapping templates, usage plans, Lambda authorizers), and which of those features are required versus inherited-by-default. Without that audit, the migration analysis is "HTTP API is cheaper" — which is true — without the nuance of "we use usage plans for per-customer rate limiting, which HTTP API doesn't support natively, so migration to HTTP API requires adopting WAF for rate limiting and the $5/month WAF baseline plus $0.60/million WAF evaluation cost partially offsets the per-request savings at our current volume." The WhyChose extractor run on the API Gateway cost analysis sessions and the initial setup session together surface the original selection context — the 2019 tutorial, the feature assumptions at the time, the cost expectations — that gives the migration decision the information it needs to be made correctly rather than reactively.

The incident response session — "our Lambda functions are being throttled and I don't know why, paid API users are getting 429s" — produces the diagnosis (public endpoint consuming all Lambda concurrency) and the patch (WAF rate limiting on the public endpoint). The session does not produce documentation of the Lambda concurrency architecture — which functions have reserved concurrency, which share the account-level concurrency pool, what the expected failure mode is when the public endpoint is flooded, and what the remediation procedure is for the on-call engineer who encounters this at 2am on a Saturday. Those artifacts require a gateway ADR that was never written. The postmortem ticket captures the incident; the ADR would capture the structural design that the next engineer needs to understand before adding another Lambda function to the account. The infrastructure-as-code strategy for Lambda concurrency configuration is the parallel document — the IaC ADR covers how concurrency is managed in Terraform, and the gateway ADR covers what the concurrency boundary policy is for the gateway's Lambda functions. Neither exists without being deliberately written.

What to actually document in the API gateway ADR

An API gateway ADR that prevents authentication bypass incidents and rate limiting gaps does not document how the gateway was configured — the Terraform state captures that. It documents why this gateway was chosen, what structural constraints that choice imposes on all future route additions and Lambda function deployments, and what decisions were made during configuration that a future engineer cannot infer from the infrastructure code alone.

The authentication policy is the most important section. Document the default authentication state of a new route ("AWS API Gateway HTTP API defaults to open — authorization_type must be explicitly set to JWT on every route that should be protected; the default is no auth"), the policy for new routes ("all routes whose path matches /user/*, /admin/*, or any route that performs a write operation must have authorization_type = JWT with authorizer ARN [arn:aws:apigateway:...]"), the enforcement mechanism ("Terraform check in CI/CD rejects plans where new routes in protected path patterns have authorization_type = NONE; exceptions require a comment explaining why the route is intentionally public"), and the bypass surface ("Lambda functions [list] are invokable directly from [internal service list] via Lambda ARN invocation, bypassing the gateway authorizer; those Lambda functions implement their own token validation via [library/approach]"). The policy is not derivable from the Terraform state. The state shows what is configured; the ADR shows what should be configured and why. See the ADR format guidance for the Consequences section where the negative consequence of AWS API Gateway's default-open model should be made explicit: "each new route must be explicitly protected; a new developer who copies an unprotected route resource creates an unprotected route."
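The enforcement mechanism described above can be sketched as a small check over the JSON form of a Terraform plan (the output of `terraform show -json`). This is a minimal sketch, not a definitive implementation: it assumes routes are declared as `aws_apigatewayv2_route` resources, and the path patterns, the set of write methods, the allowlist of intentionally public routes, and the sample resource names are all hypothetical values chosen for illustration.

```python
import fnmatch

# Hypothetical policy values; the real ones belong in the ADR.
PROTECTED_PATH_PATTERNS = ["/user/*", "/admin/*"]
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE", "ANY"}
# Intentionally public routes, each with the reason it is exempt.
ALLOWED_PUBLIC_ROUTES = {
    "POST /webhooks/stripe": "Stripe HMAC signature is verified in the handler",
}


def find_unprotected_routes(plan: dict) -> list[str]:
    """Return planned HTTP API routes that violate the auth policy."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_apigatewayv2_route":
            continue
        after = (rc.get("change") or {}).get("after")
        if after is None:  # the route is being deleted
            continue
        route_key = after.get("route_key", "")
        if route_key in ALLOWED_PUBLIC_ROUTES:
            continue
        method, _, path = route_key.partition(" ")
        protected = method in WRITE_METHODS or any(
            fnmatch.fnmatchcase(path, pattern) for pattern in PROTECTED_PATH_PATTERNS
        )
        auth = after.get("authorization_type") or "NONE"
        if protected and auth == "NONE":
            violations.append(f"{rc.get('address')} ({route_key})")
    return violations


def _route(name: str, route_key: str, auth: str) -> dict:
    """Build a plan entry shaped like Terraform's resource_changes items."""
    return {
        "address": f"aws_apigatewayv2_route.{name}",
        "type": "aws_apigatewayv2_route",
        "change": {
            "actions": ["create"],
            "after": {"route_key": route_key, "authorization_type": auth},
        },
    }


sample_plan = {
    "resource_changes": [
        _route("stripe_webhook", "POST /webhooks/stripe", "NONE"),
        _route("admin_export", "POST /admin/export", "NONE"),
        _route("catalog", "GET /catalog", "NONE"),
        _route("user_profile", "GET /user/profile", "JWT"),
    ]
}

violations = find_unprotected_routes(sample_plan)
print(violations)
```

In CI, a non-empty result would fail the pipeline. The allowlist keeps the reason for each public route next to the exemption, which is the role the ADR assigns to the explanatory comment.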

The rate limiting architecture for unauthenticated endpoints must be documented separately from the authenticated-endpoint rate limiting. Document which endpoints are in the usage plan / API key rate limiting scope ("all routes under /api/v1/* that require auth"), which are outside that scope ("public routes: /catalog, /pricing, /status"), and what rate limiting protection applies to out-of-scope endpoints ("WAF rule: IP-based rate limiting at 10,000 requests per 5-minute window on all public routes; WAF web ACL ARN: [arn:...]"). Document the Lambda concurrency boundary: "Public endpoint Lambda functions (catalog-handler, pricing-handler) have reserved concurrency of 200 each. Authenticated API Lambda functions (api-handler) have reserved concurrency of 500. The account-level Lambda concurrency limit is 1,000, so 900 executions are reserved and 100 unreserved executions remain for burst and auxiliary functions. A flooded public endpoint is bounded to the 200 concurrent executions of its own function and cannot take any of the 500 executions reserved for authenticated traffic." Without this document, the on-call engineer who encounters throttling on authenticated APIs at 2am has to manually correlate Lambda concurrency metrics with API Gateway metrics to determine whether the source of the throttling is inside the rate-limited scope or outside it.
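The concurrency boundary is simple arithmetic, and it is worth checking whenever a reservation changes. The sketch below is one way to do that under stated assumptions: the function name `concurrency_budget` and the result keys are hypothetical, and the floor of 100 reflects the minimum that AWS Lambda keeps unreserved in an account. The function names and reservations are the ones from the example ADR text above.

```python
# AWS Lambda keeps at least this many executions unreserved per account.
MIN_UNRESERVED = 100


def concurrency_budget(account_limit: int, reserved: dict[str, int]) -> dict:
    """Summarize how reserved concurrency divides the account-level limit."""
    total_reserved = sum(reserved.values())
    unreserved = account_limit - total_reserved
    return {
        "total_reserved": total_reserved,
        "unreserved": unreserved,
        # The plan is only valid if the unreserved pool stays at or above the floor.
        "valid": unreserved >= MIN_UNRESERVED,
    }


# Reservations taken from the example ADR text.
reservations = {
    "catalog-handler": 200,
    "pricing-handler": 200,
    "api-handler": 500,
}

budget = concurrency_budget(account_limit=1000, reserved=reservations)
print(budget)
```

Running a check like this against the values in the Terraform module keeps the numbers quoted in the ADR from drifting away from the configured reservations.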

The cost model at current and projected scale prevents the pattern in which the REST API versus HTTP API price difference is discovered only during a cost optimization session. Document the current per-request rate ("REST API, $3.50 per million requests"), the current monthly volume ("approximately 45 million requests per month"), the current monthly gateway cost ("approximately $157/month"), the HTTP API cost at the same volume ("approximately $45/month, 71% lower"), and the REST API features currently in use that prevent migration ("usage plans with API keys are used for per-customer rate limiting; 23 customers have distinct API keys with individual rate limit tiers"). If the usage plan dependency did not exist, the migration to HTTP API would reduce the gateway cost by $112/month at current volume, scaling proportionally with traffic growth. The decision to remain on REST API is a deliberate choice to keep per-key rate limiting at the cost of the price differential, not an accidental default. Documenting it makes the tradeoff visible to the engineer who encounters a monthly invoice of roughly $15,750 at 100× current traffic (4.5 billion requests at the $3.50 rate, before any volume-tier discount) and wonders why HTTP API was never evaluated. Consult the guidance on documenting architecture decisions for the cost-tradeoff framing in the Consequences section.
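The figures in this section can be reproduced with a few lines of arithmetic, which also makes it easy to rerun the comparison as volume grows. The sketch below uses only the rates quoted in this article ($3.50 and $1.00 per million requests, plus the $5/month web ACL and $0.60 per million WAF charges). It assumes flat per-million pricing and ignores volume tiers and any per-rule WAF charges, and the function name `monthly_gateway_cost` is hypothetical.

```python
REST_RATE_PER_MILLION = 3.50   # REST API rate quoted in this article
HTTP_RATE_PER_MILLION = 1.00   # HTTP API rate quoted in this article
WAF_ACL_MONTHLY = 5.00         # WAF web ACL baseline per month
WAF_RATE_PER_MILLION = 0.60    # WAF charge per million requests evaluated


def monthly_gateway_cost(million_requests: float, rate_per_million: float,
                         with_waf: bool = False) -> float:
    """Flat-rate monthly cost in dollars (an assumption: no volume tiers)."""
    cost = million_requests * rate_per_million
    if with_waf:
        cost += WAF_ACL_MONTHLY + million_requests * WAF_RATE_PER_MILLION
    return round(cost, 2)


volume = 45  # million requests per month, from the example above

rest_cost = monthly_gateway_cost(volume, REST_RATE_PER_MILLION)
http_cost = monthly_gateway_cost(volume, HTTP_RATE_PER_MILLION)
http_with_waf = monthly_gateway_cost(volume, HTTP_RATE_PER_MILLION, with_waf=True)

print(f"REST API:       ${rest_cost:.2f}/month")
print(f"HTTP API:       ${http_cost:.2f}/month")
print(f"HTTP API + WAF: ${http_with_waf:.2f}/month")
print(f"Savings without WAF: ${rest_cost - http_cost:.2f}/month")
print(f"Savings with WAF:    ${rest_cost - http_with_waf:.2f}/month")
```

Under these assumptions the WAF charges take back part of the per-request saving, which is why the features that force WAF adoption belong in the same section of the ADR as the price comparison.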

The infrastructure-as-code source of truth for gateway configuration must be documented explicitly. "API Gateway route configuration is managed in Terraform in the infra/api-gateway/ module. Route authorization types, authorizer ARN references, and integration Lambda ARNs are all in Terraform state. Direct console edits to route configuration are not permitted — the next terraform apply will overwrite console changes. WAF rules are in infra/waf/. Lambda concurrency configuration is in each Lambda function's Terraform resource in infra/lambdas/. The source of truth for each configuration domain is the Terraform module, not the AWS console." Without this documentation, a developer who edits a route authorizer in the AWS console during an incident creates drift between the console state and the Terraform state. The next infrastructure deployment silently reverts the console change, reintroducing whatever the developer was fixing. The infrastructure-as-code strategy decision record covers the IaC tool choice and the source-of-truth policy broadly; the gateway ADR specifies it for the gateway configuration specifically.

The payload size limits and streaming policy matter for teams that handle file uploads or large API responses. "Maximum request payload: 10MB (API Gateway limit). File uploads must use the pre-signed S3 URL pattern: the client requests a pre-signed URL from POST /uploads/request, uploads directly to S3 via the pre-signed URL, then notifies the API via POST /uploads/complete with the S3 key. Large response payloads (analytics exports, bulk data downloads) use the same pattern in reverse: the export is written to S3, and the API returns a pre-signed download URL. The gateway payload limit is not a factor for standard CRUD endpoints; it is a factor for any endpoint that attempts to return more than 10MB in a single synchronous response." The payload limit is in the AWS documentation, but the decision to use pre-signed S3 URLs as the workaround (rather than streaming, chunked transfer, or pagination) is an architecture decision that should be in the ADR, not discovered when an engineer tries to return a 12MB analytics export through the API endpoint and the request fails in production.

The ADR template for API gateway selection

The template below follows the Nygard format extended with gateway-specific sections. It captures the sections whose absence produced the incidents described above. Adapt field values to the selected gateway product.

# ADR-NNN: API gateway selection

## Status
Accepted / Proposed / Superseded by ADR-NNN

## Context
[What API traffic is the gateway serving? Internal microservices, public
consumer API, B2B partner API, webhooks? What are the authentication
requirements? Per-customer rate limiting? Public unauthenticated endpoints?
What is the current and projected traffic volume? What is the team's
operational capacity for self-hosted infrastructure?]

## Decision
We will use [AWS API Gateway REST API / HTTP API / Kong / nginx /
other] for [scope: all external API traffic / internal only / etc].

## Authentication policy
Default route state: [open unless explicitly protected / protected
  unless explicitly exempted — document which]
Policy for protected routes: [all routes matching pattern X must have
  authorization_type = JWT / Lambda authorizer ARN Y / IAM auth]
Enforcement mechanism: [Terraform policy check / Sentinel rule /
  code review checklist — document what enforces the policy]
JWT claim propagation: [which claims are forwarded to backend services
  and how — header name, format, signed vs unsigned]
Bypass surface: [which Lambda functions / services are invokable
  directly without passing through the gateway; what auth they
  implement independently of gateway-level auth]

## Rate limiting architecture
Authenticated endpoint rate limiting:
  Key: [API key / JWT sub claim / custom header]
  Mechanism: [usage plan / WAF / Kong rate-limiting plugin / limit_req]
  Scope: [which routes / patterns are covered]
  Limit values: [requests per second / per minute / per day, per key]
Unauthenticated endpoint rate limiting:
  Scope: [which routes / patterns are public / unauthenticated]
  Mechanism: [WAF IP-based / Kong global plugin / none]
  Limit values: [requests per window, window size]
  First-window gap: [document if WAF 5-minute window creates a gap]
Lambda concurrency boundaries (if Lambda backend):
  Public endpoint Lambda(s): reserved_concurrent_executions = [N]
  Authenticated endpoint Lambda(s): reserved_concurrent_executions = [M]
  Account-level concurrency limit: [total] — remaining unreserved: [R]
  Failure mode if public endpoint is flooded: [bounded to N, does not
    starve authenticated traffic because of reserved concurrency]

## Cost model
Gateway product: [REST API / HTTP API / Kong self-hosted / other]
Per-request pricing: [$X per million at current tier]
WAF add-on: [$5/month ACL baseline + $0.60/million evaluated — yes/no]
Current monthly volume: [N] million requests
Current monthly gateway cost: $[amount]
If alternative exists (e.g., HTTP API for REST API users):
  Alternative per-request rate: $[Y per million]
  Features preventing migration: [usage plans / mapping templates /
    Lambda authorizer logic / etc — document which are required]
  Annual cost differential: $[Z/year if migrated]
  Migration decision: [migrate by [date] / stay on [product] because
    [feature dependency] — explicit decision, not default]
Projected cost at 3-year traffic estimate:
  Estimated volume: [N] million requests/month
  Estimated cost: $[amount/month] at current gateway pricing

## Payload limits and streaming policy
Maximum request payload: [size limit for chosen product]
Maximum response payload: [size limit]
Large payload workaround: [pre-signed S3 URL / chunked transfer /
  pagination — document the pattern and which endpoints use it]

## Infrastructure-as-code source of truth
Gateway route config source of truth: [Terraform module path / CDK
  stack / CloudFormation template — no console edits permitted]
WAF rule source of truth: [IaC path]
Lambda concurrency source of truth: [IaC path per Lambda resource]
Console edit policy: [not permitted — terraform apply reverts /
  permitted for emergency and PRed within 24h — document which]

## Consequences
Positive: [capabilities this gateway provides for the team's use cases]
Negative: [auth default-open risk for REST/HTTP API requiring explicit
  enforcement; unauthenticated traffic outside rate limiting scope;
  WAF first-window gap; REST API cost premium over HTTP API; self-hosted
  operational burden for Kong/nginx; per-instance rate limiting for
  nginx without Redis]
Risks: [document what to monitor as traffic grows — cost tiers, Lambda
  concurrency ceilings, WAF rule evaluation cost at high volume]

The sections that teams consistently skip are the authentication policy with its enforcement mechanism (the authorizer is configured; the policy for future routes is not), the rate limiting architecture for unauthenticated endpoints (the authenticated endpoints are rate-limited; the public endpoints are not), and the cost model comparison that documents whether the team is on REST API or HTTP API deliberately or by default. Those three sections are the ones whose absence produces the authentication bypass incident, the Lambda concurrency exhaustion incident, and the cost surprise at 10× traffic. Write them before the third developer adds their first route, not after the security disclosure makes the default-open behavior visible.
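One lightweight way to keep those three sections from being skipped is to lint the ADR file itself. The sketch below is an assumption-laden illustration rather than part of any ADR tooling: it assumes the ADR is written in markdown with the `##` section headings used in the template above, and the function name `missing_sections` and the sample draft are hypothetical.

```python
import re

# The three sections this article identifies as most often skipped.
REQUIRED_SECTIONS = [
    "Authentication policy",
    "Rate limiting architecture",
    "Cost model",
]


def missing_sections(adr_text: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required section names that have no '## ' heading in the ADR."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##\s+(.+)$", adr_text, re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]


draft = """# ADR-012: API gateway selection

## Status
Accepted

## Decision
We will use AWS API Gateway HTTP API for all external API traffic.

## Authentication policy
Default route state: open unless explicitly protected
"""

print(missing_sections(draft))
```

A check like this only proves that the headings exist. Whether the content under them states a policy and its enforcement mechanism still has to be reviewed by a person.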

API gateway decisions share the same structural characteristic as the other infrastructure platform decisions in this series: the initial choice is made quickly, at low traffic, when the structural consequences are invisible. The API versioning strategy decision record and the gateway ADR reference each other — the routing model the gateway provides (path-based versioning, header-based versioning, subdomain routing) constrains the versioning strategy options. Writing the gateway ADR forces the explicit statement of what routing model the gateway enforces, which in turn forces the explicit statement of how versioning will work as the API evolves. The two decisions are made in separate sessions and typically documented in separate ADRs, but they are not independent. The gateway ADR is where their relationship is stated.

Frequently asked questions

What is the difference between AWS API Gateway REST API and HTTP API?

AWS API Gateway REST API (v1) launched in 2015 and charges $3.50 per million requests. It supports usage plans and API keys for per-customer rate limiting, Lambda authorizers for custom auth logic, mapping templates for request and response transformation, and the full AWS service integration model. AWS API Gateway HTTP API (v2) launched in 2020 and charges $1.00 per million requests, approximately 70% cheaper. HTTP API includes a native JWT authorizer (no Lambda function needed for standard JWT validation), lower invocation latency (2–3ms overhead vs 10ms for REST API), and a simpler integration model. HTTP API does not support usage plans or API keys, so per-customer rate limiting has to come from AWS WAF or from the backend; the route-level throttling it does offer applies to all callers of a route, not to individual clients. Teams that adopted API Gateway before 2020 typically continue using REST API without evaluating whether HTTP API covers their requirements. At these list rates, before volume-tier discounts, the cost differential is $250/month at 100 million monthly requests and $2,500/month at 1 billion monthly requests. The migration from REST API to HTTP API is appropriate when usage plans are not required for per-customer rate limiting, mapping templates are not used, and Lambda authorizer logic is standard JWT validation that the native JWT authorizer can replace. The gateway ADR should document which REST API features are actively used, the annual cost differential, and whether migration is planned, deferred, or not applicable due to feature dependencies.

How do I rate limit unauthenticated API endpoints at AWS API Gateway?

AWS API Gateway usage plans and API key rate limiting apply only to requests that carry an API key credential. Unauthenticated public endpoints — product catalog, public pricing tiers, status pages — receive no protection from usage plan rate limits regardless of request volume. There are three approaches for rate limiting unauthenticated endpoints. AWS WAF attached to API Gateway applies IP-based rate limiting rules with a 5-minute evaluation window — the request count must exceed the threshold within the 5-minute window before the rule blocks requests, which means the first requests in each new window always pass through before the counter engages. WAF costs $5/month per web ACL plus $0.60 per million requests evaluated. The second approach is Lambda reserved concurrency: assigning reserved concurrency to public endpoint Lambda functions (preventing them from consuming more than their reserved allocation) isolates public traffic from authenticated traffic concurrency. A flooded public endpoint is bounded to its reserved concurrency and cannot starve authenticated endpoint Lambda functions that have their own reserved allocation. The third approach is placing a CloudFront distribution in front of the public endpoints and using CloudFront Functions or Lambda@Edge for rate limiting with shared state in CloudFront's tier. The appropriate choice depends on whether WAF's 5-minute window gap is acceptable, whether the team already operates CloudFront, and whether the priority is limiting requests to the Lambda function or just limiting requests to the API Gateway endpoint. Reserved concurrency is the minimum viable isolation measure and should be applied regardless of which rate limiting approach is chosen.

When should a team use Kong Gateway instead of AWS API Gateway?

Kong Gateway is appropriate when the team requires distributed rate limiting enforced consistently across multiple gateway instances (Kong's rate-limiting plugin uses Redis to share counter state, so the limit is enforced at the cluster level rather than per-instance), when the default-protected authentication model is preferable to per-route explicit protection (Kong's global JWT plugin protects all routes unless explicitly exempted, the inverse of AWS API Gateway's default-open behavior), when the backend services are not on AWS or span multiple clouds, or when the team needs fine-grained per-consumer and per-route rate limits configurable independently. Kong is self-hosted — the team maintains the Kong cluster, manages plugin upgrades, and operates the Redis dependency for distributed rate limiting. In Kubernetes environments, the Kong Ingress Controller manages this operationally with declarative configuration via Kubernetes Custom Resources. For teams already operating Kubernetes, the incremental operational cost of Kong Ingress Controller is modest relative to its capabilities. Kong is not appropriate when the team wants zero gateway operational overhead (AWS API Gateway is fully managed and requires no cluster maintenance), when the backend is entirely Lambda-based and integration simplicity is the priority, or when the team has no existing Redis infrastructure. The build-vs-buy choice between Kong and AWS API Gateway is primarily whether gateway configuration flexibility and default-protected auth justify the operational self-hosting cost.

What should an API gateway ADR document that teams typically skip?

Teams typically document the gateway product name and the first working route configuration. The ADR sections that prevent security incidents and cost surprises are: (1) the authentication policy — not just the authorizer on existing routes, but the default state of a new route, the rule for which routes must be protected, and how that rule is enforced so a new developer cannot accidentally deploy an unprotected admin route; (2) the rate limiting architecture for unauthenticated endpoints — authenticated endpoints covered by usage plans or API keys have known rate limiting behavior, but public endpoints with no auth token are outside that model and require separate treatment (WAF, reserved Lambda concurrency, or a caching layer that reduces origin Lambda pressure); (3) the bypass surface — which Lambda functions are invokable directly from internal services without passing through the gateway, and what auth those functions implement independently of gateway-level auth; (4) the cost model comparison — for AWS API Gateway, the REST API versus HTTP API cost differential is 70% for equivalent traffic, and teams that chose REST API before HTTP API was available should document whether migration is planned or blocked by specific feature dependencies; (5) the Lambda concurrency boundaries — whether authenticated and unauthenticated traffic share the same concurrency pool, what reserved concurrency is configured on which functions, and what the degradation behavior is when a public endpoint is flooded. These sections are not derivable from Terraform state — the state shows the current configuration, but not the policy for future configuration or the structural constraints that the current configuration imposes on future decisions.