SSE Delivery Analysis

Measuring Server-Sent Events delivery across Origin and Akamai CDN

Node.js 22 · Express · OpenTelemetry · Akamai CDN · Kubernetes (LKE) · HAProxy Ingress · Prometheus · Grafana + Tempo

The SSE Renaissance

Server-Sent Events — a W3C spec from 2006 — has become the protocol powering every major LLM streaming interface.

[Provider logo grid: OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini), xAI (Grok) — all streaming over text/event-stream]
[Use-case icons: Enterprise Apps · Chat Interfaces · API Integrations · Real-time Dashboards]
Provider | Protocol | Use Case
OpenAI (ChatGPT) | text/event-stream | Token streaming for completions
Anthropic (Claude) | text/event-stream | Streaming message responses
Google (Gemini) | text/event-stream | Real-time model output
xAI (Grok) | text/event-stream | Streaming chat completions
Cohere | text/event-stream | Streaming chat + embeddings
The enterprise LLM market is projected to grow from $8.8B to $71.1B by 2034, with 78% of organizations adopting AI. SSE is the delivery protocol.
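Every provider above speaks the same wire format: an SSE response is a stream of newline-delimited frames, with optional id: and event: fields, one or more data: lines, and a blank line terminating each event. A minimal framing helper illustrates the format (a sketch, not any provider's actual code):

```javascript
// Frame a payload as one text/event-stream event. Multi-line data is split
// across repeated "data:" lines, per the SSE event stream format.
function frameSSE({ id, event, data }) {
  let out = '';
  if (id !== undefined) out += `id: ${id}\n`;
  if (event) out += `event: ${event}\n`;
  for (const line of String(data).split('\n')) out += `data: ${line}\n`;
  return out + '\n'; // blank line terminates the event
}
```

For example, frameSSE({ id: 1, event: 'token', data: 'Hello' }) yields the three frame lines followed by the terminating blank line.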

The Problem

Standard web delivery assumptions break down with Server-Sent Events. Most organizations deploying LLM applications inherit SSE infrastructure requirements they don't understand.

CDNs buffer responses by default
Designed for complete HTTP responses, CDNs accumulate data before forwarding — breaking the real-time nature of SSE token streaming.

WAFs can't inspect streaming responses
Traditional request/response inspection models fail with long-lived, chunked streams that deliver data continuously.

Rate limiters conflate connections with events
A single SSE connection carrying 10,000 events per hour appears as one HTTP request to traditional rate limiting.

Bot detectors flag legitimate API clients
Programmatic SSE consumers lack browser telemetry — no cookies, no JavaScript execution, no mouse movements.

Idle timeouts kill long-lived streams
CDN defaults (Akamai: 120-500s, Cloudflare: 100s) silently terminate SSE connections that appear idle between events.
Zero origin offload. Unlike traditional HTTP where CDN caching can absorb 80-99% of traffic, every SSE stream requires a persistent origin connection. The CDN provides security, geographic optimization, and TLS offload — but the origin must be sized for full connection load.

CDN Delivery for Server-Sent Events

Akamai delivery products (Ion, DSA) support SSE out of the box — with a few critical metadata changes. But SSE behaves fundamentally differently from request/response HTTP, and the CDN's value proposition shifts accordingly.

How SSE Differs from Standard HTTP

A normal HTTP transaction is short-lived: request in, response out, connection recycled. CDNs are built around this model — cache the response, serve it to the next client, offload the origin.

SSE flips this. The connection stays open for minutes or hours. The response is never “complete” — the server pushes events indefinitely. There is no cacheable object. Every client requires a dedicated, persistent connection all the way back to origin. The CDN can’t coalesce, cache, or collapse these connections.

What the CDN Still Provides

The CDN can’t cache SSE, but it still delivers real value: TLS termination at the nearest edge PoP (lower handshake latency), DDoS absorption before traffic reaches origin, bot management and WAF inspection on the request path, SureRoute for optimized mid-mile transport, and DataStream observability without origin instrumentation.

The tradeoff is configuration complexity. By default, Akamai buffers responses before forwarding — designed for efficiency with cacheable objects, but fatal for real-time SSE delivery. Disabling this requires advanced metadata (not in Property Manager UI) via a Professional Services engagement.

Test Architecture

Two delivery paths measured with identical SSE payloads. Each event carries a server-side high-resolution timestamp for precise latency measurement.

Origin path: Test Client → HAProxy → SSE Server Pod
Akamai path: Test Client → Akamai CDN → HAProxy → SSE Server Pod
Infrastructure: Linode LKE (us-ord) · 1 Server Pod · OTel Collector → Tempo · Prometheus → Grafana

Live Path Comparison

Simultaneous SSE streams on both paths. Same event IDs, same client clock — which path delivers each event first?

Test parameters: 100ms event interval · 256B payload · 30s duration

Origin (origin-sse.connected-cloud.io): Status: Idle · TTFB: – · Events: 0 · Avg Latency: – · Jitter: – · P99: –

Akamai (sse.connected-cloud.io): Status: Idle · TTFB: – · Events: 0 · Avg Latency: – · Jitter: – · P99: –

Scoreboard: Origin Wins: 0 · Akamai Wins: 0 · Ties: 0 · Mean Delta: – · Median Delta: –

Event Delivery Delta (Origin vs Akamai)
ID | Origin | Akamai | Delta | Winner
Start a comparison to see matched event pairs...
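The Avg Latency, Jitter, and P99 columns reduce to a few lines over the per-event latency samples. A sketch (jitter here is median absolute deviation, the statistic the global benchmark later reports; the function name is illustrative):

```javascript
// Summary stats over per-event latency samples (ms).
function latencyStats(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const avg = samples.reduce((s, v) => s + v, 0) / samples.length;
  const median = sorted[Math.floor(sorted.length / 2)];
  // Jitter as median absolute deviation (MAD) around the median.
  const devs = samples.map((v) => Math.abs(v - median)).sort((a, b) => a - b);
  const jitter = devs[Math.floor(devs.length / 2)];
  // Nearest-rank p99, clamped to the last sample for small sets.
  const p99 = sorted[Math.min(sorted.length - 1, Math.ceil(0.99 * sorted.length) - 1)];
  return { avg, jitter, p99 };
}
```

MAD is preferred over standard deviation here because a single delayed event (a CDN buffer flush, a GC pause) would otherwise dominate the jitter figure.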

Live Connection Traces

Connection phase breakdown for both paths via Resource Timing API. Run a comparison to populate.

Origin Path

Waiting for test...

Akamai Path

Waiting for test...

Benefits & Drawbacks

CDN delivery for SSE provides real security and connectivity value, but the traditional caching model doesn't apply. Here's what you gain and what you give up.

Benefits

Capability | Detail
TLS Termination at Edge | Clients negotiate TLS with nearest edge PoP — lower handshake latency, reduced origin TLS load
DDoS Absorption | Edge absorbs volumetric and connection-flood attacks before they reach origin
Bot Management | Behavioral detection and challenge mechanisms filter malicious clients at edge
WAF (Request Inspection) | Request headers, query params, and POST bodies inspected at edge before forwarding
Observability (DataStream) | Per-connection logging, timing data, and real-time telemetry without instrumenting origin
Global DNS & Routing | Proprietary internet mapping routes clients to optimal edge PoP — lower RTT for geographically distributed users

Drawbacks

Limitation | Detail
Zero Connection Offload | Every client SSE connection requires a corresponding origin connection — 0% cache hit rate
Buffering by Default | Without advanced metadata configuration, edge buffers responses — breaking real-time delivery
1:1 Origin Connections | No connection coalescing — origin must be sized for full concurrent client count
WAF Can't Inspect Streams | Response body inspection fails on chunked, long-lived streams — must exempt SSE paths
Advanced Config Required | Critical settings (buffer-response-v2, chunk-to-client) require PS engagement — not in Property Manager UI

Akamai CDN Configuration Guide

Critical configuration required to deliver SSE through Akamai without buffering. Advanced metadata requires Professional Services engagement.

Property Manager Rule Tree
Default Rule
├── Origin: origin-sse.connected-cloud.io:443 (TLS)
├── CP Code: sse-cdn
├── Allow Transfer Encoding: enabled
│   └── Match: /events path
├── Caching: NO_STORE
├── Downstream Cache: BUST
└── Advanced Metadata:
    ├── chunk-to-client: on
    ├── buffer-response-v2: off
    ├── forward-chunk-boundary-alignment: on
    ├── lma.origin-edge: off
    └── lma.edge-browser: off
Required Behaviors
Behavior | Setting | Purpose | UI Available?
Caching | NO_STORE | Prevent edge caching of SSE | Yes
Downstream Cache | BUST | Prevent client/intermediate caching | Yes
Allow Transfer Encoding | enabled | Enable chunked transfer | Yes
Response Buffer | off | Disable edge response buffering | No (metadata)
Origin Timeout | 600s | Allow long-lived SSE connections | Product-dependent
Advanced Metadata (Critical for SSE)

These behaviors are not available in Property Manager UI and require Akamai PS or TAM engagement. Without them, the edge buffers SSE responses, breaking real-time delivery.

Metadata Tag | Value | Effect
http.buffer-response-v2 | off | Disables edge response buffering — events forwarded immediately
http.chunk-to-client | on | Enables chunked encoding to the client
http.forward-chunk-boundary-alignment | on | Preserves chunk boundaries from origin
lma.origin-edge | off | Disables last-mile acceleration (origin to edge)
lma.edge-browser | off | Disables last-mile acceleration (edge to browser)
WAF / App & API Protector Considerations
Concern | Impact on SSE | Recommendation
Response body inspection | Cannot buffer streaming response | Exempt /events from response inspection
Rate limiting | 1 SSE connection = 1 HTTP request | Limit per-connection, not per-request
Bot detection | API clients lack browser signals | Whitelist SSE client User-Agents
Slow POST detection | Long-lived connections appear idle | Increase thresholds for SSE paths
Timeout Management
Layer | Default | SSE Setting | Configuration
Edge → Client | 120s | < 120s heartbeat | Server sends ": heartbeat" every 15-30s
Edge → Origin | 120s | 600s | Origin timeout behavior
HAProxy | 30s | 600s | timeout server, timeout tunnel
Browser EventSource | auto-reconnect | 3000ms | Server sends retry: 3000

H2 to Origin: Stream Multiplexing for SSE

Each SSE stream is an unbroken, long-lived socket from the CDN edge to origin. Unlike a page load — where the browser opens a connection, fetches a response, and returns the socket to the pool — an SSE connection is held open for the lifetime of the subscription. That connection is never available for reuse. With HTTP/1.1 to origin, this means every concurrent SSE subscriber consumes a dedicated origin TCP socket.

The Problem: Connection Pinning

HTTP/1.1 allows only one in-flight request per TCP connection. For short-lived request/response traffic, the CDN recycles connections via persistent connection (PCONN) pools — one socket serves many sequential requests. SSE breaks this model. The response never completes, so the connection is pinned for the stream’s entire duration.

At scale this becomes an origin capacity problem: 10,000 concurrent SSE subscribers = 10,000 origin TCP sockets, each consuming kernel file descriptors, memory, and load balancer state. The CDN passes every connection through without reduction.

The Solution: H2 Stream Coalescing

HTTP/2 multiplexes many logical streams onto a single TCP connection. With http2Enabled: true on the origin behavior, the Akamai edge opens one H2 connection per origin node per SureRoute parent server — then maps every concurrent SSE subscriber flowing through that edge server onto shared H2 streams on the same socket.

10,000 concurrent SSE subscribers flowing through 3 edge parent servers = 3 origin TCP connections, not 10,000. Origin connection pressure scales with edge topology, not client count.

H1 to Origin (1:1): One origin TCP connection per SSE subscriber. No multiplexing, no connection sharing.
H2 to Origin, Production (N:1): Tested: 9 concurrent requests (6 SSE streams) on a single TCP socket. Connection reqs: 1…9.
Staging caveat (no mux): Staging network does not multiplex concurrent SSE. Each stream gets its own socket. Always test on production.

H1 to Origin — Connection Pinned

Browser → Akamai CDN → [HTTP/1.1] origin-sse → NGINX → Express
N concurrent SSE clients = N origin TCP sockets (pinned for stream lifetime)

H2 to Origin — Stream Multiplexed

Browser → Akamai CDN → [HTTP/2] h2-origin-sse → NGINX → Express
N concurrent SSE clients = ~G origin TCP sockets (G = SureRoute parent servers)
Configuration & verification details

Enabling H2 to origin: Set http2Enabled: true on the origin behavior in property rules JSON (PAPI field name — not http2: true). Use a separate origin hostname (e.g., h2-origin-sse) to prevent H2 connection pools from interfering with H1 traffic on the same property.
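In the property rules JSON, the origin behavior then looks roughly like this (a sketch: originType and hostname placement are illustrative, http2Enabled is the load-bearing field named above):

```json
{
  "name": "origin",
  "options": {
    "originType": "CUSTOMER",
    "hostname": "h2-origin-sse.connected-cloud.io",
    "http2Enabled": true
  }
}
```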

Verification: NGINX access logs with $connection and $connection_requests variables prove multiplexing. Look for the same connection ID across multiple request log lines. In production testing, connection 1493070 served requests 1 through 9 on a single TCP socket.
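A log_format along these lines surfaces the two variables (a sketch; the format name and log path are illustrative):

```nginx
# conn= is the TCP connection serial number; reqs= counts requests on it.
# Multiplexing shows up as one conn value with an incrementing reqs counter.
log_format h2mux '$remote_addr conn=$connection reqs=$connection_requests '
                 '"$request" $status';
access_log /var/log/nginx/sse-origin.log h2mux;
```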

Client-to-edge H2: Enabled separately via the http2 behavior with empty options: {}. This is H2 between the browser and the Akamai edge — orthogonal to H2 between edge and origin.

Live H2 Protocol Test

Open N concurrent SSE streams via both H1 and H2 origin paths through Akamai simultaneously. Compares per-stream event delivery latency and jitter between the two L7 protocols. On production, the H2 path multiplexes concurrent streams onto shared L4 TCP connections; the H1 path opens 1 connection per stream.

Test parameters: 5 streams · 500ms event interval · 15s duration

Browser limit: ~50 concurrent EventSources before client-side JS processing becomes the bottleneck. For higher concurrency, use a server-side test client.

H1 to Origin (default): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

H2 to Origin (/h2/events): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

Comparison: H1 Avg Latency*: – · H2 Avg Latency*: – · Matched Events: 0 · H1 Jitter: – · H2 Jitter: – · Mean Delta: –

Event Delivery Delta (H1 vs H2 Origin)

H1 Origin Connections: L4 Sockets: – · SSE Streams: –

H2 Origin Connections: L4 Sockets: – · SSE Streams: – · Mux Ratio: –

Origin L4 Sockets Over Time

Configure test parameters and click Start Test to compare H1 vs H2 origin connection behavior.

*Latency is measured as inter-event arrival time on the client (performance.now). Subject to client clock drift and browser event-loop scheduling. Use for relative H1/H2 comparison, not absolute measurement.

Distributed SSE: NATS Fan-Out + GTM Performance Routing

CDN-only SSE funnels every client stream through a single origin. That origin must hold open one long-lived HTTP connection per subscriber — connections that can’t be cached, coalesced, or offloaded by the CDN. Distributed delivery solves this by placing stateless NATS leaf nodes at 8 global Linode regions. The origin publishes each event once to NATS core; leaf nodes fan it out locally to their own subscribers. The origin never sees individual client connections.

CDN-Only Path (/events)
Client 1, Client 2, … Client N → Akamai CDN Edge → Origin Ingress NGINX → Express SSE Server
Each client holds a dedicated persistent connection through the CDN to origin — no multiplexing

Distributed Path (/distributed/events)
Express → NATS Core (us-ord) → leaf nodes:
us-ord (edge → client) · fr-par (edge) · jp-osa (edge → client) · us-lax, in-maa, + 3 more (no subscribers)
Interest-based routing: only leaves with subscribers receive events from core

Measured Benefits

Origin Connection Offload
N → 1
CDN-only: origin holds one persistent HTTP connection per client stream through the CDN. Distributed: origin maintains a single outbound NATS publish connection regardless of subscriber count. Fan-out to leaves and clients is handled entirely by NATS — origin connection pressure is constant, not proportional to audience size.
Jitter Reduction
2–4× lower
Global benchmark (8 regions, 1s interval): distributed path delivers events with 2–4× lower inter-event jitter at international regions. Chennai: 13ms MAD → 3.4ms. Osaka: 10ms → 2.8ms. Paris: 20ms → 8ms. Local leaf nodes provide more consistent delivery cadence.
Resilience
Fault-isolated
A leaf node failure only affects subscribers in that region. GTM liveness checks detect failures within 30s and reroute traffic to the next-nearest healthy leaf. NATS leaf reconnection is automatic. No single point of failure for event delivery.
Zero-Cost Idle Scaling
Interest-based
NATS interest-based routing: core only sends events to leaves with active SSE subscribers. A leaf with zero subscribers consumes zero bandwidth from core. Add regions by adding a Terraform block — idle regions cost compute, not traffic.
Origin Not In-Line
Zero inbound
Client SSE connections terminate entirely within the Akamai–leaf frame. No inbound connection from the CDN or client ever reaches the origin. The origin pushes events outbound to NATS — it accepts nothing inbound for event delivery, eliminating it as an attack surface for subscriber traffic.
In-Network Inspection
Observable
Event streams flowing through NATS are subscribable by any in-network consumer — security inspection, anomaly detection, compliance logging, or real-time filtering. Tap the stream without touching the origin or the client connection. CDN-only SSE is opaque end-to-end; NATS makes it observable.
What Doesn’t Change
Event delivery latency. Events originate in Chicago regardless of path. Both CDN and distributed deliver the same events within ±10ms of each other at most regions. The distributed path adds a NATS hop but eliminates no propagation distance.
Event fidelity. Both paths deliver identical events — same IDs, same payloads, same order. Global benchmark showed 100% event overlap across all 8 regions.
How it works

The origin Express server publishes events to an in-cluster NATS core node. Each leaf VM runs an embedded NATS leaf server connected to core over a single TCP connection (port 30422, NodePort). When a client opens an SSE stream on the leaf, the leaf subscribes to the sse.events subject — NATS interest-based routing then begins forwarding events from core to that leaf. When all clients disconnect, the subscription drops and event traffic to that leaf stops entirely.

Akamai GTM resolves distributed-sse.connected-cloud.io to the nearest healthy leaf based on performance routing. The Akamai CDN property routes /distributed/events to this GTM-resolved origin, applying the same streaming optimizations (chunked transfer, buffer bypass) as the CDN-only path.

Leaf nodes: 8 regions (us-ord, us-lax, us-mia, fr-par, br-gru, jp-osa, in-maa, ap-southeast). Each is a single Go binary (~15MB, embedded NATS leaf + HTTPS SSE adapter) on a g6-dedicated-2 Linode. Stateless, horizontal, identical. Add or remove a region by adding or removing one Terraform block.

Live Distributed vs CDN Comparison

Open concurrent SSE streams via both CDN-only and distributed paths through Akamai simultaneously. Both paths deliver the same events from the same SharedEventBus — the difference is origin routing. CDN path goes to us-ord origin; distributed path goes to the nearest NATS leaf via GTM.

Hint: Steady-state performance looks similar. Increase the stream count and enable the churn rate slider to simulate clients connecting and disconnecting during the test — observe how connection management overhead affects each path differently at scale.

Test parameters: 5 streams · 500ms event interval · 15s duration · churn: off

CDN-Only (/events): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

Distributed (/distributed/events): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

Comparison: CDN Avg Latency*: – · Distributed Avg Latency*: – · Matched Events: 0 · CDN Jitter: – · Distributed Jitter: – · Mean Delta: –

Event Delivery Delta (CDN vs Distributed)

Serving Leaf Node: Region: – · X-SSE-Path: – · NATS Connected: –

Configure test parameters and click Start Test to compare CDN-only vs distributed SSE delivery.

*Latency is measured as inter-event arrival time on the client (performance.now). Subject to client clock drift and browser event-loop scheduling. Use for relative CDN/distributed comparison, not absolute measurement.