SSE Delivery Analysis

Measuring Server-Sent Events delivery across Origin and Akamai CDN

Node.js 22 · Express · OpenTelemetry · Akamai CDN · Kubernetes (LKE) · HAProxy Ingress · Prometheus · Grafana + Tempo

The SSE Renaissance

Server-Sent Events — a W3C spec from 2006 — has become the protocol powering every major LLM streaming interface.

[Provider logo grid: OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini), xAI (Grok) — all streaming over text/event-stream]
[Use-case icons: Enterprise Apps · Chat Interfaces · API Integrations · Real-time Dashboards]
Provider | Protocol | Use Case
OpenAI (ChatGPT) | text/event-stream | Token streaming for completions
Anthropic (Claude) | text/event-stream | Streaming message responses
Google (Gemini) | text/event-stream | Real-time model output
xAI (Grok) | text/event-stream | Streaming chat completions
Cohere | text/event-stream | Streaming chat + embeddings
The enterprise LLM market is projected to grow from $8.8B to $71.1B by 2034, with 78% of organizations adopting AI. SSE is the delivery protocol.
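Every provider above speaks the same wire format: an SSE response is a stream of newline-delimited frames, with optional id: and event: fields, one or more data: lines, and a blank line terminating each event. A minimal framing helper illustrates the format (a sketch, not any provider's actual code):

```javascript
// Frame a payload as one text/event-stream event. Multi-line data is split
// across repeated "data:" lines, per the SSE event stream format.
function frameSSE({ id, event, data }) {
  let out = '';
  if (id !== undefined) out += `id: ${id}\n`;
  if (event) out += `event: ${event}\n`;
  for (const line of String(data).split('\n')) out += `data: ${line}\n`;
  return out + '\n'; // blank line terminates the event
}
```

For example, frameSSE({ id: 1, event: 'token', data: 'Hello' }) yields the three frame lines followed by the terminating blank line.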

The Problem

Standard web delivery assumptions break down with Server-Sent Events. Most organizations deploying LLM applications inherit SSE infrastructure requirements they don't understand.

CDNs buffer responses by default
Designed for complete HTTP responses, CDNs accumulate data before forwarding — breaking the real-time nature of SSE token streaming.

WAFs can't inspect streaming responses
Traditional request/response inspection models fail with long-lived, chunked streams that deliver data continuously.

Rate limiters conflate connections with events
A single SSE connection carrying 10,000 events per hour appears as one HTTP request to traditional rate limiting.

Bot detectors flag legitimate API clients
Programmatic SSE consumers lack browser telemetry — no cookies, no JavaScript execution, no mouse movements.

Idle timeouts kill long-lived streams
CDN defaults (Akamai: 120-500s, Cloudflare: 100s) silently terminate SSE connections that appear idle between events.
Zero origin offload. Unlike traditional HTTP where CDN caching can absorb 80-99% of traffic, every SSE stream requires a persistent origin connection. The CDN provides security, geographic optimization, and TLS offload — but the origin must be sized for full connection load.

CDN Delivery for Server-Sent Events

Akamai delivery products (Ion, DSA) support SSE out of the box — with a few critical metadata changes. But SSE behaves fundamentally differently from request/response HTTP, and the CDN's value proposition shifts accordingly.

How SSE Differs from Standard HTTP

A normal HTTP transaction is short-lived: request in, response out, connection recycled. CDNs are built around this model — cache the response, serve it to the next client, offload the origin.

SSE flips this. The connection stays open for minutes or hours. The response is never “complete” — the server pushes events indefinitely. There is no cacheable object. Every client requires a dedicated, persistent connection all the way back to origin. The CDN can’t coalesce, cache, or collapse these connections.

What the CDN Still Provides

The CDN can’t cache SSE, but it still delivers real value: TLS termination at the nearest edge PoP (lower handshake latency), DDoS absorption before traffic reaches origin, bot management and WAF inspection on the request path, SureRoute for optimized mid-mile transport, and DataStream observability without origin instrumentation.

The tradeoff is configuration complexity. By default, Akamai buffers responses before forwarding — designed for efficiency with cacheable objects, but fatal for real-time SSE delivery. Disabling this requires advanced metadata (not in Property Manager UI) via a Professional Services engagement.

Test Architecture

Two delivery paths measured with identical SSE payloads. Each event carries a server-side high-resolution timestamp for precise latency measurement.

Origin path: Test Client → HAProxy → SSE Server Pod
Akamai path: Test Client → Akamai CDN → HAProxy → SSE Server Pod
Infrastructure: Linode LKE (us-ord) · 1 Server Pod · OTel Collector → Tempo · Prometheus → Grafana

Live Path Comparison

Simultaneous SSE streams on both paths. Same event IDs, same client clock — which path delivers each event first?

Test parameters: 100ms event interval · 256B payload · 30s duration

Origin (origin-sse.connected-cloud.io): Status: Idle · TTFB: – · Events: 0 · Avg Latency: – · Jitter: – · P99: –

Akamai (sse.connected-cloud.io): Status: Idle · TTFB: – · Events: 0 · Avg Latency: – · Jitter: – · P99: –

Scoreboard: Origin Wins: 0 · Akamai Wins: 0 · Ties: 0 · Mean Delta: – · Median Delta: –

Event Delivery Delta (Origin vs Akamai)
ID | Origin | Akamai | Delta | Winner
Start a comparison to see matched event pairs...
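The Avg Latency, Jitter, and P99 columns reduce to a few lines over the per-event latency samples. A sketch (jitter here is median absolute deviation, the statistic the global benchmark later reports; the function name is illustrative):

```javascript
// Summary stats over per-event latency samples (ms).
function latencyStats(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const avg = samples.reduce((s, v) => s + v, 0) / samples.length;
  const median = sorted[Math.floor(sorted.length / 2)];
  // Jitter as median absolute deviation (MAD) around the median.
  const devs = samples.map((v) => Math.abs(v - median)).sort((a, b) => a - b);
  const jitter = devs[Math.floor(devs.length / 2)];
  // Nearest-rank p99, clamped to the last sample for small sets.
  const p99 = sorted[Math.min(sorted.length - 1, Math.ceil(0.99 * sorted.length) - 1)];
  return { avg, jitter, p99 };
}
```

MAD is preferred over standard deviation here because a single delayed event (a CDN buffer flush, a GC pause) would otherwise dominate the jitter figure.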

Live Connection Traces

Connection phase breakdown for both paths via Resource Timing API. Run a comparison to populate.

Origin Path

Waiting for test...

Akamai Path

Waiting for test...

Benefits & Drawbacks

CDN delivery for SSE provides real security and connectivity value, but the traditional caching model doesn't apply. Here's what you gain and what you give up.

Benefits

Capability | Detail
TLS Termination at Edge | Clients negotiate TLS with nearest edge PoP — lower handshake latency, reduced origin TLS load
DDoS Absorption | Edge absorbs volumetric and connection-flood attacks before they reach origin
Bot Management | Behavioral detection and challenge mechanisms filter malicious clients at edge
WAF (Request Inspection) | Request headers, query params, and POST bodies inspected at edge before forwarding
Observability (DataStream) | Per-connection logging, timing data, and real-time telemetry without instrumenting origin
Global DNS & Routing | Proprietary internet mapping routes clients to optimal edge PoP — lower RTT for geographically distributed users

Drawbacks

Limitation | Detail
Zero Connection Offload | Every client SSE connection requires a corresponding origin connection — 0% cache hit rate
Buffering by Default | Without advanced metadata configuration, edge buffers responses — breaking real-time delivery
1:1 Origin Connections | No connection coalescing — origin must be sized for full concurrent client count
WAF Can't Inspect Streams | Response body inspection fails on chunked, long-lived streams — must exempt SSE paths
Advanced Config Required | Critical settings (buffer-response-v2, chunk-to-client) require PS engagement — not in Property Manager UI

Akamai CDN Configuration Guide

Critical configuration required to deliver SSE through Akamai without buffering. Advanced metadata requires Professional Services engagement.

Property Manager Rule Tree
Default Rule
├── Origin: origin-sse.connected-cloud.io:443 (TLS)
├── CP Code: sse-cdn
├── Allow Transfer Encoding: enabled
│   └── Match: /events path
├── Caching: NO_STORE
├── Downstream Cache: BUST
└── Advanced Metadata:
    ├── chunk-to-client: on
    ├── buffer-response-v2: off
    ├── forward-chunk-boundary-alignment: on
    ├── lma.origin-edge: off
    └── lma.edge-browser: off
Required Behaviors
Behavior | Setting | Purpose | UI Available?
Caching | NO_STORE | Prevent edge caching of SSE | Yes
Downstream Cache | BUST | Prevent client/intermediate caching | Yes
Allow Transfer Encoding | enabled | Enable chunked transfer | Yes
Response Buffer | off | Disable edge response buffering | No (metadata)
Origin Timeout | 600s | Allow long-lived SSE connections | Product-dependent
Advanced Metadata (Critical for SSE)

These behaviors are not available in Property Manager UI and require Akamai PS or TAM engagement. Without them, the edge buffers SSE responses, breaking real-time delivery.

Metadata Tag | Value | Effect
http.buffer-response-v2 | off | Disables edge response buffering — events forwarded immediately
http.chunk-to-client | on | Enables chunked encoding to the client
http.forward-chunk-boundary-alignment | on | Preserves chunk boundaries from origin
lma.origin-edge | off | Disables last-mile acceleration (origin to edge)
lma.edge-browser | off | Disables last-mile acceleration (edge to browser)
WAF / App & API Protector Considerations
Concern | Impact on SSE | Recommendation
Response body inspection | Cannot buffer streaming response | Exempt /events from response inspection
Rate limiting | 1 SSE connection = 1 HTTP request | Limit per-connection, not per-request
Bot detection | API clients lack browser signals | Whitelist SSE client User-Agents
Slow POST detection | Long-lived connections appear idle | Increase thresholds for SSE paths
Timeout Management
Layer | Default | SSE Setting | Configuration
Edge → Client | 120s | < 120s heartbeat | Server sends ": heartbeat" every 15-30s
Edge → Origin | 120s | 600s | Origin timeout behavior
HAProxy | 30s | 600s | timeout server, timeout tunnel
Browser EventSource | auto-reconnect | 3000ms | Server sends retry: 3000

H2 to Origin: Stream Multiplexing for SSE

Each SSE stream is an unbroken, long-lived socket from the CDN edge to origin. Unlike a page load — where the browser opens a connection, fetches a response, and returns the socket to the pool — an SSE connection is held open for the lifetime of the subscription. That connection is never available for reuse. With HTTP/1.1 to origin, this means every concurrent SSE subscriber consumes a dedicated origin TCP socket.

The Problem: Connection Pinning

HTTP/1.1 allows only one in-flight request per TCP connection. For short-lived request/response traffic, the CDN recycles connections via persistent connection (PCONN) pools — one socket serves many sequential requests. SSE breaks this model. The response never completes, so the connection is pinned for the stream’s entire duration.

At scale this becomes an origin capacity problem: 10,000 concurrent SSE subscribers = 10,000 origin TCP sockets, each consuming kernel file descriptors, memory, and load balancer state. The CDN passes every connection through without reduction.

The Solution: H2 Stream Coalescing

HTTP/2 multiplexes many logical streams onto a single TCP connection. With http2Enabled: true on the origin behavior, the Akamai edge opens one H2 connection per origin node per SureRoute parent server — then maps every concurrent SSE subscriber flowing through that edge server onto shared H2 streams on the same socket.

10,000 concurrent SSE subscribers flowing through 3 edge parent servers = 3 origin TCP connections, not 10,000. Origin connection pressure scales with edge topology, not client count.

H1 to Origin (1:1): One origin TCP connection per SSE subscriber. No multiplexing, no connection sharing.
H2 to Origin, Production (N:1): Tested: 9 concurrent requests (6 SSE streams) on a single TCP socket. Connection reqs: 1…9.
Staging caveat (no mux): Staging network does not multiplex concurrent SSE. Each stream gets its own socket. Always test on production.

H1 to Origin — Connection Pinned

Browser → Akamai CDN → [HTTP/1.1] origin-sse → NGINX → Express
N concurrent SSE clients = N origin TCP sockets (pinned for stream lifetime)

H2 to Origin — Stream Multiplexed

Browser → Akamai CDN → [HTTP/2] h2-origin-sse → NGINX → Express
N concurrent SSE clients = ~G origin TCP sockets (G = SureRoute parent servers)
Configuration & verification details

Enabling H2 to origin: Set http2Enabled: true on the origin behavior in property rules JSON (PAPI field name — not http2: true). Use a separate origin hostname (e.g., h2-origin-sse) to prevent H2 connection pools from interfering with H1 traffic on the same property.
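In the property rules JSON, the origin behavior then looks roughly like this (a sketch: originType and hostname placement are illustrative, http2Enabled is the load-bearing field named above):

```json
{
  "name": "origin",
  "options": {
    "originType": "CUSTOMER",
    "hostname": "h2-origin-sse.connected-cloud.io",
    "http2Enabled": true
  }
}
```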

Verification: NGINX access logs with $connection and $connection_requests variables prove multiplexing. Look for the same connection ID across multiple request log lines. In production testing, connection 1493070 served requests 1 through 9 on a single TCP socket.
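A log_format along these lines surfaces the two variables (a sketch; the format name and log path are illustrative):

```nginx
# conn= is the TCP connection serial number; reqs= counts requests on it.
# Multiplexing shows up as one conn value with an incrementing reqs counter.
log_format h2mux '$remote_addr conn=$connection reqs=$connection_requests '
                 '"$request" $status';
access_log /var/log/nginx/sse-origin.log h2mux;
```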

Client-to-edge H2: Enabled separately via the http2 behavior with empty options: {}. This is H2 between the browser and the Akamai edge — orthogonal to H2 between edge and origin.

Live H2 Protocol Test

Open N concurrent SSE streams via both H1 and H2 origin paths through Akamai simultaneously. Compares per-stream event delivery latency and jitter between the two L7 protocols. On production, the H2 path multiplexes concurrent streams onto shared L4 TCP connections; the H1 path opens 1 connection per stream.

Test parameters: 5 streams · 500ms event interval · 15s duration

Browser limit: ~50 concurrent EventSources before client-side JS processing becomes the bottleneck. For higher concurrency, use a server-side test client.

H1 to Origin (default): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

H2 to Origin (/h2/events): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

Comparison: H1 Avg Latency*: – · H2 Avg Latency*: – · Matched Events: 0 · H1 Jitter: – · H2 Jitter: – · Mean Delta: –

Event Delivery Delta (H1 vs H2 Origin)

H1 Origin Connections: L4 Sockets: – · SSE Streams: –

H2 Origin Connections: L4 Sockets: – · SSE Streams: – · Mux Ratio: –

Origin L4 Sockets Over Time

Configure test parameters and click Start Test to compare H1 vs H2 origin connection behavior.

*Latency is measured as inter-event arrival time on the client (performance.now). Subject to client clock drift and browser event-loop scheduling. Use for relative H1/H2 comparison, not absolute measurement.

Distributed SSE: NATS Fan-Out + GTM Performance Routing

CDN-only SSE funnels every client stream through a single origin. That origin must hold open one long-lived HTTP connection per subscriber — connections that can’t be cached, coalesced, or offloaded by the CDN. Distributed delivery solves this by placing stateless NATS leaf nodes at 8 global Linode regions. The origin publishes each event once to NATS core; leaf nodes fan it out locally to their own subscribers. The origin never sees individual client connections.

CDN-Only Path (/events)
Client 1, Client 2, … Client N → Akamai CDN Edge → Origin Ingress NGINX → Express SSE Server
Each client holds a dedicated persistent connection through the CDN to origin — no multiplexing

Distributed Path (/distributed/events)
Express → NATS Core (us-ord) → leaf nodes:
us-ord (edge → client) · fr-par (edge) · jp-osa (edge → client) · us-lax, in-maa, + 3 more (no subscribers)
Interest-based routing: only leaves with subscribers receive events from core

Measured Benefits

Origin Connection Offload
N → 1
CDN-only: origin holds one persistent HTTP connection per client stream through the CDN. Distributed: origin maintains a single outbound NATS publish connection regardless of subscriber count. Fan-out to leaves and clients is handled entirely by NATS — origin connection pressure is constant, not proportional to audience size.
Jitter Reduction
2–4× lower
Global benchmark (8 regions, 1s interval): distributed path delivers events with 2–4× lower inter-event jitter at international regions. Chennai: 13ms MAD → 3.4ms. Osaka: 10ms → 2.8ms. Paris: 20ms → 8ms. Local leaf nodes provide more consistent delivery cadence.
Resilience
Fault-isolated
A leaf node failure only affects subscribers in that region. GTM liveness checks detect failures within 30s and reroute traffic to the next-nearest healthy leaf. NATS leaf reconnection is automatic. No single point of failure for event delivery.
Zero-Cost Idle Scaling
Interest-based
NATS interest-based routing: core only sends events to leaves with active SSE subscribers. A leaf with zero subscribers consumes zero bandwidth from core. Add regions by adding a Terraform block — idle regions cost compute, not traffic.
Origin Not In-Line
Zero inbound
Client SSE connections terminate entirely within the Akamai–leaf frame. No inbound connection from the CDN or client ever reaches the origin. The origin pushes events outbound to NATS — it accepts nothing inbound for event delivery, eliminating it as an attack surface for subscriber traffic.
In-Network Inspection
Observable
Event streams flowing through NATS are subscribable by any in-network consumer — security inspection, anomaly detection, compliance logging, or real-time filtering. Tap the stream without touching the origin or the client connection. CDN-only SSE is opaque end-to-end; NATS makes it observable.
What Doesn’t Change
Event delivery latency. Events originate in Chicago regardless of path. Both CDN and distributed deliver the same events within ±10ms of each other at most regions. The distributed path adds a NATS hop but eliminates no propagation distance.
Event fidelity. Both paths deliver identical events — same IDs, same payloads, same order. Global benchmark showed 100% event overlap across all 8 regions.
How it works

The origin Express server publishes events to an in-cluster NATS core node. Each leaf VM runs an embedded NATS leaf server connected to core over a single TCP connection (port 30422, NodePort). When a client opens an SSE stream on the leaf, the leaf subscribes to the sse.events subject — NATS interest-based routing then begins forwarding events from core to that leaf. When all clients disconnect, the subscription drops and event traffic to that leaf stops entirely.

Akamai GTM resolves distributed-sse.connected-cloud.io to the nearest healthy leaf based on performance routing. The Akamai CDN property routes /distributed/events to this GTM-resolved origin, applying the same streaming optimizations (chunked transfer, buffer bypass) as the CDN-only path.

Leaf nodes: 8 regions (us-ord, us-lax, us-mia, fr-par, br-gru, jp-osa, in-maa, ap-southeast). Each is a single Go binary (~15MB, embedded NATS leaf + HTTPS SSE adapter) on a g6-dedicated-2 Linode. Stateless, horizontal, identical. Add or remove a region by adding or removing one Terraform block.

Live Distributed vs CDN Comparison

Open concurrent SSE streams via both CDN-only and distributed paths through Akamai simultaneously. Both paths deliver the same events from the same SharedEventBus — the difference is origin routing. CDN path goes to us-ord origin; distributed path goes to the nearest NATS leaf via GTM.

Hint: Steady-state performance looks similar. Increase the stream count and enable the churn rate slider to simulate clients connecting and disconnecting during the test — observe how connection management overhead affects each path differently at scale.

Test parameters: 5 streams · 500ms event interval · 15s duration · churn: off

CDN-Only (/events): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

Distributed (/distributed/events): Status: Idle · Streams: 0 · Events: 0 · Avg Latency*: – · Jitter: – · Per-Stream Avg Event Latency: –

Comparison: CDN Avg Latency*: – · Distributed Avg Latency*: – · Matched Events: 0 · CDN Jitter: – · Distributed Jitter: – · Mean Delta: –

Event Delivery Delta (CDN vs Distributed)

Serving Leaf Node: Region: – · X-SSE-Path: – · NATS Connected: –

Configure test parameters and click Start Test to compare CDN-only vs distributed SSE delivery.

*Latency is measured as inter-event arrival time on the client (performance.now). Subject to client clock drift and browser event-loop scheduling. Use for relative CDN/distributed comparison, not absolute measurement.