What is SSE (Server-Sent Events) Streaming?

SSE (Server-Sent Events) is a web standard that enables a server to push real-time updates to a client over a single, long-lived HTTP connection. In the context of AI agents and LLM APIs, SSE streaming allows language model responses to be delivered token-by-token as they are generated, rather than waiting for the complete response. This creates the familiar "typing" effect seen in ChatGPT and other AI interfaces.

SSE originated as part of HTML5, is now defined in the WHATWG HTML Living Standard, and is supported by all modern browsers. Unlike WebSockets, which provide bidirectional communication, SSE is unidirectional -- the server sends events to the client. This simplicity makes it well-suited for streaming LLM responses, where the client sends one request and the response then flows in a single direction.

How SSE Works

An SSE connection is established when a client sends a standard HTTP request and the server responds with Content-Type: text/event-stream. Instead of sending a complete response and closing the connection, the server keeps the connection open and sends discrete events as they become available.
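The server side of this exchange can be sketched in a few lines of Python. This is a minimal illustration, not any particular provider's implementation: the function names format_sse_event and stream_tokens are hypothetical, the {"token": ...} payload shape mirrors the example in this article, and the [DONE] sentinel is a convention used by some LLM APIs rather than part of the SSE format.

```python
import json
from typing import Iterable, Iterator

# Response headers an SSE endpoint typically sends before the first event.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
}


def format_sse_event(payload: str) -> bytes:
    """Serialize one event: a "data: " prefix per payload line, then a blank line."""
    body = "".join(f"data: {line}\n" for line in payload.split("\n"))
    return (body + "\n").encode("utf-8")


def stream_tokens(tokens: Iterable[str]) -> Iterator[bytes]:
    """Yield one SSE event per token, followed by a [DONE] sentinel (assumed convention)."""
    for token in tokens:
        yield format_sse_event(json.dumps({"token": token}))
    yield format_sse_event("[DONE]")
```

A web framework or raw HTTP handler would send SSE_HEADERS once, then write and flush each chunk from stream_tokens as it is produced, keeping the connection open until the generator is exhausted.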

Each event follows a simple text format: one or more lines beginning with data:, terminated by a blank line. For example:

data: {"token": "Hello"}

data: {"token": " world"}

data: {"token": "!"}

data: [DONE]

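The client processes each event as it arrives, updating the UI incrementally. The final [DONE] line is an end-of-stream sentinel used by some LLM APIs; it is a convention, not part of the SSE format itself. For LLM responses, this means the user sees text appear token by token instead of waiting seconds for the full response to generate.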
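On the client side, parsing the format shown above amounts to collecting data: lines until a blank line marks the end of an event. The following is a minimal sketch under stated assumptions: iter_sse_data and collect_text are hypothetical names, only the data field is handled (event, id, and retry are ignored), and the {"token": ...} payload and [DONE] sentinel follow the example in this article rather than any specific provider's schema.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event from an iterable of decoded lines."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; dispatch it if it carried data.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # strip the single optional space after the colon
            data_lines.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate tokens until the [DONE] sentinel (assumed convention) arrives."""
    parts: List[str] = []
    for data in iter_sse_data(lines):
        if data == "[DONE]":
            break
        parts.append(json.loads(data)["token"])
    return "".join(parts)
```

A real UI would append each token to the display inside the loop instead of joining the tokens at the end; collect_text joins them only to keep the example easy to check.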

SSE vs WebSockets vs Long Polling

SSE -- Unidirectional (server to client), simple, works over standard HTTP, automatic reconnection when consumed through the browser's EventSource API, good for streaming text. Used by most LLM APIs.
WebSockets -- Bidirectional, more complex, requires protocol upgrade, good for real-time chat and interactive applications. Used by OpenClaw's control UI for agent communication.
Long Polling -- Client repeatedly requests updates, server holds the connection until data is available. Higher overhead, simpler to implement behind restrictive firewalls.

Why SSE Matters for AI Agents

SSE is the de facto standard for LLM API streaming. Anthropic, OpenAI, Moonshot, and other model providers all use SSE to stream completions. For AI agent platforms, handling SSE correctly is critical:

Perceived speed -- Users see the first token as soon as the model produces it rather than waiting seconds for the complete response. The interface feels responsive even when full generation takes a long time.
Memory efficiency -- The proxy or middleware never needs to buffer the full response in memory. Each chunk is forwarded as it arrives.
Error handling -- If the model encounters an error mid-stream, the user sees partial output and the error, rather than a blank response after a long wait.
Usage tracking -- Token counts are typically reported in events near the end of the stream (the exact event and field names vary by provider), enabling real-time usage monitoring and cap enforcement.
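The usage-tracking point above can be sketched as follows. Everything about the event shape here is an assumption for illustration: the usage object with input_tokens and output_tokens fields, the UsageCapExceeded exception, and the function names are hypothetical, and real providers differ in which event carries usage and how its fields are named.

```python
import json
from typing import Iterable, Optional


class UsageCapExceeded(Exception):
    """Raised when a stream pushes the running total past the configured cap."""


def extract_usage(data: str) -> Optional[int]:
    """Return the token count if this event carries a usage object, else None."""
    try:
        event = json.loads(data)
    except json.JSONDecodeError:
        return None  # non-JSON payloads such as a [DONE] sentinel
    usage = event.get("usage") if isinstance(event, dict) else None
    if not isinstance(usage, dict):
        return None
    # Hypothetical field names; adapt to the provider's actual schema.
    return int(usage.get("input_tokens", 0)) + int(usage.get("output_tokens", 0))


def record_stream_usage(events: Iterable[str], used_so_far: int, cap: int) -> int:
    """Add usage reported in the stream to a running total and enforce the cap."""
    total = used_so_far
    for data in events:
        tokens = extract_usage(data)
        if tokens is not None:
            total += tokens
    if total > cap:
        raise UsageCapExceeded(f"used {total} tokens, cap is {cap}")
    return total
```

In this sketch the cap is checked once the stream has finished, so a single response can overshoot the cap; a stricter policy would check the remaining budget before opening the upstream request.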

A common mistake in AI agent proxies is buffering the full SSE response before forwarding it to the client. This negates the benefits of streaming and can cause timeout errors for long responses. The correct approach is chunk-by-chunk forwarding -- each SSE event is sent to the client as soon as it arrives from the upstream provider.
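Chunk-by-chunk forwarding as described above can be sketched as a loop that writes and flushes each chunk before reading the next one. This is an illustration only: forward_stream, write, and flush are hypothetical names standing in for whatever the upstream HTTP client and downstream response object actually provide.

```python
from typing import Callable, Iterable


def forward_stream(
    upstream: Iterable[bytes],
    write: Callable[[bytes], None],
    flush: Callable[[], None],
) -> int:
    """Forward each upstream chunk to the client immediately; return bytes forwarded."""
    forwarded = 0
    for chunk in upstream:
        write(chunk)  # send this chunk downstream right away
        flush()       # push it past any output buffering before reading more
        forwarded += len(chunk)
    return forwarded
```

The buffering mistake would look like joining every chunk from upstream into one bytes object and calling write once at the end. The loop above holds only one chunk in memory at a time, so the client sees output while the upstream provider is still generating.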

How It Relates to KiwiClaw

KiwiClaw's LLM proxy handles SSE streaming for all managed-tier users. When an agent sends a request to the LLM, the proxy opens an SSE connection to the upstream provider (Anthropic or Moonshot), forwards each chunk to the agent as it arrives, and records token usage from the final event. The proxy uses raw Node.js HTTP -- no buffering frameworks -- to ensure true chunk-by-chunk streaming with minimal latency.

BYOK users bypass the proxy entirely, with their agents connecting directly to their LLM provider's SSE endpoints.

Frequently Asked Questions

What is SSE streaming and how does it work?

SSE (Server-Sent Events) is a web standard that enables a server to push real-time updates to a client over a single HTTP connection. For AI agents, SSE streaming delivers language model responses token-by-token as they generate, creating the familiar typing effect instead of waiting for the complete response.

What is the difference between SSE and WebSockets?

SSE is unidirectional (server to client), works over standard HTTP, and reconnects automatically when consumed through the browser's EventSource API. WebSockets are bidirectional, require a protocol upgrade, and are better for interactive applications. Most LLM APIs use SSE for streaming, while OpenClaw uses WebSockets for its control UI.

Why does SSE streaming matter for AI agent performance?

SSE streaming improves perceived speed because users see the first token as soon as the model produces it rather than waiting seconds for the complete response. It also improves memory efficiency, since responses are forwarded chunk-by-chunk without buffering, and enables real-time usage tracking through token counts reported near the end of the stream.