What is a Context Window?
A context window is the maximum amount of text that a large language model can process in a single interaction. Measured in tokens (roughly word fragments), it defines the total capacity for both your input (system prompt, conversation history, documents) and the model's output. If your conversation exceeds the context window, the oldest messages get dropped or the request fails.
Context windows have expanded rapidly. GPT-3 launched in 2020 with a 2K token window. Modern models offer 128K to 1M+ tokens. But bigger is not always better -- larger context windows are more expensive to use, slower to process, and models can still lose accuracy when relevant information is buried deep in a long context.
For AI agents, context window management is critical. An agent running a complex, multi-step task accumulates tool call results, web page contents, code outputs, and conversation history. Without careful management, the context fills up and the agent loses track of earlier information.
How Context Windows Work
- Token counting -- Every word, punctuation mark, and space is broken into tokens (one token is roughly 3/4 of an English word)
- Input + output -- The context window covers both the prompt/history you send AND the response the model generates
- Sliding window -- When the context is full, older messages are typically dropped to make room for new ones
- Attention mechanism -- The model can theoretically attend to any token in the window, but performance degrades for information in the middle of very long contexts
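The token-budget arithmetic behind the first two points can be sketched in a few lines. This is a rough illustration only: the 4-characters-per-token heuristic, the function names, and the 128K default are assumptions for the sketch, not a real tokenizer or a specific model's limit.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per English token.
    # Real BPE tokenizers vary by model and by language.
    return max(1, len(text) // 4)

def fits_in_window(prompt: str, max_output_tokens: int,
                   window: int = 128_000) -> bool:
    # The context window must hold BOTH the input tokens
    # AND the budget reserved for the model's response.
    return estimate_tokens(prompt) + max_output_tokens <= window

fits_in_window("Summarize this report.", max_output_tokens=4096)  # True
```

The key design point is the second bullet above: reserving output tokens up front, rather than filling the whole window with input, is what keeps a request from failing mid-generation.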
Why Context Windows Matter
Context windows determine what the model can "see" at any given moment. A model with a 128K context window can process roughly 300 pages of text at once -- but it costs more per token and may still miss details buried in the middle. Effective use of context windows involves strategies like summarization (condensing old messages), RAG (retrieving only relevant chunks), and memory systems (storing facts externally).
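The simplest of these strategies, the sliding window, can be sketched as a loop that drops the oldest messages until the history fits a token budget. The function and parameter names here are illustrative, and the token counter is passed in rather than assumed.

```python
def trim_history(messages, budget_tokens, count_tokens):
    # Sliding window: discard the oldest messages until the
    # remaining history fits within the token budget.
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget_tokens:
        trimmed.pop(0)  # drop the oldest message first
    return trimmed

history = ["msg one", "msg two", "msg three", "msg four"]
# With a crude length-based counter and a tight budget,
# only the newest messages survive:
trim_history(history, budget_tokens=4,
             count_tokens=lambda m: len(m) // 4)
# → ["msg three", "msg four"]
```

Summarization works the same way structurally, except the dropped prefix is replaced with a condensed summary message instead of being discarded outright.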
For cost control, shorter contexts are cheaper. The number of tokens sent and generated on each LLM API call directly determines its cost, making efficient context management a financial consideration as well as a technical one.
How KiwiClaw Manages Context
KiwiClaw agents use OpenClaw's built-in context management, which automatically summarizes conversation history when the context approaches its limit. The LLM proxy tracks token usage per tenant and enforces weekly caps. Users can choose between Auto (cost-efficient, 128K context) and MAX (premium, 200K context) models depending on their needs.
Related Terms
Frequently Asked Questions
What is a context window in AI?
A context window is the maximum amount of text a language model can process in a single interaction, measured in tokens. It covers both the input (prompt, history, documents) and the output (model response). Modern models range from 128K to 1M+ tokens.
What happens when the context window is full?
When the context fills up, the oldest messages are typically dropped (sliding window), the request may fail with an error, or the system may summarize older content to make room. AI agents use memory and RAG systems to work around context limits.
How does KiwiClaw handle context limits?
KiwiClaw agents use OpenClaw's built-in context management to automatically summarize conversation history as it approaches the limit. The LLM proxy tracks token usage, and users can choose models with different context sizes.