What is RAG (Retrieval-Augmented Generation)?
Retrieval-Augmented Generation (RAG) is a technique that combines a large language model with a knowledge retrieval system. Before generating a response, the system searches a database of documents to find relevant passages, then includes those passages in the model's context so it can produce accurate, grounded answers based on specific data rather than relying solely on its training knowledge.
RAG addresses one of the biggest limitations of LLMs: they only know what they were trained on, and that training data has a cutoff date. By retrieving current, domain-specific information at query time, RAG enables an AI agent to answer questions about your company's internal docs, recent events, or proprietary data that no public model has seen.
The approach works in two phases. First, a retrieval step searches a vector database of pre-indexed documents to find the most relevant chunks. Second, a generation step feeds those chunks to the LLM along with the user's question, so the model can synthesize an answer grounded in real source material.
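The retrieval step typically ranks document chunks by vector similarity, most often cosine similarity. A minimal sketch of that comparison, using tiny made-up vectors (real embeddings have hundreds or thousands of dimensions, and the variable names here are purely illustrative):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes.
    # Ranges from -1 (opposite direction) to 1 (same direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for illustration only.
query = [0.9, 0.1, 0.0]
chunk_about_pricing = [0.8, 0.2, 0.1]
chunk_about_hiring = [0.0, 0.1, 0.9]

# The pricing chunk points in nearly the same direction as the query,
# so it scores much higher and would be retrieved first.
print(cosine_similarity(query, chunk_about_pricing))  # ~0.98
print(cosine_similarity(query, chunk_about_hiring))   # ~0.01
```

Vector databases perform this same comparison at scale, using approximate nearest-neighbor indexes so they don't have to score every chunk.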
How RAG Works
- Document ingestion -- Documents (PDFs, web pages, text files) are split into chunks and converted into numerical vectors (embeddings)
- Indexing -- Embeddings are stored in a vector database for fast similarity search
- Query embedding -- When a user asks a question, the question is also converted into a vector
- Retrieval -- The system finds the most similar document chunks by comparing vectors
- Augmented generation -- Retrieved chunks are included in the LLM's prompt as context, and the model generates a response grounded in that material
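The steps above can be sketched end to end in a few lines. This is a toy illustration, not a production implementation: the `embed` function here just counts words against a tiny fixed vocabulary (a real system would call an embedding model), the in-memory list stands in for a vector database, and the final prompt string is what would be sent to the LLM:

```python
import math
import re
from collections import Counter

VOCAB = ["refund", "policy", "days", "shipping", "time", "return"]

def embed(text):
    # Toy embedding: word counts over a tiny fixed vocabulary.
    # Real systems use a learned embedding model instead.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Ingestion + indexing: store each chunk alongside its embedding.
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Standard shipping time is 5 business days.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(question, k=1):
    # Query embedding + retrieval: rank chunks by similarity to the question.
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(question):
    # Augmented generation: retrieved chunks become context in the LLM prompt.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the refund policy?"))
```

Asking about the refund policy retrieves the refund chunk rather than the shipping one, and the generated prompt instructs the model to answer from that context alone, which is the grounding mechanism that makes RAG responses accurate.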
Why RAG Matters
RAG dramatically reduces AI hallucinations by grounding responses in actual documents. It can also replace expensive fine-tuning for many use cases -- instead of retraining a model on your data, you simply index your documents and retrieve them at query time. This makes RAG faster to set up, cheaper to maintain, and easier to keep current.
For businesses, RAG means your AI agent can answer questions about internal policies, product documentation, customer histories, and other proprietary data without that data ever being included in model training.
How KiwiClaw Uses RAG
KiwiClaw agents support OpenClaw's knowledge base feature. Users upload documents through the dashboard, and KiwiClaw automatically chunks, embeds, and indexes them. When the agent receives a question, it retrieves relevant passages and uses them for grounded responses. This works alongside the agent's other capabilities like tool use and web browsing.
Related Terms
- What is a Vector Database?
- What is AI Hallucination?
- What is a Context Window?
- What is AI Agent Memory?
Frequently Asked Questions
What is Retrieval-Augmented Generation (RAG)?
RAG is a technique that combines a large language model with a knowledge retrieval system. Before generating a response, the system searches a database of documents to find relevant information, then includes that information in the prompt so the LLM can give accurate, grounded answers based on your specific data.
How does RAG reduce AI hallucinations?
RAG reduces hallucinations by grounding the LLM's responses in actual retrieved documents rather than relying solely on the model's training data. When the model has relevant source material to reference, it is far less likely to fabricate information.
Does KiwiClaw support RAG?
Yes. KiwiClaw agents can use OpenClaw's knowledge base feature, which supports uploading documents that are automatically chunked and indexed. When the agent receives a question, it retrieves relevant passages and uses them to generate accurate answers.