Retrieval-Augmented Generation (RAG)¶
RapidAI includes a powerful RAG system that allows you to augment your LLM applications with custom knowledge from documents.
Quick Start¶
from rapidai import App, LLM
from rapidai.rag import RAG
app = App()
llm = LLM("claude-3-haiku-20240307")
rag = RAG()
# Add documents
await rag.add_document("docs/manual.pdf")
# Query with context
answer = await rag.query("How do I install?", llm=llm)
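The snippet above uses top-level await for brevity. In a plain script you need an event loop; a minimal standalone sketch using only the calls shown above:

import asyncio

from rapidai import LLM
from rapidai.rag import RAG

async def main():
    llm = LLM("claude-3-haiku-20240307")
    rag = RAG()
    # Ingest a document, then ask a question grounded in it
    await rag.add_document("docs/manual.pdf")
    answer = await rag.query("How do I install?", llm=llm)
    print(answer)

asyncio.run(main())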
Architecture¶
The RAG system consists of four main components (see the sketch after this list):
- Document Loaders - Load PDF, DOCX, TXT, HTML, Markdown
- Chunkers - Split documents into semantic chunks
- Embeddings - Convert text to vector embeddings
- Vector Database - Store and search embeddings
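Conceptually, ingestion and retrieval exercise these components in order. The pipeline annotations below are inferred from the component list rather than quoted from the implementation:

from rapidai.rag import RAG

rag = RAG()

# Ingestion pipeline: loader -> chunker -> embeddings -> vector database
await rag.add_document("docs/manual.pdf")

# Retrieval: embed the query, then search the vector database for similar chunks
result = await rag.retrieve("How do I install?", top_k=5)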
Configuration¶
Configure RAG via environment variables or YAML:
rag:
  embedding:
    provider: "sentence-transformers"  # or "openai"
    model: "all-MiniLM-L6-v2"
  chunking:
    strategy: "recursive"  # or "sentence"
    chunk_size: 512
    chunk_overlap: 50
  vectordb:
    backend: "chromadb"
    persist_directory: "./chroma_data"
    collection_name: "my_docs"
  top_k: 5
Or via environment variables:
RAPIDAI_RAG_TOP_K=5
RAPIDAI_EMBEDDING_PROVIDER=openai
RAPIDAI_EMBEDDING_MODEL=text-embedding-3-small
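Environment variables need to be present when the RAG components are constructed, so set them first. A sketch setting them programmatically; the read-at-construction timing is an assumption:

import os

# Assumed to be read when RAG() is constructed, so set them beforehand
os.environ["RAPIDAI_EMBEDDING_PROVIDER"] = "openai"
os.environ["RAPIDAI_EMBEDDING_MODEL"] = "text-embedding-3-small"
os.environ["RAPIDAI_RAG_TOP_K"] = "5"

from rapidai.rag import RAG

rag = RAG()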
Document Loading¶
Supported Formats¶
- PDF (.pdf)
- Word Documents (.docx)
- Plain Text (.txt)
- Markdown (.md)
- HTML (.html, .htm)
Loading Documents¶
from rapidai.rag import RAG
rag = RAG()
# Single document
await rag.add_document("path/to/document.pdf")
# Multiple documents
await rag.add_documents([
    "doc1.pdf",
    "doc2.docx",
    "doc3.md"
])
# From Document object
from rapidai.types import Document
doc = Document(
    content="Custom content",
    metadata={"source": "manual", "version": "1.0"}
)
await rag.add_document(doc)
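The loaders above take explicit paths; to ingest a whole directory, you can glob with the standard library and hand the result to add_documents (no extra RapidAI API assumed):

from pathlib import Path

# Collect every supported file under kb/ and ingest in one call
suffixes = {".pdf", ".docx", ".txt", ".md", ".html", ".htm"}
paths = [str(p) for p in Path("kb").rglob("*") if p.suffix in suffixes]
await rag.add_documents(paths)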
Embeddings¶
RapidAI supports multiple embedding providers:
Sentence Transformers (Default)¶
from rapidai.rag import Embedding
embedding = Embedding(
    provider="sentence-transformers",
    model="all-MiniLM-L6-v2"
)
Pros:
- Free and open source
- Runs locally (no API costs)
- Fast for batch processing
Cons:
- Requires local compute
- Generally lower quality than hosted OpenAI embeddings
OpenAI Embeddings¶
embedding = Embedding(
    provider="openai",
    model="text-embedding-3-small",
    api_key="your-api-key"  # or set the OPENAI_API_KEY env var
)
Pros:
- High quality embeddings
- Access to OpenAI's latest embedding models
Cons:
- API costs
- Requires internet connection
Custom Configuration¶
from rapidai.rag import RAG, Embedding
embedding = Embedding(provider="openai", model="text-embedding-3-large")
rag = RAG(embedding=embedding)
Retrieval¶
Basic Retrieval¶
# Retrieve relevant chunks
result = await rag.retrieve("your query", top_k=5)
print(result.text) # Combined text from chunks
print(result.sources) # List of DocumentChunk objects
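Each entry in result.sources carries the chunk text and its metadata (the content and metadata attributes also appear in the decorator examples below), which makes provenance easy to surface:

# Inspect provenance chunk by chunk
for chunk in result.sources:
    print(chunk.metadata.get("source"), "->", chunk.content[:80])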
With Metadata Filtering¶
# Only search in specific document types
result = await rag.retrieve(
    "query",
    top_k=5,
    filter_metadata={"type": "pdf"}
)
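Filters match the metadata attached at ingestion time. A sketch, assuming filter keys match by exact equality:

from rapidai.types import Document

# Tag documents when adding them...
await rag.add_document(Document(
    content="Refunds are processed within five business days.",
    metadata={"team": "billing", "type": "policy"}
))

# ...then restrict retrieval to chunks whose metadata matches
result = await rag.retrieve(
    "How long do refunds take?",
    top_k=3,
    filter_metadata={"team": "billing"}
)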
Full RAG Query¶
from rapidai import LLM
llm = LLM("claude-3-haiku-20240307")
# Retrieve + Generate in one call
answer = await rag.query(
    "What is the installation process?",
    llm=llm,
    top_k=5
)
RAG Decorator¶
The @rag() decorator automatically injects retrieval context into your routes:
from rapidai import App, LLM
from rapidai.rag import rag
app = App()
llm = LLM("claude-3-haiku-20240307")
@app.route("/ask", methods=["POST"])
@rag(sources=["docs/manual.pdf"], top_k=5)
async def ask(query: str, rag_context):
"""rag_context is automatically injected."""
prompt = f"Context: {rag_context.text}\n\nQuestion: {query}"
answer = await llm.complete(prompt)
return {
"answer": answer,
"sources": [s.metadata for s in rag_context.sources]
}
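On the client side this is an ordinary HTTP POST. A sketch with httpx, where the port and the request-body field name are assumptions about App defaults:

import asyncio

import httpx

async def ask_question():
    async with httpx.AsyncClient() as client:
        # localhost:8000 and the {"query": ...} body shape are assumptions
        resp = await client.post(
            "http://localhost:8000/ask",
            json={"query": "How do I install?"},
        )
        data = resp.json()
        print(data["answer"])
        for source in data["sources"]:
            print("source:", source)

asyncio.run(ask_question())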
Chunking Strategies¶
Recursive Chunking (Default)¶
Splits text using multiple separators hierarchically:
from rapidai.rag import Chunker
chunker = Chunker(
    strategy="recursive",
    chunk_size=512,
    chunk_overlap=50
)
Best for: General purpose, mixed content
Sentence Chunking¶
Splits text at sentence boundaries, keeping semantic units intact:
chunker = Chunker(
    strategy="sentence",
    chunk_size=512,
    chunk_overlap=2  # number of sentences to overlap
)
Best for: Narrative text, articles
Best Practices¶
- Chunk Size: Start with 512 tokens, adjust based on your use case
- Overlap: Use 10-20% overlap to maintain context
- top_k: Start with 3-5 chunks, increase if needed
- Embeddings: Use sentence-transformers for speed, OpenAI for quality
- Metadata: Add rich metadata for filtering and provenance (a combined setup is sketched below)
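Pulling these recommendations together into one starting configuration; note that passing chunker= to RAG is an assumption by analogy with the documented embedding= parameter:

from rapidai.rag import RAG, Chunker, Embedding

# A 50-token overlap is roughly 10% of a 512-token chunk
chunker = Chunker(strategy="recursive", chunk_size=512, chunk_overlap=50)
embedding = Embedding(provider="sentence-transformers", model="all-MiniLM-L6-v2")

# chunker= mirrors embedding= here; only embedding= appears elsewhere on this page
rag = RAG(embedding=embedding, chunker=chunker)

result = await rag.retrieve("query", top_k=5)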
Complete Example: Customer Support Bot¶
from rapidai import App, LLM
from rapidai.rag import rag
app = App()
llm = LLM("claude-3-haiku-20240307")
@app.route("/support", methods=["POST"])
@rag(
    sources=[
        "kb/troubleshooting.pdf",
        "kb/faq.md",
        "kb/user_guide.pdf"
    ],
    top_k=5
)
async def support_query(question: str, rag_context):
    """Customer support with RAG."""
    system_prompt = """You are a helpful customer support assistant.
Answer based on the knowledge base provided. If unsure, say so."""
    prompt = f"""{system_prompt}

Knowledge Base:
{rag_context.text}

Customer Question: {question}

Answer:"""
    response = await llm.complete(prompt)
    return {
        "answer": response,
        "sources": [
            {
                "file": s.metadata.get("filename"),
                "excerpt": s.content[:100]
            }
            for s in rag_context.sources
        ]
    }

if __name__ == "__main__":
    app.run()