Caching¶
RapidAI provides built-in caching to reduce API calls, improve response times, and lower costs. It supports both standard (exact-match) caching and semantic caching for intelligent similarity matching.
Quick Start¶
from rapidai import App, LLM
from rapidai.cache import cache
app = App()
llm = LLM("claude-3-haiku-20240307")
@app.route("/chat", methods=["POST"])
@cache(ttl=3600) # Cache for 1 hour
async def chat(message: str):
    response = await llm.complete(message)
    return {"response": response}
Features¶
- Automatic Caching - Decorator-based caching
- Multiple Backends - In-memory or Redis
- TTL Support - Time-to-live configuration
- Semantic Caching - AI-powered similarity matching
- Cache Keys - Automatic key generation
- Manual Control - Direct cache access
Cache Decorator¶
Basic Usage¶
from rapidai.cache import cache
@app.route("/expensive", methods=["POST"])
@cache(ttl=3600)
async def expensive_operation(data: str):
    # Expensive computation
    result = await process(data)
    return {"result": result}
How it works:
- Request arrives with parameters
- Cache key generated from function name + parameters
- Check cache for existing result
- If hit: return cached result
- If miss: execute function, cache result, return
With TTL¶
from rapidai.cache import cache
# Cache for 1 hour
@cache(ttl=3600)
async def short_lived(data: str):
    return await process(data)

# Cache for 1 day
@cache(ttl=86400)
async def long_lived(data: str):
    return await process(data)

# Cache forever (until manual clear)
@cache(ttl=None)
async def permanent(data: str):
    return await process(data)
With Redis Backend¶
from rapidai.cache import cache, RedisCache
# Use Redis for persistence
redis_cache = RedisCache(url="redis://localhost:6379")
@cache(ttl=7200, backend=redis_cache)
async def persistent_cache(data: str):
    return await process(data)
Semantic Caching¶
Semantic caching uses embeddings to find similar queries instead of exact matches.
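The lookup can be pictured with cosine similarity over embedding vectors. The sketch below uses tiny hand-made vectors in place of a real embedding model (a real semantic cache would embed the query text first); `semantic_lookup` and the entry format are illustrative assumptions, not RapidAI's API.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_lookup(query_vec, cache_entries, threshold=0.85):
    """Return the cached value whose stored embedding is most similar to
    the query embedding, but only if similarity clears the threshold."""
    best_score, best_value = 0.0, None
    for stored_vec, value in cache_entries:
        score = cosine_similarity(query_vec, stored_vec)
        if score > best_score:
            best_score, best_value = score, value
    return best_value if best_score >= threshold else None

# Toy embeddings standing in for embedded query text
entries = [([1.0, 0.1, 0.0], "cached answer about Python")]
print(semantic_lookup([0.9, 0.2, 0.0], entries))  # similar vector: hit
print(semantic_lookup([0.0, 0.1, 1.0], entries))  # unrelated vector: None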
Basic Semantic Cache¶
from rapidai.cache import cache
@app.route("/chat", methods=["POST"])
@cache(ttl=3600, semantic=True, threshold=0.85)
async def chat(message: str):
    response = await llm.complete(message)
    return {"response": response}
Example:
# First request
response1 = await chat(message="What is Python?")
# Cache miss, calls LLM
# Similar request
response2 = await chat(message="Can you explain Python?")
# Cache hit! Returns cached response (similarity > 0.85)
# Different request
response3 = await chat(message="What is JavaScript?")
# Cache miss, different topic
Similarity Threshold¶
Control how similar queries need to be:
# Strict matching (default)
@cache(semantic=True, threshold=0.85)
async def strict(query: str):
    return await llm.complete(query)
# Loose matching
@cache(semantic=True, threshold=0.70)
async def loose(query: str):
    return await llm.complete(query)
# Very strict matching
@cache(semantic=True, threshold=0.95)
async def very_strict(query: str):
    return await llm.complete(query)
Threshold guide:
- 0.95+ - Nearly identical queries
- 0.85-0.94 - Similar questions (recommended)
- 0.70-0.84 - Related topics
- <0.70 - Too loose, may return unrelated results
Custom Embedding Model¶
from rapidai.cache import SemanticCache
# Use custom model
semantic_cache = SemanticCache(
    model="all-mpnet-base-v2",  # Better accuracy
    threshold=0.85
)

@cache(backend=semantic_cache, ttl=3600)
async def chat(message: str):
    return await llm.complete(message)
Cache Backends¶
In-Memory Cache¶
Default backend, fast but not persistent:
from rapidai.cache import InMemoryCache
cache_backend = InMemoryCache()
@cache(backend=cache_backend, ttl=3600)
async def my_function(data: str):
    return await process(data)
Pros:
- Very fast
- No external dependencies
- Simple setup

Cons:
- Lost on restart
- Single-server only
- Limited by RAM
Redis Cache¶
Production backend with persistence:
from rapidai.cache import RedisCache
cache_backend = RedisCache(
    url="redis://localhost:6379",
    prefix="myapp:cache:"
)

@cache(backend=cache_backend, ttl=7200)
async def my_function(data: str):
    return await process(data)
Pros:
- Persistent storage
- Survives restarts
- Multi-server support
- Production-ready

Cons:
- Requires Redis
- Slightly slower than in-memory
Manual Cache Control¶
Direct Cache Access¶
from rapidai.cache import get_cache
cache = get_cache()
# Set value
await cache.set("key", {"data": "value"}, ttl=3600)
# Get value
result = await cache.get("key")
# Delete value
await cache.delete("key")
# Clear all
await cache.clear()
With Custom Keys¶
from rapidai.cache import get_cache
cache = get_cache()
@app.route("/user/<user_id>", methods=["GET"])
async def get_user(user_id: str):
    # Try cache first
    cache_key = f"user:{user_id}"
    cached = await cache.get(cache_key)
    if cached:
        return cached

    # Fetch from database
    user = await db.get_user(user_id)

    # Cache for 1 hour
    await cache.set(cache_key, user, ttl=3600)
    return user
Cache Key Generation¶
Automatic Keys¶
The decorator generates keys from function name and arguments:
@cache(ttl=3600)
async def get_weather(city: str, units: str = "metric"):
    return await fetch_weather(city, units)
# Calls with same arguments use same cache
result1 = await get_weather("NYC", "metric")
# Key: get_weather:NYC:metric
result2 = await get_weather("NYC", "metric")
# Same key, cache hit!
result3 = await get_weather("NYC", "imperial")
# Different key: get_weather:NYC:imperial
Custom Keys¶
from rapidai.cache import cache
@cache(ttl=3600, key=lambda city, units: f"weather:{city}")
async def get_weather(city: str, units: str = "metric"):
    return await fetch_weather(city, units)
# Both use same cache key (units ignored)
result1 = await get_weather("NYC", "metric")
result2 = await get_weather("NYC", "imperial")
# Both hit same cache!
Complete Examples¶
Cached Chat Application¶
from rapidai import App, LLM
from rapidai.cache import cache
from rapidai.memory import ConversationMemory
app = App()
llm = LLM("claude-3-haiku-20240307")
memory = ConversationMemory()
@app.route("/chat", methods=["POST"])
@cache(ttl=3600, semantic=True, threshold=0.85)
async def chat(message: str):
    """Chat with semantic caching."""
    response = await llm.complete(message)
    return {"response": response, "cached": False}
@app.route("/chat/user", methods=["POST"])
async def chat_with_memory(user_id: str, message: str):
    """Chat with memory (no caching - each conversation unique)."""
    memory.add_message(user_id, "user", message)
    history = memory.get_history(user_id)

    response = await llm.chat(history)
    memory.add_message(user_id, "assistant", response)

    return {"response": response}
if __name__ == "__main__":
    app.run()
Cached RAG System¶
from rapidai import App, LLM
from rapidai.rag import RAG
from rapidai.cache import cache
app = App()
llm = LLM("claude-3-haiku-20240307")
rag = RAG()
@app.on_startup
async def load_docs():
    await rag.add_document("docs/manual.pdf")
    await rag.add_document("docs/faq.txt")
@cache(ttl=7200, semantic=True, threshold=0.85)
async def cached_retrieval(query: str):
    """Cache RAG retrievals - similar questions use cached context."""
    return await rag.retrieve(query, top_k=3)
@app.route("/ask", methods=["POST"])
async def ask(question: str):
    # Use cached retrieval
    retrieval = await cached_retrieval(question)

    # Build prompt
    prompt = f"""Context:
{retrieval.text}

Question: {question}

Answer:"""

    # Generate response
    response = await llm.complete(prompt)

    return {
        "response": response,
        "sources": [s["source"] for s in retrieval.sources]
    }
Multi-Tier Caching¶
from rapidai import App, LLM
from rapidai.cache import cache, InMemoryCache, RedisCache
app = App()
llm = LLM("claude-3-haiku-20240307")
# Fast in-memory cache for common queries
memory_cache = InMemoryCache()
# Persistent Redis cache for all queries
redis_cache = RedisCache(url="redis://localhost:6379")
@app.route("/chat/fast", methods=["POST"])
@cache(backend=memory_cache, ttl=300) # 5 min memory cache
async def fast_chat(message: str):
    """Frequent queries cached in memory."""
    return {"response": await llm.complete(message)}
@app.route("/chat/persistent", methods=["POST"])
@cache(backend=redis_cache, ttl=86400) # 1 day Redis cache
async def persistent_chat(message: str):
    """All queries cached in Redis."""
    return {"response": await llm.complete(message)}
@app.route("/chat/semantic", methods=["POST"])
@cache(semantic=True, threshold=0.85, ttl=3600)
async def semantic_chat(message: str):
    """Similar queries share cache."""
    return {"response": await llm.complete(message)}
Best Practices¶
1. Choose Appropriate TTL¶
# Short TTL for changing data
@cache(ttl=300) # 5 minutes
async def get_stock_price(symbol: str):
    return await fetch_price(symbol)
# Long TTL for static data
@cache(ttl=86400) # 1 day
async def get_company_info(symbol: str):
    return await fetch_info(symbol)
# No TTL for permanent data
@cache(ttl=None)
async def get_currency_codes():
    return ["USD", "EUR", "GBP"]
2. Use Semantic Caching for LLM Calls¶
# ✅ Good - semantic caching for similar questions
@cache(semantic=True, threshold=0.85, ttl=3600)
async def chat(message: str):
    return await llm.complete(message)
# ❌ Avoid - exact matching misses similar queries
@cache(ttl=3600) # Only caches identical messages
async def chat(message: str):
    return await llm.complete(message)
3. Use Redis in Production¶
import os
# Development: in-memory
# Production: Redis
backend = "redis" if os.getenv("PRODUCTION") else "memory"
from rapidai.cache import get_cache
cache_backend = get_cache(backend=backend)
4. Cache Expensive Operations Only¶
# ✅ Good - cache LLM calls
@cache(ttl=3600)
async def generate_summary(text: str):
    return await llm.complete(f"Summarize: {text}")
# ❌ Avoid - don't cache trivial operations
@cache(ttl=3600)
async def add_numbers(a: int, b: int):
    return {"result": a + b}  # Too fast to benefit
5. Monitor Cache Hit Rates¶
from rapidai.cache import get_cache
from rapidai.monitoring import get_collector
@app.route("/stats")
async def cache_stats():
    cache = get_cache()
    collector = get_collector()

    # Track cache metrics
    collector.record_metric("cache.size", len(cache._cache))

    return {
        "cache_size": len(cache._cache),
        "metrics": collector.get_summary()
    }
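Hit rate (hits divided by total lookups) is the metric that tells you whether a cache is earning its keep. A minimal way to track it, sketched here as a stand-alone wrapper rather than RapidAI's own monitoring API (the `CountingCache` class is a hypothetical illustration):

```python
class CountingCache:
    """Sketch: a dict-backed cache that counts hits and misses."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Record a hit or miss on every lookup
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self._store[key] = value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = CountingCache()
cache.get("a")         # miss
cache.set("a", 1)
cache.get("a")         # hit
print(cache.hit_rate)  # one hit out of two lookups
```

A persistently low hit rate suggests the TTL is too short, the keys are too specific, or (for semantic caching) the threshold is too strict.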
Troubleshooting¶
Cache Not Working¶
# Ensure decorator is applied correctly
@cache(ttl=3600) # ✅ Correct
async def my_function():
    pass

async def my_function():  # ❌ Missing decorator
    pass
Semantic Cache Misses¶
# Threshold may be too high
@cache(semantic=True, threshold=0.95) # Too strict
async def chat(message: str):
    pass
# Try lower threshold
@cache(semantic=True, threshold=0.80) # More permissive
async def chat(message: str):
    pass
Redis Connection Issues¶
from rapidai.cache import RedisCache
try:
    cache = RedisCache(url="redis://localhost:6379")
except Exception as e:
    print(f"Redis connection failed: {e}")
    # Fall back to in-memory
    from rapidai.cache import InMemoryCache
    cache = InMemoryCache()
Next Steps¶
- Performance Guide - Optimize with caching
- Monitoring - Track cache performance
- Testing - Test cached endpoints