Overview
MySafeCache employs a sophisticated dual-caching strategy that combines the speed of exact matching with the intelligence of semantic similarity. This approach ensures you get the best of both worlds: lightning-fast responses when possible and intelligent matches when exact duplicates aren’t available.Dual Caching Architecture
Exact Caching
How It Works
Exact caching uses SHA-256 hashing to create unique identifiers for message combinations. When a request comes in, MySafeCache:- Generates a hash from the message array
- Checks Redis for an exact match
- Returns the cached response if found
Benefits
- Ultra-fast: 1-5ms response times
- Deterministic: Same input always returns same output
- Resource efficient: Minimal computational overhead
Example
When Exact Caching Works Best
Repeated Queries
Applications that frequently ask identical questions
Template-based Prompts
Systems using consistent prompt templates
FAQ Systems
Knowledge bases with standard questions
Fixed Workflows
Automated processes with predictable inputs
Semantic Caching
How It Works
Semantic caching uses vector embeddings to find similar queries:- Converts messages to vector embeddings using OpenAI’s embedding models
- Stores embeddings in Qdrant vector database
- Performs similarity search using cosine similarity
- Returns matches above configurable threshold (default: 0.85)
Benefits
- Intelligent matching: Finds similar queries regardless of exact wording
- Flexible: Handles paraphrasing and variations
- Learning: Gets better as more data is cached
Example
Similarity Thresholds
| Threshold | Use Case | Trade-off |
|---|---|---|
| 0.95+ | High precision | Fewer matches, very similar queries only |
| 0.85-0.94 | Balanced (default) | Good mix of precision and recall |
| 0.75-0.84 | High recall | More matches, but potentially less relevant |
When Semantic Caching Works Best
Natural Language Queries
User-generated questions with variations
Chatbots
Conversational AI with paraphrased questions
Search Systems
Knowledge retrieval with flexible queries
Content Generation
Similar creative requests with variations
Optimization Strategies
1. Cache Warming
Pre-populate your cache with common queries:2. Prompt Standardization
Standardize prompts to increase exact cache hits:3. Message Array Consistency
Keep message structures consistent:Performance Tuning
Cache Hit Rate Optimization
Monitor and optimize your cache hit rate:Storage Optimization
Message Length
Message Length
Shorter messages cache more efficiently. Consider breaking long prompts into reusable components.
Response Quality
Response Quality
Store high-quality responses that will be useful for similar queries. Poor responses reduce semantic matching effectiveness.
Cache Expiration
Cache Expiration
For time-sensitive content, implement your own expiration logic by including timestamps in queries.
Cache Strategy Selection
Choose the right strategy based on your use case:| Use Case | Recommended Strategy | Why |
|---|---|---|
| FAQ Bot | Focus on exact caching | Repeated identical questions |
| Research Assistant | Balanced approach | Mix of similar and exact queries |
| Content Generation | Semantic-heavy | Creative variations |
| API Documentation | Exact caching | Consistent technical queries |
| Customer Support | Balanced approach | Similar issues, different wording |
Monitoring and Analytics
Track cache performance with built-in analytics:Best Practices
Standardize Prompts
Use consistent prompt templates to maximize exact cache hits
Monitor Performance
Regularly check analytics to optimize cache strategy
Warm the Cache
Pre-populate with common queries during off-peak hours
Quality Control
Only cache high-quality responses to maintain semantic matching accuracy