Overview

MySafeCache employs a dual-caching strategy that combines the speed of exact matching with the intelligence of semantic similarity. You get the best of both worlds: near-instant responses when an exact duplicate exists, and semantically similar matches when one doesn't.

Dual Caching Architecture

Exact Caching

How It Works

Exact caching uses SHA-256 hashing to create unique identifiers for message combinations. When a request comes in, MySafeCache:
  1. Generates a hash from the message array
  2. Checks Redis for an exact match
  3. Returns the cached response if found
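
Conceptually, the exact-cache key is a hash of the serialized message array. The following sketch is illustrative rather than MySafeCache's actual implementation; it assumes messages are serialized as canonical JSON before hashing:
import hashlib
import json

def exact_cache_key(messages):
    # Serialize deterministically (sorted keys, no extra whitespace)
    # so byte-identical inputs always produce the same key
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Identical content yields identical keys...
assert exact_cache_key([{"role": "user", "content": "What is Docker?"}]) == \
       exact_cache_key([{"role": "user", "content": "What is Docker?"}])

# ...while any byte-level difference (here, capitalization) produces a new key
assert exact_cache_key([{"role": "user", "content": "What is Docker?"}]) != \
       exact_cache_key([{"role": "user", "content": "what is docker?"}])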

Benefits

  • Ultra-fast: 1-5ms response times
  • Deterministic: Same input always returns same output
  • Resource efficient: Minimal computational overhead

Example

// These two requests will hit the exact cache
Request 1: [{"role": "user", "content": "What is Docker?"}]
Request 2: [{"role": "user", "content": "What is Docker?"}]

// This will NOT hit exact cache (different capitalization)
Request 3: [{"role": "user", "content": "what is docker?"}]

When Exact Caching Works Best

Repeated Queries

Applications that frequently ask identical questions

Template-based Prompts

Systems using consistent prompt templates

FAQ Systems

Knowledge bases with standard questions

Fixed Workflows

Automated processes with predictable inputs

Semantic Caching

How It Works

Semantic caching uses vector embeddings to find similar queries:
  1. Converts messages to vector embeddings using OpenAI’s embedding models
  2. Stores embeddings in Qdrant vector database
  3. Performs similarity search using cosine similarity
  4. Returns matches above a configurable similarity threshold (default: 0.85)
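
To make step 3 concrete, cosine similarity between two embedding vectors can be computed as in this generic sketch (using numpy; this is not MySafeCache's internal code):
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    # Cosine similarity: dot product normalized by vector magnitudes
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_semantic_hit(query_embedding, cached_embedding, threshold=0.85):
    # A cached entry counts as a match when its similarity to the
    # incoming query's embedding clears the configured threshold
    return cosine_similarity(query_embedding, cached_embedding) >= threshold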

Benefits

  • Intelligent matching: Finds similar queries regardless of exact wording
  • Flexible: Handles paraphrasing and variations
  • Learning: Gets better as more data is cached

Example

// These queries will likely hit semantic cache
Original: "What is Docker?"
Variations that match:
- "Can you explain Docker to me?"
- "Tell me about Docker technology"
- "How does Docker work?"
- "What's Docker used for?"

Similarity Thresholds

| Threshold | Use Case | Trade-off |
| --- | --- | --- |
| 0.95+ | High precision | Fewer matches; only very similar queries |
| 0.85-0.94 | Balanced (default) | Good mix of precision and recall |
| 0.75-0.84 | High recall | More matches, but potentially less relevant |
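
If your deployment exposes a per-request threshold on the /check endpoint, a high-precision lookup might look like the sketch below. Note that the similarity_threshold field name here is hypothetical; consult the API reference for the actual parameter:
import requests

response = requests.post(
    "https://api.mysafecache.com/api/v1/check",
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "messages": [{"role": "user", "content": "What is Docker?"}],
        "similarity_threshold": 0.95,  # hypothetical parameter name
    },
)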

When Semantic Caching Works Best

Natural Language Queries

User-generated questions with variations

Chatbots

Conversational AI with paraphrased questions

Search Systems

Knowledge retrieval with flexible queries

Content Generation

Similar creative requests with variations

Optimization Strategies

1. Cache Warming

Pre-populate your cache with common queries:
import requests

common_queries = [
    "What is artificial intelligence?",
    "How does machine learning work?",
    "Explain deep learning",
    "What are neural networks?",
    # Add your common queries
]

def warm_cache(queries, api_key):
    for query in queries:
        # Check if already cached
        check_response = requests.post(
            "https://api.mysafecache.com/api/v1/check",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"messages": [{"role": "user", "content": query}]}
        )
        
        if not check_response.json()["cache_hit"]:
            # Generate response and cache it
            llm_response = get_llm_response(query)  # Your LLM call
            
            requests.post(
                "https://api.mysafecache.com/api/v1/store",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "messages": [{"role": "user", "content": query}],
                    "answer": llm_response,
                    "model": "gpt-4"
                }
            )

warm_cache(common_queries, "your-api-key")

2. Prompt Standardization

Standardize prompts to increase exact cache hits:
# Ad-hoc variations like these all miss each other's exact cache:
variations = [
    "Summarize this text: {text}",
    "Can you summarize: {text}",
    "Please provide a summary of: {text}",
    "Give me a summary for: {text}",
]

# Route every summarization request through one canonical template instead:
SUMMARIZE_TEMPLATE = "Summarize this text: {text}"

def build_summarize_prompt(text):
    return SUMMARIZE_TEMPLATE.format(text=text)

3. Message Array Consistency

Keep message structures consistent:
# These won't hit the exact cache: the arrays differ, so their hashes differ
messages1 = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Docker?"}
]

messages2 = [
    {"role": "user", "content": "What is Docker?"}
]

# Pick one structure (e.g., always include the system message) and use it everywhere

Performance Tuning

Cache Hit Rate Optimization

Monitor and optimize your cache hit rate:
import requests

def analyze_cache_performance(api_key):
    response = requests.get(
        "https://api.mysafecache.com/api/v1/usage",
        headers={"Authorization": f"Bearer {api_key}"}
    )
    
    stats = response.json()
    
    print(f"Total requests: {stats['total_requests']}")
    print(f"Cache hit rate: {stats['hit_rate_percentage']:.1f}%")
    print(f"Exact hits: {stats['exact_hits']}")
    print(f"Semantic hits: {stats['semantic_hits']}")
    
    # Recommendations
    if stats['hit_rate_percentage'] < 30:
        print("💡 Consider standardizing prompts for better exact matching")
    elif stats['semantic_hits'] > stats['exact_hits']:
        print("💡 Semantic cache is working well, consider prompt templates")

Storage Optimization

  • Shorter messages cache more efficiently; consider breaking long prompts into reusable components.
  • Store high-quality responses that will be useful for similar queries; poor responses reduce semantic matching effectiveness.
  • For time-sensitive content, implement your own expiration logic, for example by including timestamps in queries (see the sketch below).
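
One way to approximate expiration (a sketch, not a built-in MySafeCache feature) is to embed a coarse date bucket in the query so the exact-cache key rotates on its own:
from datetime import date

def time_bucketed_messages(query, bucket=None):
    # Embedding a coarse date bucket in the content changes the
    # exact-cache key each day, so stale answers age out naturally
    bucket = bucket or date.today().isoformat()
    return [{"role": "user", "content": f"[as of {bucket}] {query}"}]

messages = time_bucketed_messages("What is the latest Docker release?")

Keep the bucket as coarse as your freshness requirements allow; the extra text also shifts the embedding slightly, which can affect semantic matching.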

Cache Strategy Selection

Choose the right strategy based on your use case:
| Use Case | Recommended Strategy | Why |
| --- | --- | --- |
| FAQ Bot | Focus on exact caching | Repeated identical questions |
| Research Assistant | Balanced approach | Mix of similar and exact queries |
| Content Generation | Semantic-heavy | Creative variations |
| API Documentation | Exact caching | Consistent technical queries |
| Customer Support | Balanced approach | Similar issues, different wording |

Monitoring and Analytics

Track cache performance with built-in analytics:
import requests

def get_detailed_analytics(api_key):
    response = requests.get(
        "https://api.mysafecache.com/api/v1/analytics",
        headers={"Authorization": f"Bearer {api_key}"}
    )
    
    analytics = response.json()
    
    return {
        "cache_efficiency": analytics["hit_rate_percentage"],
        "average_response_time": analytics["average_lookup_time_ms"],
        "cost_savings": analytics["estimated_savings"],
        "top_queries": analytics["popular_queries"]
    }

Best Practices

Standardize Prompts

Use consistent prompt templates to maximize exact cache hits

Monitor Performance

Regularly check analytics to optimize cache strategy

Warm the Cache

Pre-populate with common queries during off-peak hours

Quality Control

Only cache high-quality responses to maintain semantic matching accuracy

Next Steps