Check Cache

POST https://api.mysafecache.com/api/v1/check

Overview

The Check Cache endpoint is the first step in the MySafeCache workflow. It searches both the exact and semantic caches to determine whether a response has already been cached for your messages.

Request

messages (array, required)
Array of message objects following the OpenAI chat format.

similarity_threshold (number, default: 0.85)
Minimum similarity score for semantic matches (0.0 to 1.0).

cache_types (array, default: ["exact", "semantic"])
Which cache types to check. Options: exact, semantic (see the second example request below).

Example Request

curl -X POST https://api.mysafecache.com/api/v1/check \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is Docker?"
      }
    ],
    "similarity_threshold": 0.85
  }'
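
The cache_types parameter narrows the search to a single cache type. For example, to check only the exact cache and skip the semantic vector search entirely:

curl -X POST https://api.mysafecache.com/api/v1/check \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is Docker?"
      }
    ],
    "cache_types": ["exact"]
  }'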

Response

The response varies depending on whether a cache hit or miss occurs.

Cache Hit Response

cache_hit (boolean)
Always true for cache hits.

answer (string)
The cached response content.

cache_type (string)
Type of cache hit: exact or semantic.

lookup_time_ms (number)
Time taken to find the cached response, in milliseconds.

tokens_saved (number)
Estimated number of tokens saved by using the cache.

created_at (string)
ISO 8601 timestamp of when the response was originally cached.

similarity_score (number)
Similarity score for semantic matches (null for exact matches).

metadata (object)
Additional metadata about the cached response.

Example Cache Hit Response

{
  "cache_hit": true,
  "answer": "Docker is a containerization platform that allows you to package applications and their dependencies into lightweight, portable containers. These containers can run consistently across different environments, from development laptops to production servers.",
  "cache_type": "exact",
  "lookup_time_ms": 2.5,
  "tokens_saved": 150,
  "created_at": "2025-01-15T10:30:45Z",
  "similarity_score": null,
  "metadata": {
    "model": "gpt-4",
    "original_tokens": 150,
    "usage_count": 5
  }
}
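
For semantic hits, similarity_score tells you how close the match was. An application that wants extra safety can apply its own stricter cutoff and treat borderline hits as misses. A minimal sketch (the 0.9 cutoff here is an application-level choice, not an API requirement):

def accept_hit(result, min_semantic_score=0.9):
    # Exact hits are byte-identical matches, so always accept them.
    if result["cache_type"] == "exact":
        return True
    # For semantic hits, require a similarity above our own stricter cutoff.
    return (result.get("similarity_score") or 0) >= min_semantic_score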

Cache Miss Response

cache_hit (boolean)
Always false for cache misses.

message (string)
Explanation of why no cached response was found.

prompt_hash (string)
SHA-256 hash of the message array, useful for debugging.

lookup_time_ms (number)
Time taken to search the caches, in milliseconds.

suggested_action (string)
Recommended next step.

search_results (object)
Details about the search that was performed.

Example Cache Miss Response

{
  "cache_hit": false,
  "message": "No cached response found",
  "prompt_hash": "a1b2c3d4e5f6...",
  "lookup_time_ms": 8.2,
  "suggested_action": "Call your LLM with these messages and use /store endpoint to cache the response",
  "search_results": {
    "exact_checked": true,
    "semantic_checked": true,
    "closest_match_score": 0.72
  }
}
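
On a miss, search_results.closest_match_score shows how close the nearest semantic candidate came. Logging near-misses is a simple way to tell whether a slightly lower similarity_threshold would have produced hits. A minimal sketch (the 0.05 margin is an arbitrary choice for illustration):

def log_near_miss(result, threshold=0.85, margin=0.05):
    # Flag misses whose closest semantic match fell just below the threshold;
    # a cluster of these suggests the threshold could be loosened slightly.
    if result["cache_hit"]:
        return
    score = result["search_results"].get("closest_match_score")
    if score is not None and score >= threshold - margin:
        print(f"Near miss: {score:.2f} vs threshold {threshold:.2f}")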

Cache Types Explained

Exact Cache

  • Speed: 1-5ms response time
  • Accuracy: 100% match guarantee
  • Use Case: Identical queries
  • Method: SHA-256 hash comparison (see the sketch below)

Semantic Cache

  • Speed: 5-20ms response time
  • Accuracy: Configurable similarity threshold
  • Use Case: Similar but not identical queries
  • Method: Vector similarity search
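
The hash comparison is also why message consistency matters: any byte-level difference in the messages array produces a different hash and therefore an exact-cache miss. As an illustration only (the server's exact canonicalization scheme is not documented here, so treat this as an assumption), a client-side fingerprint might look like:

import hashlib
import json

def message_fingerprint(messages):
    # Serialize deterministically (sorted keys, no extra whitespace) so that
    # identical message arrays always yield the same SHA-256 hash.
    # NOTE: illustrative only; may not match the server's prompt_hash.
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

print(message_fingerprint([{"role": "user", "content": "What is Docker?"}]))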

Similarity Threshold Guidelines

Threshold    Behavior                  Best For
0.95-1.0     Very strict matching      High-precision applications
0.85-0.94    Balanced (recommended)    General purpose
0.75-0.84    Loose matching            Maximum cache utilization
0.60-0.74    Very loose                Experimental/testing

Performance Characteristics

Exact Cache Performance

  • Average: 2.3ms
  • 95th percentile: 5ms
  • 99th percentile: 8ms

Semantic Cache Performance

  • Average: 12ms
  • 95th percentile: 25ms
  • 99th percentile: 40ms
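
These figures describe the cache lookup itself; the latency your client observes will also include network round-trip time. A quick way to compare the two (a minimal sketch using the requests library):

import time
import requests

def timed_check(messages, api_key):
    start = time.perf_counter()
    response = requests.post(
        "https://api.mysafecache.com/api/v1/check",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"messages": messages},
    )
    wall_ms = (time.perf_counter() - start) * 1000
    result = response.json()
    # lookup_time_ms is the server-side search; the remainder is mostly network.
    print(f"total: {wall_ms:.1f}ms, lookup: {result['lookup_time_ms']:.1f}ms")
    return result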

Error Responses

400 Bad Request:

{
  "error": {
    "code": "INVALID_MESSAGES",
    "message": "Messages array is required and must not be empty",
    "details": {
      "field": "messages",
      "expected": "non-empty array",
      "received": "undefined"
    }
  },
  "status": 400
}

401 Unauthorized:

{
  "error": {
    "code": "INVALID_API_KEY",
    "message": "The provided API key is invalid or expired"
  },
  "status": 401
}

429 Too Many Requests:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests. Please try again later.",
    "retry_after": 60
  },
  "status": 429
}
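
All three error bodies share the same shape, so they can be handled in one place. A minimal sketch using the requests library (a single sleep-and-retry on 429, not production-grade backoff):

import time
import requests

def check_cache(messages, api_key, max_retries=1):
    for attempt in range(max_retries + 1):
        response = requests.post(
            "https://api.mysafecache.com/api/v1/check",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"messages": messages},
        )
        if response.status_code == 429 and attempt < max_retries:
            # Honor the retry_after hint from the rate-limit response.
            time.sleep(response.json().get("retry_after", 60))
            continue
        if response.status_code in (400, 401):
            error = response.json()["error"]
            raise RuntimeError(f"{error['code']}: {error['message']}")
        response.raise_for_status()
        return response.json()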

Best Practices

Message Consistency

Keep message arrays consistent to maximize exact cache hits

Threshold Tuning

Start with default 0.85 threshold and adjust based on results

Error Handling

Always handle both cache hits and misses gracefully

Monitoring

Track hit rates to optimize your caching strategy
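
Tracking can be as simple as two counters wrapped around every /check call; a minimal sketch:

class CacheStats:
    """Running hit-rate and token-savings tracker for /check results."""

    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.tokens_saved = 0

    def record(self, result):
        if result["cache_hit"]:
            self.hits += 1
            self.tokens_saved += result.get("tokens_saved", 0)
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0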

Complete Integration Example

Here’s how to use the check endpoint in a complete caching workflow:

import requests
from openai import OpenAI

def get_cached_or_fresh_response(messages, api_key):
    # 1. Check cache first
    check_response = requests.post(
        "https://api.mysafecache.com/api/v1/check",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={"messages": messages}
    )
    check_response.raise_for_status()  # Fail fast on HTTP errors

    result = check_response.json()
    
    if result["cache_hit"]:
        print(f"✅ Cache {result['cache_type']} hit! ({result['lookup_time_ms']:.1f}ms)")
        print(f"💰 Saved {result['tokens_saved']} tokens")
        return result["answer"]
    
    # 2. Cache miss - call LLM
    print(f"❌ Cache miss ({result['lookup_time_ms']:.1f}ms)")
    
    # Your LLM call here (openai>=1.0 client API)
    client = OpenAI()
    llm_response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )

    answer = llm_response.choices[0].message.content
    
    # 3. Store for future use (see /store endpoint docs)
    # store_response(messages, answer, api_key)
    
    return answer

# Usage
messages = [{"role": "user", "content": "What is Docker?"}]
response = get_cached_or_fresh_response(messages, "your-api-key")
print(response)

Next Steps