Check Cache

POST https://api.mysafecache.com/api/v1/check

Overview

The Check Cache endpoint is the first step in the MySafeCache workflow. It searches both the exact and semantic caches to determine whether a response has already been cached for your messages.

Request

messages (array, required)
Array of message objects following the OpenAI chat format.

similarity_threshold (number, default: 0.85)
Minimum similarity score for semantic matches (0.0 to 1.0).

cache_types (array, default: ["exact", "semantic"])
Which cache types to check. Options: exact, semantic (see the second example request below).

Example Request

curl -X POST https://api.mysafecache.com/api/v1/check \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is Docker?"
      }
    ],
    "similarity_threshold": 0.85
  }'
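
The cache_types parameter narrows the search to a single cache type. For example, to check only the exact cache and skip the semantic vector search entirely:

curl -X POST https://api.mysafecache.com/api/v1/check \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is Docker?"
      }
    ],
    "cache_types": ["exact"]
  }'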

Response

The response varies depending on whether a cache hit or miss occurs.

Cache Hit Response

cache_hit (boolean)
Always true for cache hits.

answer (string)
The cached response content.

cache_type (string)
Type of cache hit: exact or semantic.

lookup_time_ms (number)
Time taken to find the cached response, in milliseconds.

tokens_saved (number)
Estimated number of tokens saved by using the cache.

created_at (string)
ISO 8601 timestamp of when the response was originally cached.

similarity_score (number)
Similarity score for semantic matches (null for exact matches).

metadata (object)
Additional metadata about the cached response.

Example Cache Hit Response

{
  "cache_hit": true,
  "answer": "Docker is a containerization platform that allows you to package applications and their dependencies into lightweight, portable containers. These containers can run consistently across different environments, from development laptops to production servers.",
  "cache_type": "exact",
  "lookup_time_ms": 2.5,
  "tokens_saved": 150,
  "created_at": "2025-01-15T10:30:45Z",
  "similarity_score": null,
  "metadata": {
    "model": "gpt-4",
    "original_tokens": 150,
    "usage_count": 5
  }
}
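
For semantic hits, similarity_score tells you how close the match was. An application that wants extra safety can apply its own stricter cutoff and treat borderline hits as misses. A minimal sketch (the 0.9 cutoff here is an application-level choice, not an API requirement):

def accept_hit(result, min_semantic_score=0.9):
    # Exact hits are byte-identical matches, so always accept them.
    if result["cache_type"] == "exact":
        return True
    # For semantic hits, require a similarity above our own stricter cutoff.
    return (result.get("similarity_score") or 0) >= min_semantic_score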

Cache Miss Response

cache_hit (boolean)
Always false for cache misses.

message (string)
Explanation of why no cached response was found.

prompt_hash (string)
SHA-256 hash of the message array, useful for debugging.

lookup_time_ms (number)
Time taken to search the caches, in milliseconds.

suggested_action (string)
Recommended next step.

search_results (object)
Details about the search that was performed.

Example Cache Miss Response

{
  "cache_hit": false,
  "message": "No cached response found",
  "prompt_hash": "a1b2c3d4e5f6...",
  "lookup_time_ms": 8.2,
  "suggested_action": "Call your LLM with these messages and use /store endpoint to cache the response",
  "search_results": {
    "exact_checked": true,
    "semantic_checked": true,
    "closest_match_score": 0.72
  }
}
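
On a miss, search_results.closest_match_score shows how close the nearest semantic candidate came. Logging near-misses is a simple way to tell whether a slightly lower similarity_threshold would have produced hits. A minimal sketch (the 0.05 margin is an arbitrary choice for illustration):

def log_near_miss(result, threshold=0.85, margin=0.05):
    # Flag misses whose closest semantic match fell just below the threshold;
    # a cluster of these suggests the threshold could be loosened slightly.
    if result["cache_hit"]:
        return
    score = result["search_results"].get("closest_match_score")
    if score is not None and score >= threshold - margin:
        print(f"Near miss: {score:.2f} vs threshold {threshold:.2f}")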

Cache Types Explained

Exact Cache

  • Speed: 1-5ms response time
  • Accuracy: 100% match guarantee
  • Use Case: Identical queries
  • Method: SHA-256 hash comparison (see the sketch below)

Semantic Cache

  • Speed: 5-20ms response time
  • Accuracy: Configurable similarity threshold
  • Use Case: Similar but not identical queries
  • Method: Vector similarity search
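
The hash comparison is also why message consistency matters: any byte-level difference in the messages array produces a different hash and therefore an exact-cache miss. As an illustration only (the server's exact canonicalization scheme is not documented here, so treat this as an assumption), a client-side fingerprint might look like:

import hashlib
import json

def message_fingerprint(messages):
    # Serialize deterministically (sorted keys, no extra whitespace) so that
    # identical message arrays always yield the same SHA-256 hash.
    # NOTE: illustrative only; may not match the server's prompt_hash.
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

print(message_fingerprint([{"role": "user", "content": "What is Docker?"}]))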

Similarity Threshold Guidelines

Threshold    Behavior                  Best For
0.95-1.0     Very strict matching      High-precision applications
0.85-0.94    Balanced (recommended)    General purpose
0.75-0.84    Loose matching            Maximum cache utilization
0.60-0.74    Very loose                Experimental/testing

Performance Characteristics

Exact Cache Performance

  • Average: 2.3ms
  • 95th percentile: 5ms
  • 99th percentile: 8ms

Semantic Cache Performance

  • Average: 12ms
  • 95th percentile: 25ms
  • 99th percentile: 40ms
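
These figures describe the cache lookup itself; the latency your client observes will also include network round-trip time. A quick way to compare the two (a minimal sketch using the requests library):

import time
import requests

def timed_check(messages, api_key):
    start = time.perf_counter()
    response = requests.post(
        "https://api.mysafecache.com/api/v1/check",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"messages": messages},
    )
    wall_ms = (time.perf_counter() - start) * 1000
    result = response.json()
    # lookup_time_ms is the server-side search; the remainder is mostly network.
    print(f"total: {wall_ms:.1f}ms, lookup: {result['lookup_time_ms']:.1f}ms")
    return result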

Error Responses

400 Bad Request:

{
  "error": {
    "code": "INVALID_MESSAGES",
    "message": "Messages array is required and must not be empty",
    "details": {
      "field": "messages",
      "expected": "non-empty array",
      "received": "undefined"
    }
  },
  "status": 400
}

401 Unauthorized:

{
  "error": {
    "code": "INVALID_API_KEY",
    "message": "The provided API key is invalid or expired"
  },
  "status": 401
}

429 Too Many Requests:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests. Please try again later.",
    "retry_after": 60
  },
  "status": 429
}
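
All three error bodies share the same shape, so they can be handled in one place. A minimal sketch using the requests library (a single sleep-and-retry on 429, not production-grade backoff):

import time
import requests

def check_cache(messages, api_key, max_retries=1):
    for attempt in range(max_retries + 1):
        response = requests.post(
            "https://api.mysafecache.com/api/v1/check",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"messages": messages},
        )
        if response.status_code == 429 and attempt < max_retries:
            # Honor the retry_after hint from the rate-limit response.
            time.sleep(response.json().get("retry_after", 60))
            continue
        if response.status_code in (400, 401):
            error = response.json()["error"]
            raise RuntimeError(f"{error['code']}: {error['message']}")
        response.raise_for_status()
        return response.json()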

Best Practices

Message Consistency

Keep message arrays consistent to maximize exact cache hits

Threshold Tuning

Start with default 0.85 threshold and adjust based on results

Error Handling

Always handle both cache hits and misses gracefully

Monitoring

Track hit rates to optimize your caching strategy
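
Tracking can be as simple as two counters wrapped around every /check call; a minimal sketch:

class CacheStats:
    """Running hit-rate and token-savings tracker for /check results."""

    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.tokens_saved = 0

    def record(self, result):
        if result["cache_hit"]:
            self.hits += 1
            self.tokens_saved += result.get("tokens_saved", 0)
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0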

Complete Integration Example

Here’s how to use the check endpoint in a complete caching workflow:

import requests
from openai import OpenAI

def get_cached_or_fresh_response(messages, api_key):
    # 1. Check cache first
    check_response = requests.post(
        "https://api.mysafecache.com/api/v1/check",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={"messages": messages}
    )
    check_response.raise_for_status()  # Fail fast on HTTP errors

    result = check_response.json()
    
    if result["cache_hit"]:
        print(f"✅ Cache {result['cache_type']} hit! ({result['lookup_time_ms']:.1f}ms)")
        print(f"💰 Saved {result['tokens_saved']} tokens")
        return result["answer"]
    
    # 2. Cache miss - call LLM
    print(f"❌ Cache miss ({result['lookup_time_ms']:.1f}ms)")
    
    # Your LLM call here (openai>=1.0 client API)
    client = OpenAI()
    llm_response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )

    answer = llm_response.choices[0].message.content
    
    # 3. Store for future use (see /store endpoint docs)
    # store_response(messages, answer, api_key)
    
    return answer

# Usage
messages = [{"role": "user", "content": "What is Docker?"}]
response = get_cached_or_fresh_response(messages, "your-api-key")
print(response)

Next Steps