Case Study

How Wolfcast Cut Luna AI Costs by 84%

One line of code. No quality loss. No architectural changes. Just smarter routing.

84% cost reduction · 1 line changed · 0 quality loss · 5 min integration

Wolfcast.ai — Prediction Markets, Powered by AI

Wolfcast.ai is a prediction market platform where users place forecasts and compete on accuracy. To help users navigate complex financial data, Wolfcast built Luna — an AI assistant powered by Google Gemini that answers everything from basic questions to deep analytical queries about market movements.

Luna handles thousands of queries daily across a spectrum of complexity: simple factual questions, platform how-tos, and sophisticated multi-turn conversations involving market context, trend analysis, and prediction comparisons.

Premium Models for Every Query

Wolfcast routed all Luna queries to Google's premium Gemini models. Great for quality. Terrible for the budget. As user counts grew, AI became the largest operational expense — and most queries didn't need a premium model.

💬 Simple queries are expensive

"What is a prediction market?" doesn't need gemini-2.5-pro. But it was being routed there anyway, at $1.25/MTok input.

📈 Scale made it worse

At 200 messages per user per day, costs grew linearly with the user base. There was no easy way to separate simple queries from complex ones without building a custom classifier.

🛠 Building a classifier = months

Wolfcast evaluated building their own query classifier. Estimated timeline: 2–3 months of engineering, plus ongoing maintenance.

Can't sacrifice quality

Luna's value is accurate, helpful answers. Any cost optimization had to preserve response quality for complex analytical queries.
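To see how the linear scaling above bites, here is a back-of-envelope cost model. The per-message token counts are illustrative assumptions, not Wolfcast's actual numbers; the prices are gemini-2.5-pro list prices ($1.25 input / $10.00 output per MTok) from the figures in this case study.

```javascript
// Back-of-envelope cost model for routing every query to the premium model.
// Token counts per message are assumed for illustration only.
const PRICE_IN = 1.25 / 1e6;   // $ per input token (gemini-2.5-pro)
const PRICE_OUT = 10.00 / 1e6; // $ per output token (gemini-2.5-pro)

function dailyCostPerUser(messages, inTokensPerMsg, outTokensPerMsg) {
  return messages * (inTokensPerMsg * PRICE_IN + outTokensPerMsg * PRICE_OUT);
}

// 200 msgs/day with an assumed 2,000 input + 500 output tokens per message:
const perUser = dailyCostPerUser(200, 2000, 500);
console.log(perUser.toFixed(2)); // "1.50" — dollars per user per day
```

Under these assumptions every new user adds a fixed ~$1.50/day, which is why the bill tracks user growth one-for-one.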

One Line. Instant Savings.

TokenSurf sits between Wolfcast and Google Gemini. It classifies each query's complexity in real time and routes simple questions to cheaper models — while passing complex analytical queries straight through to the premium model, untouched.

Before — Direct Gemini
const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Gemini's REST API takes the key in this header, not Authorization
      'x-goog-api-key': apiKey
    },
    body: JSON.stringify(query)
  }
);
After — TokenSurf
const response = await fetch(
  'https://api.tokensurf.io/v1/chat/completions',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': tsKey
    },
    body: JSON.stringify(query)
  }
);

Three steps: create a TokenSurf account, store the key with defineSecret('TOKENSURF_API_KEY'), and change the URL.

72–84% Cost Savings, Zero Quality Loss

Model Downgrade                        Cost Before (input / output, per MTok)   Cost After       Savings
gemini-2.5-pro → gemini-2.5-flash      $1.25 / $10.00                           $0.30 / $2.50    76%
gemini-3.1-pro → gemini-2.5-flash      $2.00 / $12.00                           $0.30 / $2.50    84%
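The savings column follows directly from the list prices. A quick sketch of the arithmetic for a downgraded query, using the first row's prices (input shown as the headline number; the output side works out almost identically):

```javascript
// Per-token savings when a query is downgraded, from list prices per MTok.
function savingsPercent(priceBefore, priceAfter) {
  return Math.round((1 - priceAfter / priceBefore) * 100);
}

console.log(savingsPercent(1.25, 0.30));  // 76 — input: $1.25 → $0.30
console.log(savingsPercent(10.00, 2.50)); // 75 — output: $10.00 → $2.50
```

Note these figures apply per downgraded query; the blended savings across all traffic depends on what share of queries gets downgraded.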

Query Distribution

60% Simple → routed to a cheaper model (saved)
40% Complex → passed through unchanged (full quality)
TokenSurf's response headers (X-TokenSurf-Downgraded, X-TokenSurf-Model) let Wolfcast monitor every routing decision in production: full transparency into what's being optimized and what's being preserved.
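A minimal sketch of consuming those headers with the standard fetch Headers API. The header names come from this case study; the values, model name, and logging format are illustrative assumptions:

```javascript
// Sketch: log TokenSurf's routing decision for each response.
// Header names are from the case study; everything else is illustrative.
function logRoutingDecision(headers) {
  const downgraded = headers.get('X-TokenSurf-Downgraded') === 'true';
  const model = headers.get('X-TokenSurf-Model');
  console.log(downgraded
    ? `downgraded → served by ${model}`
    : `passed through → served by ${model}`);
  return { downgraded, model };
}

// In production this would be response.headers from the fetch call above:
const decision = logRoutingDecision(new Headers({
  'X-TokenSurf-Downgraded': 'true',
  'X-TokenSurf-Model': 'gemini-2.5-flash'
}));
```

Wiring this into the existing fetch call is one extra line: `logRoutingDecision(response.headers)`.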

How TokenSurf Classifies Luna's Queries

TokenSurf's classification engine runs in real time on every request. Here's how it handles typical Luna traffic:

💬 "What is a prediction market?"

Short factual question. Matches simple patterns. Under 50 tokens.

→ gemini-2.5-flash
📊 "Analyze top movers in tech markets"

Analytical request with market context. Matches complex patterns. 10K+ token system prompt.

→ gemini-2.5-pro
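TokenSurf's classifier runs server-side and its internals aren't public. Purely as an illustration, a heuristic combining the two signals described above (a token-length threshold and simple/complex pattern matching) might look like this; the pattern list and the 4-chars-per-token estimate are assumptions:

```javascript
// Illustrative heuristic only — TokenSurf's real classifier is proprietary.
// Short factual phrasings route cheap; analytical keywords or long prompts
// stay on the premium model.
const COMPLEX_PATTERNS = /\b(analy[sz]e|compare|trend|forecast|explain)\b/i;

function classifyQuery(query, approxTokens = Math.ceil(query.length / 4)) {
  if (approxTokens > 50 || COMPLEX_PATTERNS.test(query)) {
    return 'gemini-2.5-pro';   // complex → pass through at full quality
  }
  return 'gemini-2.5-flash';   // simple → downgrade and save
}

console.log(classifyQuery('What is a prediction market?'));       // gemini-2.5-flash
console.log(classifyQuery('Analyze top movers in tech markets')); // gemini-2.5-pro
```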

Ready to cut your AI costs?

Sign up in 30 seconds. Get 1,000 free credits. No credit card required.