TokenSurf API Docs
TokenSurf is an OpenAI-compatible proxy. Change one line, save 40-94% on LLM costs.
Base URL: https://api.tokensurf.io/v1

All endpoints follow the OpenAI API spec. Your existing code works unchanged.
Quickstart
1. Sign up — get 1,000 free credits:
```shell
curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'
```
2. Save your API key (starts with ts_, shown only once).
3. Add your provider key (e.g. your OpenAI key):
```shell
curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "apiKey": "sk-your-openai-key"}'
```
4. Use it — change your base URL:
```python
from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# Simple query → auto-routed to gpt-4o-mini (94% savings)
# Check response headers: X-TokenSurf-Downgraded: true
```
Authentication
All requests require a Bearer token in the Authorization header:
Authorization: Bearer ts_your_api_key_here
API keys start with ts_ and are generated at signup. We store only the SHA-256 hash — if you lose your key, you'll need to create a new account.
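For illustration, verifying a presented key against a stored hash looks like this (a minimal sketch; `hash_key` and `verify_key` are hypothetical names, and the actual server-side implementation isn't published):

```python
import hashlib
import hmac

def hash_key(api_key: str) -> str:
    """Return the SHA-256 hex digest of an API key (what the server stores)."""
    return hashlib.sha256(api_key.encode()).hexdigest()

def verify_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)

stored = hash_key("ts_example_key")   # plaintext is discarded after this
assert verify_key("ts_example_key", stored)
assert not verify_key("ts_wrong_key", stored)
```

Because only the digest survives, a database leak does not expose usable keys, and the service cannot recover a lost key for you.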
Provider Keys
TokenSurf doesn't call LLMs directly. You bring your own API keys for each provider. Keys are encrypted with AES-256-GCM before storage.
Supported providers:
| Provider | Key format | Get one |
|---|---|---|
| OpenAI | sk-... | platform.openai.com |
| Anthropic | sk-ant-... | console.anthropic.com |
| Google | AIza... | aistudio.google.com |
| OpenRouter | sk-or-... | openrouter.ai/keys |
Credits & Billing
- 1 credit = 1 routed request (regardless of tokens)
- Free accounts get 1,000 credits on signup
- Top up anytime: $5 = 5,000 / $25 = 25,000 / $100 = 100,000 credits
- Credits never expire
- When credits hit 0, requests return 402
- If a provider call fails, the credit is refunded automatically
Chat Completions
POST /v1/chat/completions
100% OpenAI-compatible. Supports streaming, tool calls, JSON mode.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g. gpt-4o, claude-sonnet-4.6, deepseek/deepseek-r1) |
| messages | array | Yes | Array of message objects (role, content) |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature |
| max_tokens | integer | No | Maximum output tokens |
| tools | array | No | Tool/function definitions (forces COMPLEX routing) |
| response_format | object | No | JSON mode (forces COMPLEX routing) |
Example: non-streaming
```shell
curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```
Response
```json
{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o-mini",   // ← downgraded for simple query
  "choices": [{
    "message": { "role": "assistant", "content": "Paris." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16 }
}
```
Example: streaming
```shell
curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```
Example: OpenRouter model
```shell
curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-r1", "messages": [{"role": "user", "content": "Explain quantum computing"}]}'
# Any model with provider/name format → routes via OpenRouter
```
List Models
GET /v1/models
Returns all supported models with pricing and routing info.
```shell
curl https://api.tokensurf.io/v1/models
```
Signup
POST /api/signup
Create a new account. Returns an API key (shown only once) and 1,000 free credits.
```shell
curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com"}'
```
Response
```json
{
  "apiKey": "ts_7d96f6aac5009f1b...",
  "apiKeyPrefix": "ts_7d96f6...",
  "credits": 1000,
  "message": "Save your API key — it cannot be recovered."
}
```
Dashboard
GET /api/dashboard/
Returns your credit balance, usage stats, and savings for the current month.
```shell
curl https://tokensurf.io/api/dashboard/ \
  -H "Authorization: Bearer ts_your_key"
```
Manage Provider Keys
GET /api/keys/ — Check which providers are configured
POST /api/keys/ — Save a provider key
DELETE /api/keys/ — Remove a provider key
Save a key
```shell
curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "anthropic", "apiKey": "sk-ant-your-key"}'
```
Valid providers: openai, anthropic, google, openrouter
Rotate API Key
POST /api/keys/ with {"action": "rotate"}
Generates a new API key. Your old key continues to work for 24 hours (grace period).
```shell
curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "rotate"}'
# Returns: {"status": "rotated", "apiKey": "ts_new...", "prefix": "ts_abc123..."}
```
Routing Config
GET /api/routingConfigApi/ — Get current routing configuration
PUT /api/routingConfigApi/ — Update routing configuration
DELETE /api/routingConfigApi/ — Reset to defaults
Configuration fields
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Master switch for smart routing |
| aiClassifier | boolean | Use Gemini Flash for ambiguous queries |
| ambiguousFallback | "conservative" \| "aggressive" | How to handle ambiguous queries when AI is off |
| modelOverrides | object | Per-model {enabled, customTarget} overrides |
| providerEnabled | object | Enable/disable routing to each provider |
| providerPriority | string[] | Provider preference order for fallback chains |
Buy Credits
POST /api/checkout
Creates a Stripe Checkout session. Redirect the user to the returned URL.
```shell
curl -X POST https://tokensurf.io/api/checkout \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"amount": 25}'
# Returns: {"url": "https://checkout.stripe.com/..."}
```
Amount must be between $5 and $500. $1 = 1,000 credits.
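The bounds and conversion above work out as follows (a sketch; `checkout_credits` is an illustrative helper, not part of the API):

```python
def checkout_credits(amount_usd: int) -> int:
    """Validate a top-up amount and return the credits granted ($1 = 1,000 credits)."""
    if not 5 <= amount_usd <= 500:
        raise ValueError("amount must be between $5 and $500")
    return amount_usd * 1000

assert checkout_credits(5) == 5_000
assert checkout_credits(25) == 25_000
assert checkout_credits(100) == 100_000
```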
Health Check
GET /api/health
Returns system status, provider health, cache hit rates, and latency percentiles. No authentication required.
```shell
curl https://tokensurf.io/api/health
# Returns: {"status":"healthy","region":"us-central1","providers":{...},"metrics":{...}}
```
Organizations (Teams)
Manage organizations for team-based API key management with per-key budgets and rate limits.
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/orgs/ | List your organizations |
| POST | /api/orgs/ | Create organization ({"name": "..."}) |
| GET | /api/orgs/:id | Get org details + members |
| PUT | /api/orgs/:id | Update org (owner/admin) |
| DELETE | /api/orgs/:id | Delete org (owner only) |
| POST | /api/orgs/:id/members | Add member ({"email": "...", "role": "member"}) |
| DELETE | /api/orgs/:id/members | Remove member ({"userId": "..."}) |
Roles: owner (full control), admin (manage keys + members), member (read-only).
Team API Keys
Create labeled API keys for your organization with per-key budgets, rate limits, and model restrictions.
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/org-keys/:orgId | List org's API keys |
| POST | /api/org-keys/:orgId | Create key (owner/admin) |
| DELETE | /api/org-keys/:orgId | Delete key (owner/admin) |
Create team key
```shell
curl -X POST https://tokensurf.io/api/org-keys/ORG_ID \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"label": "production", "monthlyBudget": 10000, "rpm": 60}'
# Returns: {"apiKey": "ts_org_...", "prefix": "ts_org_abc123..."}
```
Team keys use the ts_org_ prefix. They consume credits from the organization's balance. Each key can have a monthly budget cap and model allowlist.
Architecture
TokenSurf is a single proxy that sits between your app and LLM providers. Every request flows through this pipeline:
Request Pipeline
```
// 1. Your app sends a standard OpenAI SDK request
POST /v1/chat/completions
Authorization: Bearer ts_your_key
{ "model": "gpt-4o", "messages": [...] }

// 2. TokenSurf proxy handles it
rate-limit  → In-memory token bucket (60 req/min, 10 req/s burst per key)
auth        → Validate API key (Redis cache → Firestore fallback)
abuse       → Abuse detection (throttle/block on anomalous patterns)
credits     → Redis DECR (atomic, lock-free, ~1ms) with Firestore background sync
classify    → Rule engine (0ms) → classifier cache → AI classifier (~50ms)
route       → If simple + downgrade target exists: swap model
circuit-brk → Check provider health → fallback to alternative provider if down
forward     → Pooled HTTP connection with retry (2 retries, exponential backoff)
translate   → Convert response to OpenAI format (if Anthropic or Google)
quality     → 5% of responses scored async by Gemini Flash-Lite (1-10 scale)
log         → Structured logging + async usage aggregation

// 3. Response returned to your app in OpenAI format
+ X-TokenSurf-Model: gpt-4o-mini
+ X-TokenSurf-Downgraded: true
+ X-TokenSurf-Complexity: simple
+ X-TokenSurf-Request-Id: a1b2c3d4...
+ X-TokenSurf-Region: us-central1
```
Complexity Classification
The classifier runs in two stages. It's conservative by design — when uncertain, it keeps your original model.
| Signal | Result | What triggers it |
|---|---|---|
| Tools / function calling | Complex | Any request with tools parameter |
| Structured output | Complex | Any request with response_format |
| Code patterns | Complex | "analyze", "implement", "refactor", "debug", code blocks |
| Long conversation | Complex | 6+ messages in the conversation |
| Long message | Complex | 500+ estimated tokens in last user message |
| Factual question | Simple | "What is", "Define", "Translate", "Calculate" |
| Very short query | Simple | Under 50 tokens, 1-2 messages |
| Everything else | Ambiguous | Sent to Gemini Flash AI classifier or treated as complex |
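The rule stage can be approximated as follows (a hypothetical sketch: the production rule set, keyword list, and token estimator aren't published, and the 4-chars-per-token estimate is an assumption):

```python
def classify(request: dict) -> str:
    """Approximate the rule-based pre-filter from the table above."""
    msgs = request.get("messages", [])
    last = msgs[-1]["content"] if msgs else ""
    est_tokens = len(last) // 4  # rough 4-chars-per-token estimate

    # Hard COMPLEX signals
    if request.get("tools") or request.get("response_format"):
        return "complex"
    if any(kw in last.lower() for kw in ("analyze", "implement", "refactor", "debug", "```")):
        return "complex"
    if len(msgs) >= 6 or est_tokens >= 500:
        return "complex"

    # SIMPLE signals
    if any(last.lower().startswith(p) for p in ("what is", "define", "translate", "calculate")):
        return "simple"
    if est_tokens < 50 and len(msgs) <= 2:
        return "simple"

    return "ambiguous"  # handed to the AI classifier (or treated as complex)

assert classify({"messages": [{"role": "user", "content": "What is 2+2?"}]}) == "simple"
assert classify({"messages": [{"role": "user", "content": "hi"}], "tools": [{}]}) == "complex"
```

The asymmetry is deliberate: COMPLEX signals are checked first, so any single complexity trigger overrides every simplicity signal.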
Provider Translation
You always send and receive the OpenAI format. TokenSurf translates internally:
| Provider | Request translation | Response translation |
|---|---|---|
| OpenAI | Pass-through | Pass-through |
| Anthropic | Extract system messages, merge consecutive roles, ensure first message is user | Map end_turn → stop, reconstruct choices array |
| Google | System → systemInstruction, assistant → model role | Map STOP/MAX_TOKENS/SAFETY finish reasons |
| OpenRouter | Pass-through (OpenAI-compatible) | Pass-through |
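The Anthropic-direction request translation can be sketched like this (illustrative only: `to_anthropic` and the `"(continue)"` placeholder for a missing first user turn are assumptions, not the actual implementation):

```python
def to_anthropic(openai_messages):
    """Translate an OpenAI-style message list toward the Anthropic Messages shape:
    extract system messages, merge consecutive same-role turns, and ensure the
    conversation starts with a user turn."""
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    turns = [m for m in openai_messages if m["role"] != "system"]

    merged = []
    for m in turns:
        if merged and merged[-1]["role"] == m["role"]:
            merged[-1]["content"] += "\n\n" + m["content"]  # merge consecutive roles
        else:
            merged.append({"role": m["role"], "content": m["content"]})

    if merged and merged[0]["role"] != "user":
        # Anthropic requires the first turn to be from the user
        merged.insert(0, {"role": "user", "content": "(continue)"})

    return {"system": "\n".join(system_parts), "messages": merged}

out = to_anthropic([
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Hi"},
    {"role": "user", "content": "Again"},
])
assert out["system"] == "Be terse."
assert len(out["messages"]) == 1  # two consecutive user turns merged
```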
Security
- Your TokenSurf API key: Only the SHA-256 hash is stored. Plaintext is shown once at signup, then deleted.
- Provider API keys: Encrypted with AES-256-GCM at rest. Decrypted only in-memory when forwarding a request.
- Credits: Deducted via atomic Redis DECR with Firestore background sync. Refunded automatically on provider errors.
- Key rotation: Generate a new key with 24-hour grace period for the old key.
- Abuse detection: Automatic throttling and blocking on anomalous request patterns.
- Audit logging: All security events (key changes, config updates, purchases) logged to Cloud Logging.
Streaming
Streaming ("stream": true) is fully supported across all providers. SSE events from Anthropic and Google are translated in real-time to the OpenAI chat.completion.chunk format.
Resilience
TokenSurf is built for millions of requests per month with multiple layers of fault tolerance:
| Layer | Mechanism | Details |
|---|---|---|
| Rate Limiting | Token bucket | 60 req/min, 10 req/sec burst per API key. Returns 429 with Retry-After header. |
| Circuit Breaker | Per-provider state machine | CLOSED → OPEN (fail fast for 30s) → HALF_OPEN (probe) → CLOSED. Triggers on 5+ failures in 60s. |
| Retry | Exponential backoff | 2 retries with jitter on 429/500/502/503. Respects Retry-After headers. |
| Fallback Chains | Cross-provider equivalences | When a provider is down, routes to an equivalent model on another provider (e.g. gpt-4o → claude-sonnet-4-6). |
| Connection Pooling | undici HTTP pools | Persistent TCP/TLS connections to all providers. Saves 50-100ms per request. |
| Abuse Detection | Behavioral analysis | Throttles on high request rates (>600/hour) or error rates (>50%). Escalates to key blocking. |
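The circuit breaker row above describes a standard state machine. A minimal sketch (the thresholds match the table, but the `CircuitBreaker` class itself is illustrative, not TokenSurf's code):

```python
import time

class CircuitBreaker:
    """CLOSED → OPEN after 5 failures in a 60s window; OPEN fails fast
    for a 30s cooldown; then HALF_OPEN allows a probe request."""
    def __init__(self, threshold=5, window=60.0, cooldown=30.0):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.failures = []     # timestamps of recent failures
        self.opened_at = None  # when the breaker tripped, or None

    def state(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                return "OPEN"
            return "HALF_OPEN"  # cooldown elapsed: allow one probe
        return "CLOSED"

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now

    def record_success(self):
        self.failures.clear()
        self.opened_at = None  # probe succeeded: close the breaker

cb = CircuitBreaker()
for i in range(5):
    cb.record_failure(now=float(i))
assert cb.state(now=10.0) == "OPEN"        # fail fast during cooldown
assert cb.state(now=40.0) == "HALF_OPEN"   # cooldown over: probe allowed
cb.record_success()
assert cb.state(now=41.0) == "CLOSED"
```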
Caching
Redis (Memorystore) caching eliminates Firestore from the hot path. All caching is transparent and gracefully degrades if Redis is unavailable.
| Cache | Key | TTL | Impact |
|---|---|---|---|
| Auth | apikey:{hash} | 5 min | 90%+ of Firestore auth queries eliminated (50ms → 1ms) |
| Credits | credits:{userId} | 10 min | Atomic Redis DECR replaces Firestore transactions (30ms → 1ms) |
| Classifier | classify:{hash} | 1 hour | Skips Gemini AI call for repeated ambiguous queries (~200ms saved) |
Cache is invalidated on: credit top-ups, key rotation, provider key changes, and routing config updates.
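"Gracefully degrades" means a cache outage never fails a request: lookups fall through to the primary store. A sketch of that read-through pattern (`cached_lookup` and `FakeCache` are illustrative; `FakeCache` stands in for Redis):

```python
def cached_lookup(cache, key, ttl, load):
    """Read-through lookup: serve from cache when possible, fall back to the
    primary store (`load`) on a miss or on any cache failure."""
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except Exception:
        return load(key)   # cache down: skip caching entirely
    value = load(key)
    try:
        cache.set(key, value, ttl)
    except Exception:
        pass               # best-effort write-back
    return value

class FakeCache:
    """In-memory stand-in for Redis (TTL ignored for brevity)."""
    def __init__(self): self.store = {}
    def get(self, k): return self.store.get(k)
    def set(self, k, v, ttl): self.store[k] = v

calls = []
def load_from_firestore(key):   # hypothetical slow primary lookup
    calls.append(key)
    return {"userId": "u1"}

cache = FakeCache()
cached_lookup(cache, "apikey:abc", 300, load_from_firestore)
cached_lookup(cache, "apikey:abc", 300, load_from_firestore)
assert len(calls) == 1  # second lookup served from cache
```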
Quality Scoring
TokenSurf automatically samples 5% of non-streaming responses and scores them for quality using Gemini Flash-Lite. This helps you verify that downgraded models still meet your quality bar.
| Score | Rating | Meaning |
|---|---|---|
| 9-10 | Excellent | Comprehensive, accurate, well-structured |
| 7-8 | Good | Mostly correct with minor issues |
| 4-6 | Fair | Partially correct or vague |
| 1-3 | Poor | Incorrect, irrelevant, or harmful |
Quality scores are aggregated per model per month and visible in your dashboard. Downgraded responses are tracked separately so you can compare original vs routed model quality.
Scoring cost: ~$0.00005 per scored response (Gemini Flash-Lite). At 5% sample rate, this adds ~$0.0000025 per request.
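The amortized figure follows directly from the two numbers above:

```python
cost_per_scored = 0.00005   # ~$ per scored response (Gemini Flash-Lite)
sample_rate = 0.05          # 5% of non-streaming responses are scored

cost_per_request = cost_per_scored * sample_rate
assert abs(cost_per_request - 0.0000025) < 1e-12  # ~$0.0000025 per request
```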
How Routing Works
Every request goes through a two-stage classifier:
- Rule-based pre-filter (0ms) — catches obvious simple/complex queries using pattern matching:
- SIMPLE: Short factual questions, translations, calculations, definitions
- COMPLEX: Code blocks, multi-step reasoning, tool calls, JSON mode, long prompts (500+ tokens), long conversations (6+ messages)
- AI classifier — for ambiguous queries, a Gemini Flash-Lite call classifies in <3 seconds. If it times out, defaults to COMPLEX.
Routing Table
Simple queries get downgraded. Complex queries and cheap models pass through unchanged.
| Provider | Model | Simple → Routes To | Savings |
|---|---|---|---|
| OpenAI | gpt-4o | gpt-4o-mini | 94% |
| | gpt-4-turbo | gpt-4o-mini | 98% |
| | gpt-4 | gpt-4o-mini | 99% |
| | gpt-4o-mini / gpt-3.5-turbo | pass-through | |
| Anthropic | claude-opus-4.6 / 4.5 | claude-haiku-4.5 | 80% |
| | claude-sonnet-4.6 / 4.5 | claude-haiku-4.5 | 67% |
| | claude-opus-4.1 / 4.0 | claude-haiku-4.5 | 93% |
| | claude-sonnet-4.0 | claude-haiku-4.5 | 67% |
| | claude-haiku-* | pass-through | |
| Google | gemini-3.1-pro-preview | gemini-2.5-flash | 84% |
| | gemini-2.5-pro | gemini-2.5-flash | 72% |
| | gemini-*-flash* | pass-through | |
| OpenRouter | Any provider/model format | pass-through (300+ models) | |
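The routing decision reduces to a table lookup. A sketch (the `DOWNGRADES` map below mirrors a few rows of the table above for illustration; the real table lives server-side and is configurable via `modelOverrides`):

```python
# Hypothetical excerpt of the downgrade table
DOWNGRADES = {
    "gpt-4o": "gpt-4o-mini",
    "gpt-4-turbo": "gpt-4o-mini",
    "gpt-4": "gpt-4o-mini",
    "claude-opus-4.6": "claude-haiku-4.5",
    "claude-sonnet-4.6": "claude-haiku-4.5",
    "gemini-2.5-pro": "gemini-2.5-flash",
}

def route(model, complexity):
    """Simple queries swap to the downgrade target; everything else passes through."""
    if complexity == "simple":
        return DOWNGRADES.get(model, model)
    return model

assert route("gpt-4o", "simple") == "gpt-4o-mini"
assert route("gpt-4o", "complex") == "gpt-4o"
assert route("gpt-4o-mini", "simple") == "gpt-4o-mini"  # cheap models pass through
```

Models absent from the map (already-cheap models, OpenRouter `provider/model` IDs) pass through unchanged, which is why a wrong classification can never route you to a *more* expensive model.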
Fallback Chains
When a provider is unavailable (circuit breaker open or persistent 5xx), TokenSurf automatically routes to an equivalent model on another provider. Fallback order follows your providerPriority setting.
| Primary Model | Anthropic Fallback | Google Fallback |
|---|---|---|
| gpt-4o | claude-sonnet-4-6 | gemini-2.5-pro |
| gpt-4o-mini | claude-haiku-4-5 | gemini-2.5-flash |
| Primary Model | OpenAI Fallback | Google Fallback |
|---|---|---|
| claude-sonnet-4-6 | gpt-4o | gemini-2.5-pro |
| claude-haiku-4-5 | gpt-4o-mini | gemini-2.5-flash |
Fallback only triggers when the provider is fully down (not for 4xx client errors). The X-TokenSurf-Fallback: true header indicates a fallback was used.
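Fallback selection walks `providerPriority` and takes the first healthy provider that has an equivalent model. A sketch (the `EQUIVALENTS` map and `fallback` helper are illustrative, built from the tables above):

```python
# Hypothetical equivalence map based on the fallback tables above
EQUIVALENTS = {
    "gpt-4o": {"anthropic": "claude-sonnet-4-6", "google": "gemini-2.5-pro"},
    "gpt-4o-mini": {"anthropic": "claude-haiku-4-5", "google": "gemini-2.5-flash"},
    "claude-sonnet-4-6": {"openai": "gpt-4o", "google": "gemini-2.5-pro"},
}

def fallback(model, provider_priority, healthy):
    """Return the first equivalent model on a healthy provider, in priority order."""
    for provider in provider_priority:
        if provider in healthy:
            target = EQUIVALENTS.get(model, {}).get(provider)
            if target:
                return target
    return None  # no healthy equivalent: the request fails with 503

# OpenAI down; priority prefers Anthropic over Google
assert fallback("gpt-4o", ["anthropic", "google"], {"anthropic", "google"}) == "claude-sonnet-4-6"
assert fallback("gpt-4o", ["anthropic", "google"], {"google"}) == "gemini-2.5-pro"
```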
Response Headers
Every proxy response includes these headers:
| Header | Value | Description |
|---|---|---|
| X-TokenSurf-Model | gpt-4o-mini | The model that actually served the request |
| X-TokenSurf-Downgraded | true / false | Whether the model was downgraded |
| X-TokenSurf-Complexity | simple / complex | How the query was classified |
| X-TokenSurf-Request-Id | a1b2c3d4-... | Unique ID for tracing and support |
| X-TokenSurf-Region | us-central1 | Which region served the request |
| X-TokenSurf-Fallback | true | Present when a fallback provider was used |
OpenAI
Requests for OpenAI models are forwarded directly to api.openai.com. Format is pass-through — no translation needed.
Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo
Required key: openai
Anthropic
Requests are translated from OpenAI format to the Anthropic Messages API. System messages are extracted into the system parameter. Streaming events are transformed to OpenAI SSE format.
Models: claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4.1, claude-sonnet-4.0, claude-opus-4.0, claude-haiku-3.5
Required key: anthropic
Google Gemini
Requests are translated to Gemini's generateContent format. System messages become systemInstruction. Roles are mapped (assistant → model).
Models: gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
Required key: google
OpenRouter
Any model ID containing a / (e.g. deepseek/deepseek-r1) is automatically routed through OpenRouter. Format is OpenAI-compatible — no translation needed.
Popular models: meta-llama/llama-3.3-70b-instruct, meta-llama/llama-4-maverick, deepseek/deepseek-chat, deepseek/deepseek-r1, mistralai/mistral-large-latest, qwen/qwen-2.5-72b-instruct, cohere/command-r-plus
Required key: openrouter
See all 300+ models at openrouter.ai/models
Python
```python
from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1",
)

# Works with any supported model
response = client.chat.completions.create(
    model="gpt-4o",  # or claude-sonnet-4.6, gemini-2.5-pro, deepseek/deepseek-r1
    messages=[{"role": "user", "content": "Hello"}],
)

# To read the X-TokenSurf-* routing headers, use the SDK's raw-response helper:
# raw = client.chat.completions.with_raw_response.create(...)
# raw.headers["X-TokenSurf-Downgraded"]
```
Node.js
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ts_your_key",
  baseURL: "https://api.tokensurf.io/v1",
});

const response = await client.chat.completions.create({
  model: "claude-opus-4.6",
  messages: [{ role: "user", content: "Hello" }],
});
```
cURL
```shell
curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Error Codes
| HTTP | Type | Meaning |
|---|---|---|
| 400 | invalid_request_error | Missing model/messages, unsupported model, no provider key, body too large, or model not allowed for org key |
| 401 | authentication_error | Missing or invalid API key (ts_ or ts_org_) |
| 402 | insufficient_credits | No credits remaining, or org key monthly budget exhausted |
| 403 | invalid_request_error | Model not in org key's allowlist |
| 405 | invalid_request_error | Wrong HTTP method |
| 409 | — | Email already registered (signup), or member already in org |
| 429 | rate_limit_error | Rate limit exceeded (per-key bucket) or abuse detection throttle. Check Retry-After header. |
| 502 | provider_error | Upstream provider failed after retries — credit is automatically refunded |
| 503 | provider_unavailable | Provider circuit breaker is open and no fallback configured — credit refunded |