API Documentation
staik offers an OpenAI-compatible REST API. Existing OpenAI clients work once you change their base_url to the staik base URL.
Authentication
Authorization: Bearer sk-st-your-key-here
Base URL
https://api.staik.se/v1
Chat Completions
curl https://api.staik.se/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-st-your-key" \
-d '{
"model": "gemma4:31b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Audio transcription
Transcribe audio to text with KB-Whisper large, the National Library of Sweden's (KB) Swedish-trained Whisper model (50,000 hours of Swedish speech). The endpoint is OpenAI-compatible: point the openai SDK at https://api.staik.se/v1.
curl https://api.staik.se/v1/audio/transcriptions \
-H "Authorization: Bearer sk-st-your-key" \
-F file=@moten.mp3 \
-F model=kb-whisper-large \
-F language=sv
Python (OpenAI SDK drop-in):
from openai import OpenAI
client = OpenAI(api_key="sk-st-your-key", base_url="https://api.staik.se/v1")
with open("moten.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="kb-whisper-large",  # or "whisper-1"; both route to KB-Whisper large
        file=f,
        language="sv",
    )
print(result.text)
Token consumption
Audio transcription is billed at 100 tokens per second of audio from the same token pool as the chat models. That means 1 minute of audio costs 6,000 tokens and 1 hour costs 360,000 tokens.
| Plan | Token limit | Equivalent audio |
|---|---|---|
| Early Adopter | 100,000 / day | ~16 min / day |
| Hobby Mini | 50,000 / day | ~8 min / day |
| Hobby | 250,000 / day | ~41 min / day |
| Agent Mini | 100,000 / hour | ~16 min / hour |
| Agent | 500,000 / hour | ~83 min / hour |
| Agent Pro | 1,000,000 / hour | ~166 min / hour |
| Pay-as-you-go | purchased tokens | 100K tokens = ~16 min |
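The arithmetic behind the table can be sketched in a few lines of Python. The rate of 100 tokens per second comes from the text above; the function names are illustrative, and rounding a fractional clip length up to the next whole token is an assumption, since the exact rounding rule is not documented here.

```python
import math

AUDIO_TOKENS_PER_SECOND = 100  # billing rate stated in the documentation


def audio_token_cost(duration_seconds: float) -> int:
    """Tokens charged for a clip of the given length.

    Rounding up is an assumption; the server's rounding rule is not documented.
    """
    return math.ceil(duration_seconds * AUDIO_TOKENS_PER_SECOND)


def audio_minutes_for_budget(token_budget: int) -> int:
    """Whole minutes of audio that a token budget covers."""
    return token_budget // (AUDIO_TOKENS_PER_SECOND * 60)


if __name__ == "__main__":
    for plan, budget in [("Hobby Mini", 50_000), ("Hobby", 250_000), ("Agent", 500_000)]:
        print(plan, audio_minutes_for_budget(budget), "min")
```

The table values are these results rounded down, which is why 100,000 tokens is listed as roughly 16 minutes rather than 16.7.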
Every response includes the headers X-Audio-Duration-Seconds and X-Audio-Tokens-Charged so you can track usage per request. Max file size is 25 MB (~30–45 min of audio). Supported formats: mp3, wav, m4a, ogg, flac, webm, mp4. The response_format parameter accepts json (default), text, or verbose_json (with segments).
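A client can check these limits before uploading and read the usage headers afterwards. The sketch below is a minimal illustration, not part of the API: the helper names are hypothetical, "25 MB" is assumed to mean 25 × 1024 × 1024 bytes, the format is judged by file extension only, and the header values are assumed to be plain numbers.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB
SUPPORTED_EXTENSIONS = {"mp3", "wav", "m4a", "ogg", "flac", "webm", "mp4"}


def check_audio_file(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_UPLOAD_BYTES}")
    return problems


def audio_usage_from_headers(headers: dict) -> dict:
    """Read the two audio usage headers (names compared case-insensitively)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "duration_seconds": float(lowered["x-audio-duration-seconds"]),
        "tokens_charged": int(lowered["x-audio-tokens-charged"]),
    }
```

Files that exceed the limit would need to be split or re-encoded at a lower bitrate before upload; how to do that is outside the scope of this sketch.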
Models
Choose a model via the model field in your request. Each model runs on dedicated GPU hardware in Sweden.
| Model | Parameters | Context window | Vision |
|---|---|---|---|
| qwen3.6:35b-a3b | 35B MoE | 262 144 | ✓ |
| qwen3.5:9b | 9.7B | 32 768 | — |
| gemma4:31b | 31B | 262 144 | ✓ |
| bge-m3:latest | 568M | 8 192 | embedding |
| kb-whisper-large | 1.5B | — | audio (sv) |
Chat models use Q4_K_M quantization and support streaming (SSE). Their context window is 262K tokens, except qwen3.5:9b at 32K. qwen3.6:35b-a3b and gemma4:31b have built-in vision: send images via image_url in messages. bge-m3:latest is an embedding model that returns 1024-dimensional vectors (configure pgvector as vector(1024)) and is called via POST /v1/embeddings. kb-whisper-large is KB's Swedish-trained Whisper model, called via POST /v1/audio/transcriptions.
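For the vision models, a request body with an image might be built as follows. This is a sketch under assumptions: the content-part layout (a list of "text" and "image_url" parts) is the one used by the OpenAI chat format, and it is assumed that staik accepts the same layout and base64 data URLs because the API is described as OpenAI-compatible. The function names are illustrative.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message carrying text plus an inline image as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Assumption: data URLs are accepted, as in the OpenAI format.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


def vision_request_body(model: str, prompt: str, image_bytes: bytes) -> dict:
    """JSON body for POST /v1/chat/completions with one image attached."""
    return {"model": model, "messages": [image_message(prompt, image_bytes)]}
```

The resulting dictionary can be serialized with json.dumps and sent to /v1/chat/completions, or its messages list can be passed to an OpenAI-style client, using a vision-capable model such as gemma4:31b.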
Plans
| Plan | Price | Token limit |
|---|---|---|
| Early Adopter | Free | 100,000 / day |
| Pay-as-you-go | 100K free + buy more | — |
| Hobby Mini | 29 SEK/mo | 50,000 / day |
| Hobby | 59 SEK/mo | 250,000 / day |
| Agent Mini | 99 SEK/mo | 100,000 / hour |
| Agent | 299 SEK/mo | 500,000 / hour |
| Agent Pro | 499 SEK/mo | 1,000,000 / hour |
Agent tiers use a rolling 60-minute window instead of a daily cap, so capacity is continuously reclaimed. Need more than 1M tokens per hour? See Agent Scale on the pricing page.
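To illustrate what a rolling window means for a client, the sketch below tracks token spend locally and frees capacity as each request ages past 60 minutes. It is a client-side approximation only: the class name is hypothetical, and the server's actual accounting (granularity, exact expiry moment) is not documented here and may differ.

```python
import time
from collections import deque


class RollingTokenWindow:
    """Client-side estimate of tokens left in a rolling time window."""

    def __init__(self, limit: int, window_seconds: float = 3600.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens), oldest first
        self._used = 0

    def _expire(self, now: float) -> None:
        # Drop every entry that has aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def remaining(self, now: float = None) -> int:
        now = time.monotonic() if now is None else now
        self._expire(now)
        return self.limit - self._used

    def record(self, tokens: int, now: float = None) -> None:
        now = time.monotonic() if now is None else now
        self._expire(now)
        self._events.append((now, tokens))
        self._used += tokens
```

With an Agent Mini limit of 100,000, recording 60,000 tokens at time 0 and 30,000 tokens thirty minutes later leaves 10,000; once the first request is an hour old, the estimate rises to 70,000 without waiting for a daily reset.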
Error codes
| Code | Description |
|---|---|
| 401 | Invalid or missing API key |
| 429 | Daily token limit exceeded or all slots busy |
| 503 | Model temporarily unavailable |
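One way a client might react to these codes is to retry 429 and 503 with exponential backoff and fail fast on 401. The sketch below uses only the standard library; the helper names, the retry count, and the delays are illustrative choices, not documented recommendations.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "https://api.staik.se/v1"
RETRYABLE = {429, 503}  # limit exceeded / slots busy, model unavailable


def send_chat(api_key: str, model: str, messages: list):
    """POST to /chat/completions; returns (status_code, parsed_json_or_None)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        return err.code, None


def call_with_retry(send, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, payload = send()
        if status not in RETRYABLE:
            return status, payload  # success, or an error such as 401
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return status, payload
```

Usage might look like `call_with_retry(lambda: send_chat("sk-st-your-key", "gemma4:31b", [{"role": "user", "content": "Hello!"}]))`. A 429 caused by an exhausted token limit will not clear within a few seconds, so a caller may prefer to stop retrying when the rate limit headers show no tokens remaining.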
Rate limit headers
Every response includes headers showing your current token usage.
| Header | Description |
|---|---|
| X-RateLimit-Limit-Tokens | Daily token limit |
| X-RateLimit-Used-Tokens | Tokens used today |
| X-RateLimit-Remaining-Tokens | Tokens remaining |
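These headers can be turned into a small usage summary on the client. The sketch assumes the header values are plain integer strings and compares header names case-insensitively; the function name is illustrative.

```python
def token_usage(headers: dict) -> dict:
    """Summarize the X-RateLimit-*-Tokens headers from a response."""
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = int(lowered["x-ratelimit-limit-tokens"])
    used = int(lowered["x-ratelimit-used-tokens"])
    remaining = int(lowered["x-ratelimit-remaining-tokens"])
    return {
        "limit": limit,
        "used": used,
        "remaining": remaining,
        # Guard against a zero limit to avoid division by zero.
        "used_fraction": used / limit if limit else 0.0,
    }
```

A caller could log a warning once used_fraction passes a chosen threshold, for example 0.9, and defer non-urgent requests until the limit resets.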