API Documentation
staik offers an OpenAI-compatible REST API. To use an existing OpenAI client, change its base_url to https://api.staik.se/v1 and use a staik API key.
Authentication
Authorization: Bearer sk-st-your-key-here
Base URL
https://api.staik.se/v1
Chat Completions
curl https://api.staik.se/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-st-your-key" \
-d '{
"model": "gemma4:31b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Audio transcription
Transcribe audio to text with KB-Whisper large, the National Library of Sweden's Whisper model trained on 50,000 hours of Swedish speech. The endpoint is OpenAI-compatible: point the openai SDK at https://api.staik.se/v1.
curl https://api.staik.se/v1/audio/transcriptions \
-H "Authorization: Bearer sk-st-your-key" \
-F file=@moten.mp3 \
-F model=kb-whisper-large \
-F language=sv
Python (OpenAI SDK drop-in):
from openai import OpenAI

client = OpenAI(api_key="sk-st-your-key", base_url="https://api.staik.se/v1")
with open("moten.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="kb-whisper-large",  # or "whisper-1"; both route to KB-Whisper large
        file=f,
        language="sv",
    )
print(result.text)
Token consumption
Audio transcription is billed at 100 tokens per second of audio from the same token pool as the chat models. That means 1 minute of audio costs 6,000 tokens and 1 hour costs 360,000 tokens.
| Plan | Token limit | Equivalent audio |
|---|---|---|
| Early Adopter | 100,000 / day | ~16 min / day |
| Hobby Mini | 50,000 / day | ~8 min / day |
| Hobby | 250,000 / day | ~41 min / day |
| Agent Mini | 100,000 / hour | ~16 min / hour |
| Agent | 500,000 / hour | ~83 min / hour |
| Agent Pro | 1,000,000 / hour | ~166 min / hour |
| Pay-as-you-go | purchased tokens | 100K tokens = ~16 min |
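The billing arithmetic behind the table can be sketched in a few lines of Python. The function names are illustrative, and rounding partial seconds up is an assumption, since the text only states the per-second rate.

```python
import math

TOKENS_PER_AUDIO_SECOND = 100  # billing rate stated in this section


def audio_tokens(duration_seconds: float) -> int:
    """Tokens charged for a clip (rounding partial seconds up is an assumption)."""
    return math.ceil(duration_seconds) * TOKENS_PER_AUDIO_SECOND


def audio_minutes_for(token_budget: int) -> int:
    """Whole minutes of audio that fit in a token budget."""
    return token_budget // (TOKENS_PER_AUDIO_SECOND * 60)


print(audio_tokens(60))            # one minute of audio
print(audio_minutes_for(100_000))  # Early Adopter daily limit
```

Flooring to whole minutes matches how the "Equivalent audio" column is rounded.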
Every transcription response includes the X-Audio-Duration-Seconds and X-Audio-Tokens-Charged headers so you can track usage per request. Maximum file size: 25 MB (~30–45 min of audio). Supported formats: mp3, wav, m4a, ogg, flac, webm, mp4. Supported response_format values: json (default), text, or verbose_json (includes segments).
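A client can check the documented limits before uploading and read the usage headers afterwards. The sketch below is one way to do that; the helper names are hypothetical, treating 25 MB as 25 × 1024 × 1024 bytes is an assumption, and so is the numeric format of the header values.

```python
import os

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MB limit; interpreting MB as MiB is an assumption
SUPPORTED_FORMATS = {"mp3", "wav", "m4a", "ogg", "flac", "webm", "mp4"}


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file is outside the documented format or size limits."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {extension or 'none'}")
    size = os.path.getsize(path)
    if size > MAX_AUDIO_BYTES:
        raise ValueError(f"file is {size} bytes, limit is {MAX_AUDIO_BYTES}")


def audio_usage(headers) -> dict:
    """Extract per-request usage from a response's header mapping.

    Assumes the values parse as a float (seconds) and an int (tokens).
    """
    return {
        "duration_seconds": float(headers["X-Audio-Duration-Seconds"]),
        "tokens_charged": int(headers["X-Audio-Tokens-Charged"]),
    }
```

With the openai SDK, the `with_raw_response` accessor (for example `client.audio.transcriptions.with_raw_response.create(...)`) is one way to reach the response headers; its `.headers` mapping can be passed to `audio_usage`.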
Models
Choose a model via the model field in your request. Each model runs on dedicated GPU hardware in Sweden.
| Model | Maker | Parameters | Context window | Vision |
|---|---|---|---|---|
| qwen3.6:35b-a3b | Alibaba | 35B MoE | 262 144 | — |
| qwen3.5:9b | Alibaba | 9.7B | 262 144 | — |
| gemma4:31b | Google DeepMind | 31B | 98 304 | ✓ |
| bge-m3:latest | BAAI | 568M | 8 192 | embedding |
| kb-whisper-large | National Library of Sweden | 1.5B | — | audio (sv) |
Chat models support streaming (SSE). Context windows: gemma4:31b 96K, qwen3.6:35b-a3b 262K, qwen3.5:9b 262K. Only gemma4:31b has built-in vision; send images via image_url in messages. bge-m3:latest is an embedding model that returns 1024-dimensional vectors (configure pgvector as vector(1024)) and is called via POST /v1/embeddings. kb-whisper-large is the National Library of Sweden's (KB) Swedish-trained Whisper model and is called via POST /v1/audio/transcriptions.
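As a rough illustration of the two less common request shapes, the sketch below builds a vision message and an embeddings request body. The content-part layout and the "input" field follow OpenAI's request format; that staik accepts base64 data URLs in image_url is an assumption, and the helper names are hypothetical.

```python
import base64
import json


def vision_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message carrying text plus one inline image.

    Uses OpenAI's content-part layout; data-URL support is an assumption.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


def embeddings_body(texts: list[str]) -> str:
    """JSON body for POST /v1/embeddings ("input" is the OpenAI field name)."""
    return json.dumps({"model": "bge-m3:latest", "input": texts})
```

The message returned by `vision_message` goes into the messages list of a chat completion request with model set to gemma4:31b, the only model listed with vision.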
The models are open-weight models that we host on our own hardware in Sweden. Training data and responses are determined by the model maker, not by staik. Choose the model in the model field based on your use case. KB-Whisper is the only model we run that was trained in Sweden.
max_tokens: your prompt plus max_tokens (the room reserved for the reply) must fit within the model's context window. Exceeding it returns a 400; shorten the prompt or lower max_tokens. This matters most with gemma4:31b (96K, smaller than the others). If you omit max_tokens, we apply a per-model default so replies are always bounded; set it explicitly for longer or shorter replies.
Plans
| Plan | Price | Token limit |
|---|---|---|
| Early Adopter | Free | 100,000 / day |
| Pay-as-you-go | 100K free + buy more | — |
| Hobby Mini | 29 SEK/mo | 50,000 / day |
| Hobby | 59 SEK/mo | 250,000 / day |
| Agent Mini | 99 SEK/mo | 100,000 / hour |
| Agent | 149 SEK/mo | 500,000 / hour |
| Agent Pro | 219 SEK/mo | 1,000,000 / hour |
Agent tiers use a rolling 60-minute window instead of a daily cap, so capacity is continuously reclaimed. Need more than 1M tokens per hour? See Agent Scale on the pricing page.
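To make the rolling window concrete, here is a client-side sketch that estimates remaining capacity by logging each request's token count with a timestamp. How the server actually accounts for usage is not documented here; this sliding-log model and the class name are assumptions for illustration.

```python
from collections import deque


class RollingTokenWindow:
    """Client-side estimate of tokens used in the trailing window."""

    def __init__(self, limit: int, window_seconds: float = 3600.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens), oldest first
        self._used = 0

    def _expire(self, now: float) -> None:
        # Drop usage that has aged out of the window; this is the reclaimed capacity.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def record(self, tokens: int, now: float) -> None:
        """Log tokens spent at time `now` (seconds, any monotonic clock)."""
        self._expire(now)
        self._events.append((now, tokens))
        self._used += tokens

    def remaining(self, now: float) -> int:
        """Estimated tokens still available at time `now`."""
        self._expire(now)
        return max(0, self.limit - self._used)
```

For an Agent Mini key, `RollingTokenWindow(100_000)` with `time.monotonic()` as the clock gives a local estimate; the rate limit headers on each response remain the authoritative figure.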
Error codes
| Code | Description |
|---|---|
| 400 | Prompt plus max_tokens exceeds the model's context window |
| 401 | Invalid or missing API key |
| 429 | Daily token limit exceeded or all slots busy |
| 503 | Model temporarily unavailable |
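One way a client might act on these codes is to fail fast on 401 and retry 429 and 503 with exponential backoff. The backoff policy is a common client-side technique, not something this API prescribes, and `post_chat`, `chat_with_retry`, and their parameters are hypothetical names for this sketch.

```python
import json
import time
import urllib.error
import urllib.request

API_URL = "https://api.staik.se/v1/chat/completions"
RETRYABLE = {429, 503}


def post_chat(api_key: str, payload: dict) -> tuple[int, dict]:
    """Send one chat completion request; return (status, parsed JSON body)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the error body is ignored in this sketch.
        return error.code, {}


def chat_with_retry(send, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep) -> dict:
    """Call send() until it returns 200, backing off on 429 and 503."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 401:
            raise PermissionError("invalid or missing API key")
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("max_attempts must be at least 1")
```

A call could look like `chat_with_retry(lambda: post_chat("sk-st-your-key", {"model": "gemma4:31b", "messages": [{"role": "user", "content": "Hello!"}]}))`. Short backoff helps when slots are busy; a 429 caused by an exhausted token limit will keep failing until capacity returns.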
Rate limit headers
Every response includes headers showing your current token usage.
| Header | Description |
|---|---|
| X-RateLimit-Limit-Tokens | Daily token limit |
| X-RateLimit-Used-Tokens | Tokens used today |
| X-RateLimit-Remaining-Tokens | Tokens remaining |
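These headers can be turned into a simple budget check before sending the next request. The sketch below assumes the header values are plain integers; the function names are illustrative.

```python
def token_usage(headers) -> dict:
    """Parse the rate limit headers from a response's header mapping."""
    limit = int(headers["X-RateLimit-Limit-Tokens"])
    used = int(headers["X-RateLimit-Used-Tokens"])
    remaining = int(headers["X-RateLimit-Remaining-Tokens"])
    return {
        "limit": limit,
        "used": used,
        "remaining": remaining,
        "used_fraction": used / limit if limit else 0.0,
    }


def can_afford(headers, estimated_tokens: int) -> bool:
    """True if the remaining budget covers an estimated request cost."""
    return token_usage(headers)["remaining"] >= estimated_tokens
```

For example, `can_afford(headers, 6000)` answers whether one more minute of audio transcription fits in the remaining budget.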