OpenCode
Run OpenCode against staik via the Anthropic endpoint — Swedish GPU infrastructure with prompt caching on reused context.
Configuration
Add staik as an Anthropic provider in ~/.config/opencode/opencode.json:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"staik": {
"npm": "@ai-sdk/anthropic",
"name": "staik",
"options": {
"baseURL": "https://api.staik.se/v1",
"apiKey": "{env:STAIK_API_KEY}"
},
"models": {
"qwen3.5:35b-a3b": {
"name": "Qwen 35B (staik)",
"limit": { "context": 262144, "output": 32768 }
},
"qwen3.5:9b": {
"name": "Qwen 9B (staik)",
"limit": { "context": 262144, "output": 8192 }
},
"gemma4:31b": {
"name": "Gemma 4 31B (staik)",
"limit": { "context": 98304, "output": 32768 }
}
}
}
}
}The key is read from the STAIK_API_KEY environment variable and sent as x-api-key. baseURL points at /v1 — OpenCode appends /messages itself. No other changes needed.
export STAIK_API_KEY=sk-st-your-keyPrompt caching
Agent loops resend the same large context (system prompt, repo files, conversation history) every turn. Via the Anthropic endpoint OpenCode marks the stable part with cache_control, and staik bills it as discounted cache_read instead of full price each time.
You can see it in the response: usage reports cache_creation_input_tokens (first time the prefix is seen) and cache_read_input_tokens (subsequent hits).
"usage": {
"input_tokens": 101,
"output_tokens": 2,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 560
}Models
Set the model id in OpenCode to one of staik's models. qwen3.5:35b-a3b (default) runs on vLLM with a 262k context window — enough for large repo contexts without falling back.
qwen3.5:35b-a3b— default, 262k context, vLLMqwen3.5:9b— fast, 262k contextgemma4:31b— 96k context, vision
Notes
Thinking aliases (…-thinking) are stripped in the Anthropic translation on /v1/messages — they behave like the base model without separate reasoning output. If you need thinking, use the OpenAI-compatible endpoint for that specific call.
Want to run Claude Code or the Anthropic SDK instead? See the Claude Code docs.