OpenCode

Run OpenCode against staik via the Anthropic endpoint — Swedish GPU infrastructure with prompt caching on reused context.

Configuration

Add staik as an Anthropic provider in ~/.config/opencode/opencode.json:

JSONopencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "staik": {
      "npm": "@ai-sdk/anthropic",
      "name": "staik",
      "options": {
        "baseURL": "https://api.staik.se/v1",
        "apiKey": "{env:STAIK_API_KEY}"
      },
      "models": {
        "qwen3.5:35b-a3b": {
          "name": "Qwen 35B (staik)",
          "limit": { "context": 262144, "output": 32768 }
        },
        "qwen3.5:9b": {
          "name": "Qwen 9B (staik)",
          "limit": { "context": 262144, "output": 8192 }
        },
        "gemma4:31b": {
          "name": "Gemma 4 31B (staik)",
          "limit": { "context": 98304, "output": 32768 }
        }
      }
    }
  }
}

The key is read from the STAIK_API_KEY environment variable and sent as x-api-key. baseURL points at /v1 — OpenCode appends /messages itself. No other changes needed.

BashSet the key
export STAIK_API_KEY=sk-st-your-key

Prompt caching

Agent loops resend the same large context (system prompt, repo files, conversation history) every turn. Via the Anthropic endpoint OpenCode marks the stable part with cache_control, and staik bills it as discounted cache_read instead of full price each time.

You can see it in the response: usage reports cache_creation_input_tokens (first time the prefix is seen) and cache_read_input_tokens (subsequent hits).

usageSecond turn — cache hit
"usage": {
  "input_tokens": 101,
  "output_tokens": 2,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 560
}

Models

Set the model id in OpenCode to one of staik's models. qwen3.5:35b-a3b (default) runs on vLLM with a 262k context window — enough for large repo contexts without falling back.

  • qwen3.5:35b-a3bdefault, 262k context, vLLM
  • qwen3.5:9bfast, 262k context
  • gemma4:31b96k context, vision

Notes

Thinking aliases (…-thinking) are stripped in the Anthropic translation on /v1/messages — they behave like the base model without separate reasoning output. If you need thinking, use the OpenAI-compatible endpoint for that specific call.

Want to run Claude Code or the Anthropic SDK instead? See the Claude Code docs.