Continue.dev

Run Continue.dev in VS Code or JetBrains against staik via the OpenAI-compatible API — chat, edit, autocomplete and embeddings on Swedish GPU infrastructure.

Configuration

staik is OpenAI-compatible, so the models are added with provider: openai in ~/.continue/config.yaml:

config.yaml (YAML)
name: staik
version: 0.0.1
schema: v1
models:
  - name: staik qwen3.5 35b
    provider: openai
    model: qwen3.5:35b-a3b
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    defaultCompletionOptions:
      contextLength: 131072   # half of 262k: fast, with headroom for output
      maxTokens: 8192
    roles:
      - chat
      - edit
      - apply
  - name: staik qwen3.5 9b
    provider: openai
    model: qwen3.5:9b
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    defaultCompletionOptions:
      contextLength: 16384    # half of 32k
      maxTokens: 4096
    roles:
      - autocomplete
  - name: staik bge-m3
    provider: openai
    model: bge-m3:latest
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    roles:
      - embed

Store your key as STAIK_API_KEY in Continue's secrets, or replace the placeholder with your sk-st key directly. Each role gets its own model: 35b for chat, edit and apply, 9b for autocomplete, and bge-m3 for embeddings (codebase indexing).

Context window

Set contextLength to roughly half the model's window. Otherwise Continue tends to fill the whole window with repo context, and a full 262k window is both slower and consumes more of your token budget. Half the window keeps responses fast and leaves headroom for maxTokens on output. If you need very large repo contexts, raise the value again.

Model             Full window   Half (contextLength)
qwen3.5:35b-a3b   262 144       131 072
gemma4:31b        98 304        49 152
qwen3.5:9b        262 144       16 384
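
The table lists gemma4:31b, which the config above does not include. As a minimal sketch, an extra entry under models could apply the same half-window rule to it. The display name, the chat role and the maxTokens value are assumptions copied from the pattern of the 35b entry, not values confirmed by staik:

```yaml
models:
  - name: staik gemma4 31b          # hypothetical display name
    provider: openai
    model: gemma4:31b
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    defaultCompletionOptions:
      contextLength: 49152          # half of the 98 304-token window
      maxTokens: 8192               # assumption: same output cap as the 35b entry
    roles:
      - chat                        # assumption: used as an alternative chat model
```

Keeping maxTokens well below contextLength leaves most of the window for prompt and repo context.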

config.json (legacy format)

If you run an older Continue version still using config.json, the setup is the same — chat model, autocomplete model and embeddings separately:

config.json (JSON)
{
  "models": [
    {
      "title": "staik qwen3.5 35b",
      "provider": "openai",
      "model": "qwen3.5:35b-a3b",
      "apiBase": "https://api.staik.se/v1",
      "apiKey": "sk-st-your-key",
      "contextLength": 131072,
      "completionOptions": { "maxTokens": 8192 }
    }
  ],
  "tabAutocompleteModel": {
    "title": "staik qwen3.5 9b",
    "provider": "openai",
    "model": "qwen3.5:9b",
    "apiBase": "https://api.staik.se/v1",
    "apiKey": "sk-st-your-key"
  },
  "embeddingsProvider": {
    "provider": "openai",
    "model": "bge-m3:latest",
    "apiBase": "https://api.staik.se/v1",
    "apiKey": "sk-st-your-key"
  }
}

Models

Set the model id in Continue to one of staik's models. qwen3.5:35b-a3b (default) runs on vLLM with a 262k context window — enough for large repo contexts without falling back.

  • qwen3.5:35b-a3b: default, 262k context, vLLM; chat & edit
  • qwen3.5:9b: fast, 262k context; autocomplete
  • gemma4:31b: 96k context, vision
  • bge-m3:latest: embeddings, 1024-dim; codebase indexing
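
For the occasional task that really needs the full 262k window, one option is a second entry for the same model with contextLength raised to the full size, kept alongside the half-window entry. This is only a sketch: the display name is hypothetical, and whether a separate entry is preferable to editing the existing one is a matter of taste.

```yaml
models:
  - name: staik qwen3.5 35b (full window)   # hypothetical display name
    provider: openai
    model: qwen3.5:35b-a3b
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    defaultCompletionOptions:
      contextLength: 262144   # full window: expect slower responses and higher token use
      maxTokens: 8192
    roles:
      - chat
      - edit
```

With both entries present, the half-window model can stay the everyday choice and the full-window one can be selected only when a large repo context is needed.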

Notes

Continue indexes your whole codebase with the embeddings model. Put bge-m3:latest on the embed role so indexing also stays on Swedish GPU infrastructure instead of hitting a third-party provider.

Want to run Claude Code, OpenCode or the Anthropic SDK instead? See Claude Code and OpenCode.