Can I run Continue.dev against staik?

Yes. staik is OpenAI-compatible, so add the models with provider: openai and apiBase https://api.staik.se/v1 in your config.yaml. Continue.dev then runs chat, edit, autocomplete and embeddings on Swedish GPU infrastructure.

API Reference OpenClaw OpenCodeContinue.devAgents Tool Calling Web Search Claude Code

Continue.dev

Run Continue.dev in VS Code or JetBrains against staik via the OpenAI-compatible API — chat, edit, autocomplete and embeddings on Swedish GPU infrastructure.

Configuration

staik is OpenAI-compatible, so the models are added with provider: openai in ~/.continue/config.yaml:

YAMLconfig.yaml

name: staik
version: 0.0.1
schema: v1
models:
  - name: staik qwen3.5 35b
    provider: openai
    model: qwen3.5:35b-a3b
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    defaultCompletionOptions:
      contextLength: 131072   # halva av 262k — snabbt + headroom för output
      maxTokens: 8192
    roles:
      - chat
      - edit
      - apply
  - name: staik qwen3.5 9b
    provider: openai
    model: qwen3.5:9b
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    defaultCompletionOptions:
      contextLength: 16384    # halva av 32k
      maxTokens: 4096
    roles:
      - autocomplete
  - name: staik bge-m3
    provider: openai
    model: bge-m3:latest
    apiBase: https://api.staik.se/v1
    apiKey: ${{ secrets.STAIK_API_KEY }}
    roles:
      - embed

Put your key in Continue's secrets (or replace the placeholder with the sk-st key directly). One model per role: 35b for chat/edit, 9b for autocomplete and bge-m3 for embeddings (codebase indexing).

Context window

Set contextLength to roughly half the model's window. Continue otherwise tends to fill the whole window with repo context, and a full 262k is both slower and eats more of the token budget. Half gives fast responses and leaves headroom for maxTokens on output. If you need to run really large repo contexts, just raise the number again.

Model	Full window	Half (contextLength)
qwen3.5:35b-a3b	262 144	131 072
gemma4:31b	98 304	49 152
qwen3.5:9b	262 144	16 384

config.json (legacy format)

If you run an older Continue version still using config.json, the setup is the same — chat model, autocomplete model and embeddings separately:

JSONconfig.json

{
  "models": [
    {
      "title": "staik qwen3.5 35b",
      "provider": "openai",
      "model": "qwen3.5:35b-a3b",
      "apiBase": "https://api.staik.se/v1",
      "apiKey": "sk-st-your-key",
      "contextLength": 131072,
      "completionOptions": { "maxTokens": 8192 }
    }
  ],
  "tabAutocompleteModel": {
    "title": "staik qwen3.5 9b",
    "provider": "openai",
    "model": "qwen3.5:9b",
    "apiBase": "https://api.staik.se/v1",
    "apiKey": "sk-st-your-key"
  },
  "embeddingsProvider": {
    "provider": "openai",
    "model": "bge-m3:latest",
    "apiBase": "https://api.staik.se/v1",
    "apiKey": "sk-st-your-key"
  }
}

Models

Set the model id in Continue to one of staik's models. qwen3.5:35b-a3b (default) runs on vLLM with a 262k context window — enough for large repo contexts without falling back.

qwen3.5:35b-a3b — default, 262k context, vLLM — chat & edit
qwen3.5:9b — fast, 262k context — autocomplete
gemma4:31b — 96k context, vision
bge-m3:latest — embeddings, 1024-dim — codebase indexing

Notes

Continue indexes your whole codebase with the embeddings model. Put bge-m3:latest on the embed role so indexing also stays on Swedish GPU infrastructure instead of hitting a third-party provider.

Want to run Claude Code, OpenCode or the Anthropic SDK instead? See Claude Code and OpenCode.