Continue.dev
Run Continue.dev in VS Code or JetBrains against staik via the OpenAI-compatible API — chat, edit, autocomplete and embeddings on Swedish GPU infrastructure.
Configuration
staik is OpenAI-compatible, so the models are added with provider: openai in ~/.continue/config.yaml:
name: staik
version: 0.0.1
schema: v1
models:
- name: staik qwen3.5 35b
provider: openai
model: qwen3.5:35b-a3b
apiBase: https://api.staik.se/v1
apiKey: ${{ secrets.STAIK_API_KEY }}
defaultCompletionOptions:
contextLength: 131072 # halva av 262k — snabbt + headroom för output
maxTokens: 8192
roles:
- chat
- edit
- apply
- name: staik qwen3.5 9b
provider: openai
model: qwen3.5:9b
apiBase: https://api.staik.se/v1
apiKey: ${{ secrets.STAIK_API_KEY }}
defaultCompletionOptions:
contextLength: 16384 # halva av 32k
maxTokens: 4096
roles:
- autocomplete
- name: staik bge-m3
provider: openai
model: bge-m3:latest
apiBase: https://api.staik.se/v1
apiKey: ${{ secrets.STAIK_API_KEY }}
roles:
- embedPut your key in Continue's secrets (or replace the placeholder with the sk-st key directly). One model per role: 35b for chat/edit, 9b for autocomplete and bge-m3 for embeddings (codebase indexing).
Context window
Set contextLength to roughly half the model's window. Continue otherwise tends to fill the whole window with repo context, and a full 262k is both slower and eats more of the token budget. Half gives fast responses and leaves headroom for maxTokens on output. If you need to run really large repo contexts, just raise the number again.
| Model | Full window | Half (contextLength) |
|---|---|---|
| qwen3.5:35b-a3b | 262 144 | 131 072 |
| gemma4:31b | 98 304 | 49 152 |
| qwen3.5:9b | 262 144 | 16 384 |
config.json (legacy format)
If you run an older Continue version still using config.json, the setup is the same — chat model, autocomplete model and embeddings separately:
{
"models": [
{
"title": "staik qwen3.5 35b",
"provider": "openai",
"model": "qwen3.5:35b-a3b",
"apiBase": "https://api.staik.se/v1",
"apiKey": "sk-st-your-key",
"contextLength": 131072,
"completionOptions": { "maxTokens": 8192 }
}
],
"tabAutocompleteModel": {
"title": "staik qwen3.5 9b",
"provider": "openai",
"model": "qwen3.5:9b",
"apiBase": "https://api.staik.se/v1",
"apiKey": "sk-st-your-key"
},
"embeddingsProvider": {
"provider": "openai",
"model": "bge-m3:latest",
"apiBase": "https://api.staik.se/v1",
"apiKey": "sk-st-your-key"
}
}Models
Set the model id in Continue to one of staik's models. qwen3.5:35b-a3b (default) runs on vLLM with a 262k context window — enough for large repo contexts without falling back.
qwen3.5:35b-a3b— default, 262k context, vLLM — chat & editqwen3.5:9b— fast, 262k context — autocompletegemma4:31b— 96k context, visionbge-m3:latest— embeddings, 1024-dim — codebase indexing
Notes
Continue indexes your whole codebase with the embeddings model. Put bge-m3:latest on the embed role so indexing also stays on Swedish GPU infrastructure instead of hitting a third-party provider.
Want to run Claude Code, OpenCode or the Anthropic SDK instead? See Claude Code and OpenCode.