# OpenClaw Configuration
Configure OpenClaw with staik to use all models directly via Telegram, Discord, or the terminal.
## Configuration

Add the following to your `openclaw.json`:
```json
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemma4:31b",
            "name": "Gemma 4 31B",
            "contextWindow": 262144,
            "contextTokens": 64000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.6:35b-a3b",
            "name": "Qwen 3.6 35B A3B",
            "contextWindow": 262144,
            "contextTokens": 64000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 28000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/gemma4:31b",
        "fallbacks": ["staik/qwen3.6:35b-a3b", "staik/qwen3.5:9b"]
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 20000,
        "reserveTokens": 24000
      }
    }
  }
}
```

## Available models
| Model | Best for | Context window | Vision |
|---|---|---|---|
| `gemma4:31b` | Review, accuracy, language, vision | 262 144 | ✓ |
| `qwen3.6:35b-a3b` | Coding, complex tasks, vision | 262 144 | ✓ |
| `qwen3.5:9b` | Fast responses, simpler tasks | 32 768 | — |
Switch models with `/models` in OpenClaw. The `fallbacks` list enables automatic failover if the primary model is unavailable.
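If a model seems unavailable, you can sanity-check your key and the model IDs outside OpenClaw by calling the gateway directly. A minimal Python sketch, assuming the gateway exposes the standard OpenAI chat-completions route (the base URL and key format come from the config above; the response shape is the standard OpenAI one):

```python
import requests

BASE_URL = "https://api.staik.se/v1"   # from the provider config above
API_KEY = "sk-st-your-key"             # placeholder, as in the config

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen3.5:9b",  # smallest model, cheap smoke test
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```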
Fine-tune context handling
These parameters control how OpenClaw manages conversation history and model responses.
| Parameter | Where | Description |
|---|---|---|
| `contextTokens` | per model | Max tokens of conversation history per request. Lower it if the model times out. |
| `maxTokens` | per model | Max tokens in the model response. Lower it if responses are cut off or take too long. |
| `keepRecentTokens` | compaction | Tokens of recent messages that are never compressed. Raise it for more context in short conversations. |
| `reserveTokens` | compaction | Tokens reserved for the model response during compaction. Raise it if responses get truncated. |
| `contextWindow` | per model | The model's total context window (262 144 for the large models, 32 768 for qwen3.5:9b). Do not change it. |
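As a rule of thumb implied by the table, the history you send plus the response you allow must fit inside the model's window. A hypothetical sanity check (the function and the inequality are ours, not an OpenClaw API):

```python
def check_model_config(context_window: int, context_tokens: int, max_tokens: int) -> None:
    """Rough sanity check: history plus response must fit in the window."""
    if context_tokens + max_tokens > context_window:
        raise ValueError(
            f"contextTokens ({context_tokens}) + maxTokens ({max_tokens}) "
            f"exceeds contextWindow ({context_window})"
        )

# Values from the qwen3.5:9b entry above: 28000 + 4096 <= 32768 holds.
check_model_config(32768, 28000, 4096)
```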
## Tips
- If the model stops responding mid-conversation, lower `contextTokens`.
- If responses get truncated, raise `reserveTokens`.
- If the model forgets what you just discussed, raise `keepRecentTokens`.
## Config per plan
Every request counts both prompt tokens (conversation history) and completion tokens (model response). The key to staying within budget is `contextTokens`: it controls how much history is sent with each request.
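The per-plan budget lines below follow from simple division: the daily (or hourly) budget over the tokens sent per request. A minimal sketch of that arithmetic:

```python
def rounds_per_period(budget: int, tokens_per_request: int) -> int:
    """Approximate conversation rounds a token budget allows."""
    return budget // tokens_per_request

# Early Adopter: 100K/day at ~12K tokens per request -> ~8 rounds/day
print(rounds_per_period(100_000, 12_000))
```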
### Early Adopter (100 000 tokens / day)
With 100K tokens per day you need to be frugal. Use the fast 9b model and keep context short.
```json
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 12000,
            "maxTokens": 2048
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/qwen3.5:9b"
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 6000,
        "reserveTokens": 8000,
        "maxHistoryShare": 0.3
      }
    }
  }
}
```

**Budget:** ~12K tokens/request → ~8 conversation rounds per day
- `qwen3.5:9b` is 3x faster and uses fewer tokens
- `contextTokens: 12000` (instead of 64 000) sends 5x fewer prompt tokens per request
- `maxTokens: 2048` is sufficient for most responses
- `maxHistoryShare: 0.3` compresses history early to save tokens
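On a budget this tight it also helps to estimate a message's size before sending it. A crude heuristic sketch; the 4-characters-per-token ratio is a common rule of thumb for English text, not a measured property of these models:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

daily_budget = 100_000
used = estimate_tokens("Summarize the attached build log and suggest fixes.")
print(f"~{used} tokens for this prompt; about {daily_budget - used} left today")
```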
### Hobby (250 000 tokens / day)
More room for longer conversations. You can use larger models and more context.
```json
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.6:35b-a3b",
            "name": "Qwen 3.6 35B",
            "contextWindow": 262144,
            "contextTokens": 24000,
            "maxTokens": 4096
          },
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 16000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/qwen3.5:9b",
        "fallbacks": ["staik/qwen3.6:35b-a3b"]
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 10000,
        "reserveTokens": 16000,
        "maxHistoryShare": 0.4
      }
    }
  }
}
```

**Budget:** ~16K tokens/request → ~15 conversation rounds per day
- `qwen3.5:9b` as primary, the 35B model as fallback for complex tasks
- `contextTokens: 24000` on the 35B model gives good context without blowing the budget
### Agent (500 000 tokens / hour, rolling)

The Agent plan uses a rolling window instead of a daily limit, which makes it ideal for intense sessions. Full context and all models.
```json
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemma4:31b",
            "name": "Gemma 4 31B",
            "contextWindow": 262144,
            "contextTokens": 48000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.6:35b-a3b",
            "name": "Qwen 3.6 35B",
            "contextWindow": 262144,
            "contextTokens": 48000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 28000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/gemma4:31b",
        "fallbacks": ["staik/qwen3.6:35b-a3b", "staik/qwen3.5:9b"]
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 20000,
        "reserveTokens": 24000,
        "maxHistoryShare": 0.6
      }
    }
  }
}
```

**Budget:** ~50K tokens/request → ~10 requests per hour (capacity is continuously reclaimed)
- Gemma 4 as primary: best for review and language
- All three models available with automatic fallback
- 48K context is sufficient for large code projects
- Rolling window: tokens used more than an hour ago no longer count
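A sketch of how a rolling window behaves, for intuition only (this illustrates the concept; it is not staik's actual accounting code):

```python
import time
from collections import deque

WINDOW_SECONDS = 3600
HOURLY_LIMIT = 500_000

usage = deque()  # (timestamp, tokens) pairs, oldest first

def record(tokens):
    """Log a request's token cost."""
    usage.append((time.time(), tokens))

def tokens_in_window():
    """Sum usage over the trailing hour; older requests no longer count."""
    cutoff = time.time() - WINDOW_SECONDS
    while usage and usage[0][0] < cutoff:
        usage.popleft()
    return sum(tokens for _, tokens in usage)

record(50_000)
print(HOURLY_LIMIT - tokens_in_window(), "tokens still available this hour")
```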
## Quick reference
| Plan | `contextTokens` | `maxTokens` | Models | Capacity |
|---|---|---|---|---|
| Early Adopter | 12 000 | 2 048 | qwen3.5:9b | ~8 rounds / day |
| Hobby Mini | 8 000 | 2 048 | qwen3.5:9b | ~5 rounds / day |
| Hobby | 24 000 | 4 096 | qwen3.5:9b + 35b | ~15 rounds / day |
| Agent Mini | 24 000 | 4 096 | qwen3.6:35b-a3b | ~4 requests / h |
| Agent | 48 000 | 8 192 | gemma4 + 35b + 9b | ~10 requests / h |
| Agent Pro | 64 000 | 8 192 | gemma4 + 35b + 9b | ~20 requests / h |
## Sub-agents (ACP)
When your agent works on a larger project (e.g. a single-page HTML/CSS/JS site with multiple files), OpenClaw normally splits the work across sub-agents via ACP (Agent Coordination Protocol). Each sub-agent gets an isolated context, which dramatically lowers token usage.
### Why it matters
Without ACP, the main agent has to hold the entire project state in its context between each file generation. With 5 files and 30k tokens of context per request, you burn 150k+ tokens in a single round. With ACP, each sub-agent only gets its file-specific context, which can lower token usage by 70-80%.
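A quick sketch of that arithmetic (the per-file context size for a sub-agent is an assumed figure for illustration):

```python
files = 5
full_context = 30_000   # main agent resends the whole project state each time
file_context = 6_000    # assumed file-specific context for one sub-agent

without_acp = files * full_context   # 150,000 tokens in a single round
with_acp = files * file_context      # 30,000 tokens
savings = 1 - with_acp / without_acp
print(f"{without_acp:,} vs {with_acp:,} tokens (~{savings:.0%} saved)")
```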
### Configuration

Add the following to your `openclaw.json`:
```json
{
  "agents": {
    "list": [{
      "id": "<your_agent_id>",
      "runtime": {
        "type": "acp",
        "acp": {
          "agent": "codex",
          "backend": "acpx",
          "mode": "persistent"
        }
      }
    }]
  },
  "session": {
    "threadBindings": {
      "enabled": true,
      "spawnSubagentSessions": true
    }
  }
}
```

### Verify ACP is working
Run the following directly in your chat (e.g. Telegram) to spawn a sub-agent manually:
```
/acp spawn <agent_id>
```

To see exactly what OpenClaw is doing in real time (which sub-agents are spawned, where errors occur), enable verbose logging directly in your chat:

```
/verbose
```

### Common issues

#### "Cannot start sub-agent (gateway connection closed)"
This is OpenClaw's internal gateway, not the staik API gateway. Check that:
- the `acpx` backend is installed
- `gateway.mode: "local"` is set
- the OpenClaw gateway is running on the port set in `gateway.port`
- no other processes are blocking the port (e.g. previous instances that didn't shut down)
When ACP fails, the main agent falls back to doing everything itself, which is why the entire project context gets sent in every request, and you burn tokens fast.
## Troubleshooting
### OpenClaw produces no response
If OpenClaw does not produce any response at all, try turning off streaming in your provider configuration. Add `"streaming": "off"` to your provider block:
```json
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "streaming": "off",
        "models": [...]
      }
    }
  }
}
```

#### Why does this help?
Some channels and configurations have trouble with streaming responses from OpenAI-compatible APIs. Turning off streaming makes OpenClaw send a regular request and wait for the full response before displaying it, which bypasses any streaming-related issues.
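The same distinction exists at the wire level: the OpenAI-compatible request body carries a `stream` flag, so a non-streaming call to the gateway looks roughly like this (a sketch; `"streaming": "off"` above is OpenClaw's own setting, the `stream` field is its wire-level counterpart):

```python
import requests

resp = requests.post(
    "https://api.staik.se/v1/chat/completions",
    headers={"Authorization": "Bearer sk-st-your-key"},
    json={
        "model": "qwen3.5:9b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,  # wait for the complete response, no SSE chunks
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```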
### Agent asks "Should I run?" between steps
For large multi-step tasks, the agent may break work into batches and ask for confirmation between them. This comes from OpenClaw's own agent prompt or the model's default behavior, not from the staik gateway. OpenClaw lacks a built-in flag to disable this, but you can override it with a system prompt.
#### Option 1: per Telegram group
```json
{
  "channels": {
    "telegram": {
      "groups": {
        "<your_chat_id>": {
          "systemPrompt": "You are an autonomous agent. Execute ALL steps of a task in sequence without asking between them. Stop only on (1) an explicit error requiring a decision, or (2) when the entire task is done. Never use phrases like \"Should I run?\" or \"Do you want me to continue?\" — just execute the next step."
        }
      }
    }
  }
}
```

#### Option 2: workspace files (applies globally)
OpenClaw reads the agent persona from markdown files in the workspace folder (e.g. `~/.openclaw/workspace/`). Create or edit `IDENTITY.md` or `SOUL.md` with the same instruction; it then applies globally, not just per channel.
#### Other causes
- **Sub-agent could not start**: If OpenClaw reports that the gateway connection is closed, its internal sub-agent system has failed (that is OpenClaw's own gateway, not the staik API gateway). The main agent then serializes the work and often breaks it into batches.
- **Task too large for one response**: If the task requires more output than `maxTokens` allows, the agent must split it. Raise `maxTokens` to 8192 if your budget allows.
### 429 Too Many Requests mid-session
You've hit your token limit (daily for Hobby/Early Adopter, hourly rolling for Agent). Large contexts (~30K tokens/request) can exhaust the Agent limit (500K/h) in about 16 requests. Solutions:
- Lower `contextTokens` from 64000 to 16000–32000
- Use `qwen3.5:9b` as primary instead of the 35B model
- Upgrade to a higher-limit plan (see pricing)
Use the `X-RateLimit-Used-Tokens` response header to monitor usage in real time.
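A sketch of reading that header from a direct API call (the header name comes from this guide; whether the gateway attaches it to every response is an assumption):

```python
import requests

resp = requests.post(
    "https://api.staik.se/v1/chat/completions",
    headers={"Authorization": "Bearer sk-st-your-key"},
    json={
        "model": "qwen3.5:9b",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
    timeout=60,
)
used = resp.headers.get("X-RateLimit-Used-Tokens")
print(f"Tokens used in the current window: {used}")
```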